Find bugs in the herd with debuggable TensorFlow code

A PROWLER.io tutorial

By Wei Yi (Senior Machine Learning Engineer)

TensorFlow is a powerful, open-source computation toolkit that’s widely used to implement machine learning models. For example, PROWLER.io researchers have developed GPflow to implement Gaussian Processes on top of TensorFlow. But writing correct TensorFlow code can be difficult. 

In this blog post, I’ll introduce three assertion-writing techniques and a software engineering process for applying them. They make debugging TensorFlow code much easier and ensure that your code is correct with respect to the algorithm definitions given in the research.

The problem:

  • Debugging TensorFlow code is hard. You can log and investigate tensor values via summaries or the TensorFlow debugger. But evaluated tensors are multi-dimensional numpy arrays, which makes investigation difficult. 
  • Tips on how to debug TensorFlow code can help, but they can’t ensure that your code is correct.
  • Graph visualisations from TensorBoard provide too much detail (see Figure 1), making them hard to use. Yet it’s crucial to ensure your graphs are structurally correct.

    Figure 1. Visualization from TensorBoard

    To solve these problems, you need specifications to help you reason about correct code implementation. Specifications describe what the code is supposed to do, and implementations describe how to do it. A piece of code is correct only with respect to its specification. You may have specifications for your TensorFlow code in mind, but writing them down is tricky. 

    The solutions:

    These three techniques for writing specifications all use Python assertions that you can put directly in your code.

    Technique 1: tensor shape assertions

    When you introduce tensors, you need to write asserts to check their shapes. Often, incorrect assumptions about a tensor’s shape can be hidden by TensorFlow’s broadcasting mechanism when it runs the code.
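
    As a small illustration of how broadcasting masks shape mistakes – using plain numpy, whose broadcasting rules TensorFlow follows – a prediction that is accidentally column-shaped combines with a target vector without any error being raised (the variable names here are made up for the example):

```python
import numpy as np

batch_size = 4
prediction = np.zeros((batch_size, 1))  # accidentally column-shaped: (4, 1)
target = np.ones((batch_size,))         # intended shape: (4,)

# Broadcasting silently expands the subtraction to a (4, 4) matrix,
# so a mean over the "per-example" errors averages 16 numbers
# instead of 4 -- and no error is ever raised.
error = target - prediction
print(error.shape)  # (4, 4)

# A shape check written when the tensor is introduced exposes the bug:
shape_ok = prediction.shape == (batch_size,)
print(shape_ok)  # False
```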

    For example, in deep neural network reinforcement learning (e.g. DQNs), you usually have a prediction tensor, a target tensor and a loss tensor:

    prediction_tensor = q_function.output_tensor
    target_tensor = reward_tensor + discount * bootstrapped_tensor
    loss_tensor = tf.losses.mean_squared_error(target_tensor, prediction_tensor)

    Listing 1. Typical tensors in DQN

    Here, prediction_tensor is aliased to the output of the Q-value network; target_tensor represents a desired value, bootstrapped from next states; and loss_tensor defines our training loss function. 

    The next listing shows the shape assertions for the introduced tensors. They check that prediction_tensor and target_tensor share the shape [batch_size, action_dimension], and that loss_tensor, which evaluates to a single number, has shape []:

    prediction_tensor = q_function.output_tensor
    assert prediction_tensor.shape.as_list() == [batch_size, action_dimension]
    
    target_tensor = reward_tensor + discount * bootstrapped_tensor
    assert target_tensor.shape.as_list() == [batch_size, action_dimension]
    
    loss_tensor = tf.losses.mean_squared_error(target_tensor, prediction_tensor)
    assert loss_tensor.shape.as_list() == []

    Listing 2. Tensor shape assertions

    You’ll get an assertion violation when running the code if the shape of a tensor does not match the expected shape defined in the corresponding assertion.
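
    These checks can be factored into a small helper – hypothetical, not part of TensorFlow’s API – where None marks a dimension that is only known at runtime, such as the batch size:

```python
import numpy as np

def assert_shape(tensor, expected):
    """Assert that a tensor's static shape matches `expected`.

    `expected` is a list of ints; a None entry matches any size
    (useful for dimensions unknown at graph-construction time).
    Works on TensorFlow tensors (via shape.as_list()) and, as a
    fallback, on numpy arrays.
    """
    shape = tensor.shape
    actual = shape.as_list() if hasattr(shape, "as_list") else list(shape)
    assert len(actual) == len(expected), (actual, expected)
    for a, e in zip(actual, expected):
        assert e is None or a == e, (actual, expected)

# Usage on a numpy stand-in for a (batch_size, action_dimension) tensor:
q_values = np.zeros((32, 4))
assert_shape(q_values, [32, 4])    # passes
assert_shape(q_values, [None, 4])  # passes: batch size left unspecified
```

    Because the helper falls back to list(tensor.shape), the same assertion also runs against numpy arrays in unit tests.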

    Technique 2: tensor dependency

    After checking that the tensors have the right shapes, make sure the graph’s structure correctly reflects the tensor dependencies you have in mind. If the value of tensor B depends on the value of tensor A (e.g. B = A + 1), then there should be an edge from node B to node A in the graph.

    The visualisation of an entire TensorFlow graph can have hundreds of nodes and edges, making it very difficult to understand. But you only need to visualise the relationships between the tensors you’ve introduced, and you can often group many tensors into a single node. In a multi-layer neural network that has many variables, for example, each variable is a tensor; but you can visualise the whole neural network as a single node.

    We’ve developed a new Python package – called TensorGroupDependency – that allows you to register only the tensors you want to visualise. It generates a new, much smaller visualisation showing only those tensors. You first call the add method to register the tensors into the TensorGroupDependency class, and then call the generate_dot_representation method to give you a visualisation of only those tensors and their dependencies. Here's how to use it:

    d = TensorGroupDependency()
    d.add(q_function, 'q_function')
    d.add(q_function.output_tensor, 'q_value_tensor')
    d.add(prediction_tensor, 'prediction_tensor')
    d.add(target_tensor, 'target_tensor')
    d.add(loss_tensor, 'loss_tensor')
    
    dot = d.generate_dot_representation()
    print(dot)
    
    d.generate_assertions(target_exp='d')

    Listing 3. Generate visualization with TensorGroupDependency

    And here's the visualisation:

    Figure 2. Visualization from our TensorGroupDependency python package

    It includes the following information:

    • A node in the graph represents a tensor or set of tensors (such as all variables in a neural network).
    • There's a directed edge from node B to node A if at least one tensor in B depends on a tensor in A.
    • In each node, you see the kind and shape of the tensors.

    To check the correctness of the graph structure, you need to explain why every edge exists (i.e. the dependency relationship between those tensors). If you’re unable to explain the existence of certain edges, there’s a discrepancy between the idea in your mind and the actual code you’ve implemented. This usually signifies a bug.

    Once you’re done explaining the existence of each edge in the graph, you can call the generate_assertions method of the TensorGroupDependency class to generate all the assertions that describe the graph structure. The following listing of generated assertions becomes part of your code and will be checked in all future executions:

    d.assert_immediate_ancestors('q_function.variables', set())
    d.assert_immediate_ancestors('target_tensor', set())
    d.assert_immediate_ancestors('q_value_tensor', {'q_function.variables'})
    d.assert_immediate_ancestors('loss_tensor', {'target_tensor', 'prediction_tensor'})
    d.assert_immediate_ancestors('prediction_tensor', {'q_value_tensor'})

    Listing 4. Automatically generated tensor dependency assertions
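
    TensorGroupDependency is our in-house package, so the sketch below is only a hypothetical, framework-agnostic illustration of the idea: a registry records named tensor groups with their immediate ancestors, checks them, and emits the corresponding assertion lines. (In real TensorFlow code, the ancestor sets would be discovered by walking the graph rather than passed in explicitly.)

```python
class DependencyRegistry:
    """Minimal sketch of the idea behind TensorGroupDependency (hypothetical)."""

    def __init__(self):
        self._ancestors = {}  # name -> set of names it immediately depends on

    def add(self, name, depends_on=()):
        # In real TensorFlow code, the ancestor set would be discovered by
        # traversing the graph; here it is supplied explicitly.
        self._ancestors[name] = set(depends_on)

    def assert_immediate_ancestors(self, name, expected):
        assert self._ancestors[name] == set(expected), (
            name, self._ancestors[name], expected)

    def generate_assertions(self, target_exp='d'):
        # Emit one assertion line per registered node, ready to paste into code.
        # Ancestors are sorted so the output is deterministic.
        return "\n".join(
            f"{target_exp}.assert_immediate_ancestors({name!r}, {sorted(deps)!r})"
            for name, deps in sorted(self._ancestors.items()))


d = DependencyRegistry()
d.add('q_function.variables')
d.add('q_value_tensor', depends_on={'q_function.variables'})
d.add('prediction_tensor', depends_on={'q_value_tensor'})
d.add('target_tensor')
d.add('loss_tensor', depends_on={'target_tensor', 'prediction_tensor'})

d.assert_immediate_ancestors('loss_tensor', {'target_tensor', 'prediction_tensor'})
print(d.generate_assertions())
```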

    Technique 3: tensor equation evaluations

    So far, you’ve validated the dependency relationships between the tensors you defined. The final step is to validate that those dependencies perform the correct numerical calculations. For example, the equations B = A + 1 and B = A - 1 both introduce a dependency from B to A, so their dependency graphs are identical. But you need to specify that B = A + 1 is the correct implementation, not B = A - 1. Do this using tensor equation evaluations:

    • For each equation in your algorithm, evaluate the tensors involved at each optimisation step by adding them to the session.run call.
    • Write the same equation in numpy using those tensor evaluations, and assert that the result matches the evaluated tensor, as the paper’s definition requires.

    The next listing shows the tensor equation evaluation for the loss tensor. The sess.run evaluates the parameter_update_operations, as well as the tensors involved in calculating loss. Then you calculate mean_square_error using evaluations of the target_tensor and the prediction_tensor. Finally, you assert that the loss_tensor evaluation is equal to our calculated mean_square_error.

    _, prediction, target, loss = sess.run(
        [parameter_update_operations, prediction_tensor, target_tensor, loss_tensor], 
        feed_dict={})
    
    mean_square_error = np.mean(np.power(target - prediction, 2))
    np.testing.assert_almost_equal(loss, mean_square_error, decimal=1)

    Listing 5. Tensor equation evaluations

    The results:

    We have applied these techniques to all our TensorFlow-based learners. The following table reports the number of bugs we found while developing five machine learning modules:

    Table 1. Bugs detected with assertion techniques

    | Learning module                          | Coding time (h) | Verification time (h) | Bugs by shape asserts (10% of time) | Bugs by tensor graph (40% of time) | Bugs by tensor evals (50% of time) |
    |------------------------------------------|-----------------|-----------------------|-------------------------------------|------------------------------------|------------------------------------|
    | Monte Carlo critic                       | 5               | 1                     | 3                                   | 2                                  | 1                                  |
    | Bootstrapping critic                     | 7               | 0.5                   | 4                                   | 1                                  | 2                                  |
    | Generalised Advantage Estimation critic  | 2               | 0.5                   | 2                                   | 0                                  | 0                                  |
    | TD(λ) critic                             | 3               | 2                     | 5                                   | 0                                  | 1                                  |
    | StarCraft2 learner                       | 7               | 1                     | 1                                   | 1                                  | 0                                  |
    | Total                                    | 24              | 5                     | 15                                  | 4                                  | 4                                  |

    The “Learning module” column lists the name of the code. These are critics in Actor-Critic algorithms. The StarCraft2 learner is an Actor-Critic learner from the paper StarCraft II: A New Challenge for Reinforcement Learning. These are all learning modules in deep reinforcement learning.

    The “Coding time” column reports the time in hours we spent writing the code of those learners; in total, we spent 24 hours.

    The “Verification time” column reports the time in hours spent applying our three techniques to verify the correctness of the learners. We spent five hours writing the assertions, running the code, observing assertion violations and fixing the detected problems. Note that an assertion violation doesn’t necessarily mean that the implementation is wrong. It only indicates a discrepancy between the implementation and the specification: either – or both – may be wrong. In this experiment, about half of the discrepancies turned out to be wrong specifications over correct implementations. In any case, every discrepancy points towards something that needs validation.

    The “Bugs detected by” columns show the number of bugs found and the percentage of verification time spent when applying each of the techniques. In total, we detected 23 bugs in only five hours. More importantly, after applying the techniques, we knew that we had validated the correctness of our code against the algorithm definitions.

    The three sub-columns break down the time we spent applying each assertion technique and the number of bugs each detected. They reveal that investing a little time in simple shape assertions at the early stages already uncovered a lot of bugs. Fixing those bugs early made it easier to locate other, more involved bugs with the later techniques.

    The reasons:

    Why are these assertion techniques effective for detecting bugs?

    First, they require you to define the correctness of your code by writing down the specification. Writing specifications is not a new idea, but our techniques make it more practical:

    • Shape assertions only require you to write down the shapes of the tensors you introduce – easy!
    • Tensor dependency only requires you to focus on the tensors you introduce, excluding all other TensorFlow operations and name scopes. This reduces the graph from hundreds of nodes to a dozen or so, making human investigation practical. And assertion generation reduces the time needed to write the assertions down.
    • You validate every equation in numpy, which is easier to get right than TensorFlow operations.

    Second, finding where a bug originates is daunting in TensorFlow. When applied in order, our techniques help you localise faults. When you have a problem in the tensor dependency stage, you know that the shapes of all the involved tensors are correct. When you have a problem with a tensor equation, you know that the dependency structure is correct. In short, you can better focus on and locate every problem.

    Third, these techniques turn verifying TensorFlow code from a chore into a software engineering process that ensures the code is verified if you follow a simple to-do list:

    1. Write a shape assertion for all tensors you introduce.
    2. Explain all dependency edges between those tensors, and generate structural assertions automatically.
    3. Write an assertion to check every equation from the algorithm.

    A common problem when validating and/or testing code is knowing how to proceed and when to stop. Which part of the code do you start on? Which aspects should you check? How will you know when you’ve tested enough? Our three assertion techniques remove these doubts. You apply them one by one and each technique has finite and manageable steps that are bounded by the number of tensors and equations you introduced in your code — usually around a dozen. In the end, you know your code is validated against the research paper definition. Don’t overlook the power of such an engineering process; when people know the exact steps, they’re a lot more efficient.

    I need to make it clear that these techniques can only validate the code’s correctness with respect to the research paper definition; they can’t check whether the code actually learns meaningful models. That’s covered by common machine learning practices such as plotting the loss over time, cross-validation, and so on.

    These three techniques – tensor shape assertions, tensor dependency and tensor equation evaluations – are widely applicable to all kinds of TensorFlow code. The software engineering process for applying them is practical and efficient for both researchers and machine learning engineers. Without them, we used to spend weeks debugging a piece of TensorFlow code and still had no idea how correct it was! Going forward, code equipped with specifications will also enable us to develop advanced testing methods such as automated test case generation, further reducing the chore of manually verifying TensorFlow code.
