Teaching an LSTM how XOR works

Another response to a low-level request for research

In “Requests for Research 2.0”, the OpenAI team lists a few interesting ideas. Here’s one of them:

⭐ Train an LSTM to solve the XOR problem: that is, given a sequence of bits, determine its parity. The LSTM should consume the sequence, one bit at a time, and then output the correct answer at the sequence’s end. Test the two approaches below:

  • Generate a dataset of 100,000 random binary strings of length 50. Train the LSTM; what performance do you get?
  • Generate a dataset of 100,000 random binary strings, where the length of each string is independently and randomly chosen between 1 and 50. Train the LSTM. Does it succeed? What explains the difference?
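Before anything else, it’s worth pinning down the target: the parity of a bit string is just the XOR of all its bits, 1 if the string contains an odd number of 1s and 0 otherwise. A minimal sketch of the label function such a dataset would need:

// Parity of a bit array: XOR-reduce all of its bits.
// parity([1, 0, 1, 1]) === 1  (three 1s: odd)
// parity([1, 1, 0, 0]) === 0  (two 1s: even)
function parity(bits) {
  return bits.reduce((acc, b) => acc ^ b, 0);
}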

With that in mind, I figured that making the thought process behind this one publicly available would help others work their way up to the more advanced requests.

In the grand scheme of OpenAI’s research requests, this one sits in the “warmups” category. LSTMs come ready-made in many frameworks such as Keras, so for a bit of a challenge, I decided to try doing this in TensorFlow.js.

First, let’s define our necessary components:

let model;           // the tf.js model

let resolution = 20; // size in pixels of each grid cell we draw
let cols;            // number of grid columns
let rows;            // number of grid rows

let xs;              // input tensor covering every grid cell
Next, we define the training inputs and their labels: the four rows of the XOR truth table.

// The four possible two-bit inputs...
const train_xs = tf.tensor2d([
  [0, 0],
  [1, 0],
  [0, 1],
  [1, 1]
]);
// ...and their XOR labels: 1 exactly when the two bits differ
const train_ys = tf.tensor2d([
  [0],
  [1],
  [1],
  [0]
]);

We define our setup within a single function. As you’ll notice, the model definition closely resembles the Keras API.

function setup() {
  createCanvas(400, 400);
  cols = width / resolution;
  rows = height / resolution;

  // Create one normalized (x1, x2) input for every cell of the grid
  let inputs = [];
  for (let i = 0; i < cols; i++) {
    for (let j = 0; j < rows; j++) {
      let x1 = i / cols;
      let x2 = j / rows;
      inputs.push([x1, x2]);
    }
  }
  xs = tf.tensor2d(inputs);

  // Two dense layers: a 16-unit hidden layer and a single
  // sigmoid output unit
  model = tf.sequential();
  let hidden = tf.layers.dense({
    inputShape: [2],
    units: 16,
    activation: 'sigmoid'
  });
  let output = tf.layers.dense({
    units: 1,
    activation: 'sigmoid'
  });
  model.add(hidden);
  model.add(output);

  const optimizer = tf.train.adam(0.2);
  model.compile({
    optimizer: optimizer,
    loss: 'meanSquaredError'
  });

  // Kick off the training loop; each call to train() runs one epoch
  setTimeout(train, 10);

}
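Before wiring up the training loop, a quick sanity check of my own (not part of the sketch itself): run the four XOR inputs through the untrained network and print the raw predictions. They start out hovering near 0.5 and should approach [0, 1, 1, 0] as training proceeds.

// Sanity check (my own addition): print the model's current
// predictions for the four XOR inputs. predict() returns a
// tf.js Tensor, so we can use .print().
model.predict(train_xs).print();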

For training, we log the loss to the console and fit the model one epoch at a time, rescheduling the next epoch with setTimeout so the browser stays responsive.

function train() {
  trainModel().then(result => {
    console.log(result.history.loss[0]);
    setTimeout(train, 10);
  });
}

function trainModel() {
  return model.fit(train_xs, train_ys, {
    shuffle: true,
    epochs: 1
  });
}
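As an aside, model.fit also accepts Keras-style callbacks, which is a tidier way to watch the loss than the console.log above. A sketch, assuming a TensorFlow.js version that supports the callbacks config (trainModelVerbose is a hypothetical variant, not part of the sketch above):

// Hypothetical variant of trainModel() that logs the loss via a
// fit callback instead of the .then() in train(). Assumes a tf.js
// version with the callbacks API.
function trainModelVerbose() {
  return model.fit(train_xs, train_ys, {
    shuffle: true,
    epochs: 1,
    callbacks: {
      onEpochEnd: (epoch, logs) => console.log('loss:', logs.loss)
    }
  });
}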

Now we can draw these results on the page, shading each grid cell by its predicted value.

function draw() {
  background(0);
  tf.tidy(() => {
    // Get the predictions; tidy() disposes the intermediate tensors
    // afterwards, so the per-frame predictions don't leak memory
    let ys = model.predict(xs);
    let y_values = ys.dataSync();

    // Draw the results
    let index = 0;
    for (let i = 0; i < cols; i++) {
      for (let j = 0; j < rows; j++) {
        let br = y_values[index] * 255;  // map the prediction [0, 1] to brightness
        fill(br);
        rect(i * resolution, j * resolution, resolution, resolution);
        fill(255 - br);                  // contrasting color for the label
        textSize(8);
        textAlign(CENTER, CENTER);
        text(nf(y_values[index], 1, 2), i * resolution + resolution / 2, j * resolution + resolution / 2);
        index++;
      }
    }
  });

}

As for the output itself, we save all of the code above to a file called sketch.js, which is loaded by the index.html file below:

<!DOCTYPE html>
<html>
  <head>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.8.0/p5.min.js"></script>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.8.0/addons/p5.dom.min.js"></script>
    <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@0.11.4"> </script>
    <script type="text/javascript" src="sketch.js"></script>
  </head>
  <body>
  </body>
</html>

Now, let’s recall the two questions from before (a sketch of an LSTM setup for them follows the list):

  1. Generate a dataset of 100,000 random binary strings of length 50. Train the LSTM; what performance do you get?
  2. Generate a dataset of 100,000 random binary strings, where the length of each string is independently and randomly chosen between 1 and 50. Train the LSTM. Does it succeed? What explains the difference?
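The dense network above learns two-bit XOR, but the request proper asks for an LSTM that consumes a sequence one bit at a time. Here is a minimal sketch of how that setup could look in TensorFlow.js for question 1 (fixed-length strings). Note that makeParityData is a hypothetical helper I’m defining for illustration, and the sample count is scaled down from the 100,000 in the request to keep the sketch light:

// Hypothetical helper: generate random bit sequences with their parity.
function makeParityData(numSamples, seqLen) {
  const seqs = [];
  const labels = [];
  for (let n = 0; n < numSamples; n++) {
    const seq = [];
    let label = 0;
    for (let t = 0; t < seqLen; t++) {
      const bit = Math.random() < 0.5 ? 1 : 0;
      label ^= bit;     // running XOR of the bits seen so far
      seq.push([bit]);  // one feature per time step
    }
    seqs.push(seq);
    labels.push([label]);
  }
  return {
    xs: tf.tensor3d(seqs),   // shape [numSamples, seqLen, 1]
    ys: tf.tensor2d(labels)  // shape [numSamples, 1]
  };
}

const seqLen = 50;
const data = makeParityData(10000, seqLen); // scaled down from 100,000

const lstmModel = tf.sequential();
lstmModel.add(tf.layers.lstm({
  units: 8,                 // a small hidden state suffices to track parity
  inputShape: [seqLen, 1]   // consume the sequence one bit at a time
}));
lstmModel.add(tf.layers.dense({ units: 1, activation: 'sigmoid' }));
lstmModel.compile({
  optimizer: tf.train.adam(0.01),
  loss: 'binaryCrossentropy'
});

lstmModel.fit(data.xs, data.ys, { epochs: 5, batchSize: 128 });

For question 2, the variable-length strings would need to be padded to a common length before batching. The usual intuition for why that dataset trains more easily is a curriculum effect: the many short strings present a much easier version of the parity rule for the LSTM to learn first.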
