Practical Lambda ResNets

How well does this replacement for attention mechanisms in vision work?

I’ll be using an existing implementation, Phil Wang’s, whose full repo is on GitHub. Normally with a research paper like this, I’d go to the trouble of writing my own implementation. However, I’m using an existing one for a few reasons:

  1. There are enough implementations and reviews now that a new one is increasingly redundant. This one has made the rounds on Twitter so widely that it has nearly broken the anonymity of the blind peer-review system.
  2. There’s still a lot of work to be done exploring the network’s performance. Beyond runtime and memory, there are plenty of techniques for explaining, visualizing, and interpreting attention networks that have not been applied to Lambda ResNets.
  3. Phil Wang included the actual lambda symbol in his implementation. That’s right. He used λ in the identifiers making up the Python code like an absolute champion! I’m not too proud to admit that I just cannot surpass that, no matter how many fancy code-organization or performance-optimization tools I bring to the table.

How do Lambda Networks work? (the short version)

Plenty of researchers have been trying to make attention networks work for vision, almost as if they see it as the inevitable next step. The problem is that most attention mechanisms have memory costs that grow quadratically with the number of input positions. Here the authors describe an alternative formulation.
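To make the quadratic burden concrete, here is a back-of-the-envelope sketch. It assumes global self-attention over a flattened feature map with float32 storage, and it counts only the attention maps themselves. The function name and the default batch and head counts are illustrative, not taken from the paper.

```python
def attention_map_bytes(height, width, heads=1, batch=1, bytes_per_value=4):
    """Memory for the (positions x positions) attention maps of global self-attention."""
    n = height * width  # number of positions (pixels) in the feature map
    return batch * heads * n * n * bytes_per_value


for side in (16, 32, 64, 128):
    gib = attention_map_bytes(side, side) / 2**30
    print(f"{side}x{side} feature map: {gib:.4f} GiB per head per image")
```

Doubling the side length quadruples the number of positions and multiplies the attention-map memory by 16, which is why high-resolution inputs run out of memory so quickly.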

This figure sums up what the authors are trying to say pretty nicely: namely, that these networks are supposed to be more memory-efficient with the query and attention representations that an attention mechanism would use.
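A minimal NumPy sketch of where the saving comes from, as I understand the paper's content lambda. It leaves out the position lambdas, multiple heads, and batching, and all sizes and variable names are toy assumptions. The keys are normalized across context positions and folded with the values into one small (k x v) matrix, which is then applied to every query. The second half groups the same product the other way round, which forces an (n x m) map into memory. Note that this is not standard softmax attention (that normalizes per query); it only shows the difference in intermediate shapes.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
n, m, k, v = 6, 6, 4, 8  # queries, context positions, key depth, value depth (toy sizes)

Q = rng.normal(size=(n, k))  # one query per output position
K = rng.normal(size=(m, k))  # one key per context position
V = rng.normal(size=(m, v))  # one value per context position

K_bar = softmax(K, axis=0)  # normalize keys across the context positions

# Lambda-style: summarize the context into a small linear map, then apply it to each query.
content_lambda = K_bar.T @ V   # shape (k, v), does not grow with n
Y_lambda = Q @ content_lambda  # shape (n, v)

# Same product grouped the other way: an (n x m) map is materialized first.
weights = Q @ K_bar.T  # shape (n, m)
Y_maps = weights @ V   # shape (n, v)

print("lambda shape:", content_lambda.shape, "| map shape:", weights.shape)
print("same output:", np.allclose(Y_lambda, Y_maps))
```

The two outputs agree because matrix multiplication is associative; the only difference is which intermediate has to be stored.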

Is it everything it’s cracked up to be?

Some of the language in this paper seems a little strange, almost as if it was intended to sound as impressive as possible while saying very little. There are also plenty of experiments where the results are simply out-of-memory errors, showing that we aren’t quite out of the woods with attention’s memory problems yet. That being said, if these ImageNet performance results are reproducible, then this would be a pretty huge moment in computer vision.

What’s it like to actually use these?

We’ve finally got a TensorFlow version of this network, so we’ll go with that.

import tensorflow as tf
from lambda_networks.tfkeras import LambdaLayer

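layer = LambdaLayer(
    dim_out = 32,   # channels out
    r = 23,         # local context: receptive field for relative positional encoding (23 x 23)
    dim_k = 16,     # key dimension
    heads = 4,      # number of heads (multi-query)
    dim_u = 1       # 'intra-depth' dimension
)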

x = tf.random.normal((1, 64, 64, 16)) # channel last format
layer(x) # (1, 64, 64, 32)


Cited as:

    title = "Practical Lambda ResNets",
    author = "McAteer, Matthew",
    journal = "",
    year = "2020",
    url = ""

If you notice mistakes or errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I will be very happy to correct them right away! Alternatively, you can follow me on Twitter and reach out to me there.

See you in the next post 😄

