NeurIPS 2019 Reflections
Highlights, Lowlights, and the Unexpected
Disclaimer: This post doesn’t reflect the view of any of the organizations I’m associated with. NeurIPS is huge with a lot to take in, so there might be some inaccuracies. I’m also writing and posting this shortly after the end of the conference. Feedback is welcome!
This past week was big. Those in the ML field know that the Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019) was just held in Vancouver, Canada. For those that aren’t in the ML field, this is a giant, international academic conference on machine learning (in fact, one of the two primary conferences in the space alongside ICML). It’s a little bit different than academic conferences in other fields, in large part because you’ve got large multinational companies releasing their research to the community alongside the usual academic researchers.
This is going to be a simple overview of what I thought were the main highlights (at least of the events that I attended), low points, and parts that I thought were probably important but that I’m still trying to finalize my opinion on. Here are some of my higher-level reflections:
People like Yoshua Bengio spent a lot of time trying to steer the field towards biologically-inspired ML frameworks. While others might see it as slightly terrifying to see experts of such stature acknowledging that we’re reaching the limits of our hardware capabilities, as an ex-biologist I found this refreshing.
Jumping off the previous point, seeing Aguera’s results from experiments in which simulated bacteria adapted to seek food and communicate through artificial evolution was an incredible experience.
Jeff Clune also had some awesome talks at the bio and artificial RL workshops (made even more awesome by the fact that they’re posted online). Spiking CNN + LSTM Neural networks made an appearance in a talk about getting them to solve Atari games.
Here were some other papers and projects I particularly liked in this track:
- Evaluating Protein Transfer Learning with TAPE
- Cormorant: Covariant Molecular Neural Networks
- N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
The tone of this particular workshop generally felt much more relaxed and less hectic than the rest of the event. It was a welcome break. There is no posturing or debate. It’s just people coming to show projects that they did simply because they looked cool. It was a great end to the week.
Here were some of my favorite papers from this part, though you can also find the full slide deck from this workshop here:
- Neural Painters: A learned differentiable constraint for generating brushstroke paintings
- Towards Sustainable Architecture: 3D Convolutional Neural Networks for Computational Fluid Dynamics Simulation and Reverse Design Workflow
- Towards Principled Evaluation of Creativity of Machine-Generated Art
With a lot of the projects and algorithms showcased at NeurIPS, there’s always the question of which ones will actually be used ten years, five years, or even one year from now. Retrospectives were a big part of this year’s conference, including some presentations with some of the greatest names I’ve ever seen in a conference presentation (yes, even compared to presentations on ML models with names such as variBAD):
While not explicitly focused on just interpretability, I was a fan of the Knowledge Representation and Reasoning Meets Machine Learning workshop on Friday. Here are some of my favorite papers from this area this year:
- Learning Interpretable Models Expressed in Linear Temporal Logic
- Logical Interpretations of Autoencoders
- TP-N2F: Tensor Product Representation for Natural To Formal Language Generation
I may be biased towards Bayesian Learning because it’s a space I worked on for so long (see my tutorial post as an example of why I might be biased). Nevertheless, it was great to see such emphasis on this approach, especially Emtiyaz Khan’s talk Deep Learning with Bayesian Principles. Basically, traditional deep learning uses trial-and-error to approximate point-estimates for patterns in data. Bayesian ML by contrast involves starting with priors about the data, and updating them throughout the training process. The latter is also much better at working with aleatoric uncertainty (uncertainty regarding the accuracy of the input data, which regular ML models take as a given), and better at providing confidence intervals when models are applied in domains far outside what they were trained on. I recommend the following NeurIPS papers on the topic:
- Practical Deep Learning with Bayesian Principles
- A Simple Baseline for Bayesian Uncertainty in Deep Learning
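To make the point-estimate versus prior-and-update contrast a bit more concrete, here is a minimal sketch of my own (not taken from the talk or the papers above). It uses the simplest conjugate case, a Beta prior over a coin-flip probability, rather than a neural network. The class name `BetaPosterior`, the uniform Beta(1, 1) prior, and the toy counts are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief about an unknown success probability."""
    alpha: float = 1.0  # assumed prior pseudo-count of successes (uniform prior)
    beta: float = 1.0   # assumed prior pseudo-count of failures

    def update(self, successes: int, failures: int) -> "BetaPosterior":
        # Conjugate update: observed counts are added to the prior counts.
        return BetaPosterior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1))


def point_estimate(successes: int, failures: int) -> float:
    # Maximum-likelihood estimate: a single number with no attached uncertainty.
    return successes / (successes + failures)


prior = BetaPosterior()
small = prior.update(successes=3, failures=0)      # very little data
large = prior.update(successes=300, failures=0)    # a lot of data

# The point estimate is identical in both cases...
print(point_estimate(3, 0), point_estimate(300, 0))
# ...while the posterior stays closer to the prior and is wider when data is scarce.
print(small.mean, small.variance)
print(large.mean, large.variance)
```

The point estimate cannot tell three observations apart from three hundred, whereas the posterior variance shrinks as evidence accumulates. Scaling this idea to deep networks is the hard part, and it needs approximations that this toy example does not attempt.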
The emphasis on Graph-based neural networks was also a welcome change. People in the drug-discovery and social networking space have been looking at Graph neural networks for some time, but now they’re getting their much-deserved time in the spotlight. Here are some NeurIPS papers on Graph-based deep learning that I thought were particularly fantastic:
- Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels
- Exact Combinatorial Optimization with Graph Convolutional Neural Networks
I saw so many startups putting too much effort and resources into making their presence known at NeurIPS. Many of them seem to be putting more effort into signalling through impressive-seeming research papers instead of signalling through, well, winning over actual customers (if you were at NeurIPS, you probably know exactly the startups I’m talking about).
I know the fact that all these companies were tricked into this almost-altruistic behavior by the conferences presenting themselves as signalling opportunities has probably been a good thing. However, I think we’re starting to see more of the negative externalities from all this.
There are definitely a lot of projects where it seems tricky to find a compelling ethical use-case (or worse, super easy to mis-use). For example, “Facial Reconstruction from Voice using Generative Adversarial Networks” in which the authors generate faces from voice. It’s as though many of these authors are still unaware of the fact that there are autocrats actively seeking tools like this to suppress any threats to their power, regardless of the human cost.
The amount of effort being focused on reproducibility is still woefully inadequate. Papers like A Step Toward Quantifying Independently Reproducible Machine Learning Research by Edward Raff are a step in the right direction, but they’re not enough. Even outside the scope of using tools like Colab Notebooks (or Jupyter Notebooks with Docker images on Github), there are still issues such as how much the performance varies between hardware, or how different methods of setting seed values impact reproducibility. I’m aware that a lot of the focus at NeurIPS is supposed to be on the theoretical side, but the published experiments should still have some kind of robustness if they’re going to influence future theories in the space.
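As a rough illustration of the seed issue, here is a small sketch using only Python’s standard library. The function `noisy_experiment` is a hypothetical stand-in for a training run (it just averages Gaussian noise around an assumed score of 0.90), so it says nothing about hardware or GPU-level nondeterminism, which need their own handling in real frameworks.

```python
import random
import statistics


def noisy_experiment(seed: int, n_samples: int = 1000) -> float:
    """Hypothetical stand-in for a training run: returns a noisy 'accuracy'."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    # Simulate a metric centred on an assumed 0.90 with run-to-run noise.
    return statistics.fmean(rng.gauss(0.90, 0.05) for _ in range(n_samples))


seeds = [0, 1, 2, 3, 4]
scores = [noisy_experiment(s) for s in seeds]

# Same seed, same result: the minimum bar for a reproducible run.
assert noisy_experiment(0) == noisy_experiment(0)

# Different seeds give different results, so report the spread, not one number.
print(f"mean={statistics.fmean(scores):.4f} stdev={statistics.stdev(scores):.4f}")
```

Passing an explicit generator around, instead of seeding a global one, makes it harder for an unrelated library call to silently change the random stream. Reporting the mean and standard deviation over several seeds is one simple way to show how robust a result is.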
I’m increasingly of the opinion that flying so many people from far away to Vancouver for a few days is getting diminishing returns for the cost. And I’m not just saying that because the lines at the bathroom were ridiculous (I mean Goddamn, are people going to start preemptively wearing diapers to these things like die-hard concert-goers?). I think the usefulness of the conference venue as a way to discuss ideas is not as much as it might have been in previous years. For much of the conference, there was this feeling of sensory overload. It was also apparent just how hazardous it’s becoming to host poster sessions in the exact same time-slots as talk receptions.
Also, the SkyTrain shutdown? Wow, the timing couldn’t have been worse for that.
I’m glad that the conference has added talks on how machine learning can be applied in the space of addressing global warming. But I am doubtful that it will be enough, even for the low bar of offsetting the emissions from flying thousands of people to this event and/or the emissions from the compute resources spent on large ML projects.
Let’s be honest, a lot of people are in the ML space (at least partially) because it’s a high-paying space to be in, high-enough paying that they might be able to feel that they can escape from the worst of global warming’s effects (which will almost certainly devour the world’s poorest). After all, the field is known for engineers that are probably putting a lot more effort into building Artificial General Intelligence than into preventing the possible negative effects of such a development.
Some of the talks in such workshops offered such marginal solutions that their intended purpose seemed to be just assuaging some people’s fears that maybe they can’t stave off the 6th big extinction event by tuning neural networks all day.
Like in previous years, it generally seems like when it comes to security applications, there is a lot more effort on the side of finding new exploits than there is on figuring out possible fixes. One can argue that this is the case in just about every area of Computer Science, but it seems unusually pronounced in the machine learning space.
Some people have used the analogy of computer manufacturers realizing that the memory problems in their devices were enormous liabilities only when worms were practically shutting down half the internet in the 1990s. I would argue that this is an inadequate analogy, as the vulnerability situation is far worse with machine learning models.
As I mentioned previously, there were some more frequent criticisms of deep learning and its limitations this year. It seems some groups’ response was to go as far as they could in testing those limits. My favorite example of this was Richard Sutton giving an ambitious talk on a General AI-Agent Architecture (“SuperDyna”) at the Biological and artificial reinforcement learning workshop. The goal appears to be to combine many different RL frameworks together. The question of how we get from RL agents that can solve games like Go and StarCraft, to agents with more general learning ability, has been unanswered for a long time. I applaud this attempt at
For everyday machine learning needs, there were a lot of new tools, from new optimizers to new activation functions. I’m still trying to figure out which ones will be useful in the long run. Two of the major ones are the Mish activation function (and how well it could replace PReLU), and the SLS Optimizer (as a replacement for Adam). We’ll see how these fare in future retrospectives.
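For anyone who hasn’t run into Mish yet, it is defined as x * tanh(softplus(x)). Below is a minimal scalar sketch in plain Python, with ReLU next to it as a baseline of my own choosing (PReLU would need a learned slope, which this sketch leaves out). In practice you would use a vectorized version from your framework of choice.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x)): smooth, and slightly negative for x < 0.
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    # Baseline for comparison: hard zero for every negative input.
    return max(x, 0.0)


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):.4f}")
```

The main visible difference is on the negative side: ReLU outputs exactly zero there, while Mish lets a small negative signal through and stays smooth around the origin. Whether that translates into better models on a given task is exactly the kind of thing a future retrospective would have to settle.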
Aside from trends in general AI and research, this was also useful in testing my usual approaches for big research conferences. I was fortunate enough to have had experience with large conferences like ICML, which made this one somewhat easier.
Reaching out to people before the conference was a fantastic decision. There are friends of mine that I definitely missed that were also there (if you’re reading this I’m sorry about that). However, being able to find friends and get their recommendations on which talks and workshops they were attending made navigating the event much easier. Making plans for workshops and talks well in advance is something you should do anyways, but I highly recommend doing so in a group.
If you’re early in your career, joining a high-profile conference by signing up to be a volunteer is probably one of the most cost-efficient ways you can network. I didn’t do this at NeurIPS 2019 in particular, but I definitely did it at big conferences when I was still in Biotech. Given how much this strategy helped out back then, I’m sure it would yield massive returns for anyone else who tries it.
Narrowing down which of the 51 workshops to attend and which of the ~13K attendees to talk to is much easier if you have specific goals in mind. Some of the more helpful ones included:
- Finding collaborators or communities
- Finding new mentors (and/or meeting your idols)
- Exploring unfamiliar or new field X.
- Talking with others in your specific subfield about approaches to common pitfalls
NeurIPS is great for networking, and there are some fantastic parties being thrown. However, if you do not set a hard limit for when you need to go to bed, you’ll probably regret it the next day. Even at times when the SkyTrain doesn’t suddenly become unreliable, you want to have a pre-planned escape plan for parties.
I also recommend scheduling at least one evening to take a break from all the parties happening at NeurIPS. You’ll quickly find that decent sleep is a scarce and precious resource (even outside the event), and that not every evening event is worth it.
If you’re worried you’re going to spend most of your time working during the conference, don’t go. Read the papers and watch the online presentations if they’re available. If you (or the company) are spending so much time and effort to go to this event, make sure you get the most out of it and don’t just treat it like any other “working vacation”.
I think one of my biggest regrets with my previous conference (ICML 2019) was having to cut poster sessions short to attend to a model I was training for work at the time (it wasn’t even all that important to the company either). I’m glad to have not had this problem this year, and I can tell you that it makes an enormous difference in terms of how much information load you can handle.
I know there’s a lot of talk about Imposter Syndrome when it comes to research. There are plenty of advice columns out there regarding how to build confidence, and how imposter syndrome is irrational. My most useful advice on that topic? Accept that you’re probably an imposter. But don’t worry, because the overwhelming majority of people in this space likely are as well.