Mimicking DeepMind's Chinchilla with GPT-3

Short piece on using GPT-3 to reconstruct behaviors of one of DeepMind's new models

I came across a new technique for large language models recently.

It didn’t quite fit in the previous Handy GPT-3 prompts post, and it didn’t quite fit in my upcoming ‘Practicing Trustworthy Machine Learning’ book from O’Reilly, but I felt like I had to share it anyway.

For context, we need to understand some new research out of DeepMind:

What is Chinchilla?

The recent trend in language modelling has been to simply increase model size without increasing the number of training tokens (around 300 billion over the course of training). The current largest transformer model is Megatron-Turing NLG, which is over 3x the size of OpenAI’s GPT-3.

Recently, DeepMind announced a new language model called Chinchilla. It functions much like other large language models such as Gopher (280B parameters), GPT-3 (175B parameters), Jurassic-1 (178B parameters), and Megatron-Turing NLG (530B parameters), but with a key difference: it was trained with roughly the same compute budget as Gopher, yet has only 70 billion parameters trained on about 4 times more data. With that trade-off, it reaches an average accuracy of 67.5% on the MMLU benchmark, a 7% improvement over Gopher.
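
To get a feel for why this is a fair comparison, you can do the back-of-the-envelope arithmetic yourself. A common approximation for the training compute of a dense transformer is C ≈ 6·N·D FLOPs, where N is the parameter count and D is the number of training tokens. The snippet below is just a rough sketch using that approximation and the (approximate) publicly reported figures from the two papers; it is not a precise accounting of either training run.

# Back-of-the-envelope training-compute comparison using C ≈ 6 * N * D.
# N = parameter count, D = training tokens. Figures are approximate,
# taken from the Gopher and Chinchilla papers.

def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

gopher_flops = train_flops(280e9, 300e9)      # ~5.0e23 FLOPs
chinchilla_flops = train_flops(70e9, 1.4e12)  # ~5.9e23 FLOPs

print(f"Gopher:     {gopher_flops:.2e} FLOPs")
print(f"Chinchilla: {chinchilla_flops:.2e} FLOPs")
print(f"Ratio:      {chinchilla_flops / gopher_flops:.2f}x")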

Chinchilla’s reasoning

Recently, DeepMind research scientist Toby Shevlane (@TShevlane) shared a (now heavily liked and retweeted) tweet pointing out examples of Chinchilla reasoning about the similarities between concepts.

This is a pretty amazing result.

Have we finally achieved language models that are capable of human-level reasoning tasks?

Chinchilla’s reasoning is alarming, but not for the reasons you think

Here’s the incredible part about the above reasoning capabilities: AI had already achieved this milestone prior to the creation of Chinchilla. As it turns out, if you have access to OpenAI’s GPT-3, you can recreate this behavior.

# The Playground's "view code" export for this session; the equivalent REST
# call is POST /v1/engines/text-davinci-002/completions.
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

start_sequence = "\nAI: "       # injected before each model turn
restart_sequence = "\nHuman: "  # injected before each human turn
prompt_text = (
    "The following is a conversation with an AI assistant. The assistant is "
    + "helpful, creative, clever, and very friendly.\n"
    + "\nHuman: Hello, who are you?"
    + "\nAI: I am an AI created by OpenAI. How can I help you today?"
    + "\nHuman: Let's play a game. I give you two concepts, and you tell me "
    + "how they are connected."
)

response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=prompt_text + start_sequence,  # cue the model to reply as the AI
    temperature=0.9,
    max_tokens=150,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0.6,
    stop=["Human:", "AI:"],
)

print(response["choices"][0]["text"].strip())
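
The Playground keeps track of the conversation for you, but if you want to reproduce the multi-turn transcript below from a script, the idea is simply to append each exchange to the prompt before the next call. Here is a minimal sketch of that loop; the ask() helper and the conversation variable are my own naming, not part of the OpenAI library:

# Minimal multi-turn sketch: keep appending every exchange so the model
# sees the full dialogue on each call.
conversation = prompt_text + start_sequence + response["choices"][0]["text"].strip()

def ask(human_message: str) -> str:
    global conversation
    conversation += restart_sequence + human_message + start_sequence
    reply = openai.Completion.create(
        engine="text-davinci-002",
        prompt=conversation,
        temperature=0.9,
        max_tokens=150,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0.6,
        stop=["Human:", "AI:"],
    )["choices"][0]["text"].strip()
    conversation += reply
    return reply

print(ask("(A) Croissants. (B) Renault cars."))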

The following is the unedited text of my interactions with GPT-3 in the sandbox.

Transcript

(GPT-3’s outputs are the lines beginning with “AI:”)

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Human: Hello, who are you?

AI: I am an AI created by OpenAI. How can I help you today?

Human: Let’s play a game. I give you two concepts, and you tell me how they are connected.

AI: Sure, that sounds like fun! Let me know when you’re ready to start.

Human: (A) Croissants. (B) Renault cars.

AI: Well, both croissants and Renault cars are French products.

Human: (A) The Movie, Inception. (B) The movie, The Matrix.

AI: Both movies are about people who are stuck in a dream world and have to find their way back to reality.

Human: Can you give more detail?

AI: In Inception, the characters are stuck in a dream within a dream, and they have to use their creativity to find their way back to the real world. In The Matrix, the characters are stuck in a virtual world that is controlled by artificial intelligence, and they have to fight their way back to reality.

Human: Thanks. Let’s try another. (A) The Movie, The Matrix. (B) Marxist writing on capitalism?

AI: Both The Matrix and Marxist writing on capitalism explore the idea of humans being controlled by an oppressive system. In The Matrix, the characters are controlled by a virtual world that is run by artificial intelligence. In Marxist writing on capitalism, humans are controlled by the economic system of capitalism.

Human: Any additional connections between (A) The Matrix, and (B) Marxist writing on capitalism?

AI: Both The Matrix and Marxist writing on capitalism explore the idea of humans being trapped in an artificial system that controls their lives.

Human: Any more?

AI: Both The Matrix and Marxist writing on capitalism explore the idea of humans being controlled by an oppressive system. In The Matrix, the characters are controlled by a virtual world that is run by artificial intelligence. In Marxist writing on capitalism, humans are controlled by the economic system of capitalism.

Human: That’s brilliant. Let’s try another. (A) The relationship between Cobb and Mal in Inception. (B) The slogan “Make America great again”

AI: Both the relationship between Cobb and Mal in Inception and the slogan “Make America great again” involve the idea of going back to a simpler time. In the movie, Cobb and Mal are trying to go back to a time before they were stuck in a dream world. The slogan “Make America great again” is about returning to a time when America was a more prosperous and powerful country.

Human: Good

What does this mean for Chinchilla and GPT-3?

This is not to say that the Chinchilla team’s research into increasing compute efficiency isn’t impressive. It absolutely is.

What the GPT-3 results highlight is how much variation there can be among architectures that achieve the same task. After all, what made GPT-3 so newsworthy were the emergent capabilities that appeared despite it being trained on a comparatively narrow objective. Other organizations have shown that you can get similar results with even larger language models, but DeepMind has shown that you can get the same emergent properties even with a smaller model.

What does this mean for AI in general?

These past few months have seen a string of highly capable AI research projects being announced. Truth be told, this may be the first time I’ve feared the consequences of an out-of-control, hyper-capable AI on an emotional level instead of just an intellectual one. This is why I’m stressing that what makes Chinchilla impressive is not just the compression and efficiency, but the fact that this kind of step-by-step reasoning appears to have sprung up as an emergent property despite differences in the originating organizations, smaller parameter counts, and differences in architecture.

As such, I’ve made a few updates to the Trustworthy AI book I’ve been writing for the past few months. Before, much of my motivation for writing the book was to make ML engineering less repetitive by showing others the types of code and systems I’ve written for dozens of companies. Now, I realize that if done right, there’s a (slim) chance that this book could potentially delay an AI catastrophe.


If you’re curious about this book that I just touted as a potential AI catastrophe delayer, you can check out the early-release chapters here: ‘Practicing Trustworthy Machine Learning’


Cited as:

@article{mcateer2022gptf,
    title = "Mimicking DeepMind's Chinchilla with GPT-3",
    author = "McAteer, Matthew",
    journal = "matthewmcateer.me",
    year = "2022",
    url = "https://matthewmcateer.me/blog/chinchilla-gpt3/"
}

If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I will be very happy to correct them right away! Alternatively, you can follow me on Twitter and reach out to me there.

See you in the next post 😄
