DeepMind today unveiled a new multimodal AI system capable of performing over 600 different tasks.
Dubbed Gato, it's arguably the most impressive all-in-one machine learning kit the world has ever seen.
According to a DeepMind blog post:
The agent, which we call Gato, operates as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
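The key idea in that description is that every modality gets serialized into one flat token stream that a single sequence model consumes and produces. Here is a minimal, purely illustrative sketch of that serialization step; it is not DeepMind's actual code, and the character-level text tokens and 1024-bin discretization are simplifying assumptions (Gato's paper uses subword text tokens and mu-law-style encoding for continuous values):

```python
def tokenize(observation):
    """Map any supported modality to a flat list of integer tokens.

    Hypothetical illustration of a shared token vocabulary: text becomes
    character codes, continuous values become discretization-bin indices.
    """
    if isinstance(observation, str):
        # Text: one token per character (a real system would use subwords).
        return [ord(c) for c in observation]
    if isinstance(observation, list):
        # Continuous values in [-1, 1] (e.g. joint angles): 1024 bins.
        return [min(1023, max(0, int((v + 1.0) / 2.0 * 1023)))
                for v in observation]
    raise TypeError("unsupported modality")

# One "episode" interleaves modalities in a single sequence, so the same
# network can attend over an instruction and a robot's joint readings alike.
sequence = tokenize("stack the red block") + tokenize([0.25, -0.5, 0.9])
print(len(sequence))  # → 22 (19 text tokens + 3 discretized joint tokens)
```

Once everything is tokens, "deciding whether to output text or joint torques" reduces to ordinary next-token prediction over a shared vocabulary, which is why the architecture can look so much like a language model.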
And while it remains to be seen exactly how well it will perform once researchers and users outside DeepMind's labs get their hands on it, Gato appears to be everything GPT-3 wishes it could be, and more.
Here’s why it makes me sad: GPT-3 is a Large Language Model (LLM) produced by OpenAI, the world’s top-funded Artificial General Intelligence (AGI) company.
However, before we can compare GPT-3 and Gato, we need to understand where OpenAI and DeepMind came from as companies.
OpenAI is the brainchild of Elon Musk, it has billions of dollars in backing from Microsoft, and the US government couldn't care less about what it does in terms of regulation and oversight.
Bearing in mind that the sole purpose of OpenAI is to develop and control an AGI (that is, an AI capable of doing and learning anything a human could, given the same access), it's a little scary that all the company has managed to produce is a really fancy LLM.
Don’t get me wrong, GPT-3 is awesome. In fact, it’s arguably as impressive as DeepMind’s Gato, but this assessment requires some nuance.
OpenAI took the LLM route on its way to AGI for one simple reason: nobody knows how to make AGI work.
Just as it took time from the discovery of fire to the invention of the internal combustion engine, figuring out how to move from deep learning to AGI won’t happen overnight.
GPT-3 is an example of an AI that can at least do something that feels human: it generates text.
What DeepMind did with Gato is, well, much the same. It took something that works much like an LLM and turned it into an illusionist capable of more than 600 forms of conjuring.
As Mike Cook of the Knives and Paintbrushes research collective recently told TechCrunch’s Kyle Wiggers:
It sounds exciting that the AI is able to do all these tasks that look very different, because to us it looks like writing text is very different from controlling a robot.
But actually, it’s not too different from GPT-3 understanding the difference between regular English text and Python code.
That’s not to say it’s easy, but to the outside observer it might seem like the AI can also make a cup of tea or easily learn ten or fifty other tasks, and it can’t.
Basically, Gato and GPT-3 are both robust AI systems, but neither of them is capable of general intelligence.
Here is my problem: unless you're betting on AGI emerging as the result of some random act of luck (the movie Short Circuit comes to mind), it's probably time for everyone to reevaluate their timelines for AGI.
I wouldn’t say “never,” because that’s one of the only curse words in science. But this makes it look like AGI won’t happen in our lifetimes.
DeepMind has been working on AGI for over a decade, and OpenAI since 2015. And neither has been able to solve the very first problem on the path to AGI: building an AI capable of learning new things without training.
I think Gato might be the most advanced multimodal AI system in the world. But I also think DeepMind took the same dead-end concept for AGI as OpenAI and just made it more marketable.
Final Thoughts: What DeepMind has done is remarkable and will likely make a lot of money for the company.
If I’m the CEO of Alphabet (the parent company of DeepMind), I either turn Gato into a pure product or I push DeepMind toward development over research.
Gato has the potential to operate more lucratively in the consumer market than Alexa, Siri, or Google Assistant (given the right marketing and applicable use cases).
But Gato and GPT-3 are no more viable entry points to AGI than the virtual assistants mentioned above.
Gato’s ability to multitask is more like a video game console that can store 600 different games than a game you can play 600 different ways. It’s not a general AI, it’s a bunch of neatly grouped pre-trained narrow models.
That’s not a bad thing, if that’s what you’re looking for. But there’s simply nothing in the research paper accompanying Gato to indicate that this is even a nudge in the right direction for AGI, let alone a stepping stone.
At some point, the goodwill and capital that companies like DeepMind and OpenAI have generated through their deadpan insistence that AGI was right around the corner will have to show even the tiniest of dividends.