In Google’s labs, a one-armed robot did something robots have long been unable to do. Presented with three plastic figurines – a lion, a whale, and a dinosaur – it was instructed to pick up the “extinct animal.” It correctly identified the dinosaur and grabbed it. Until recently, such a feat was out of reach: robots could not reliably manipulate objects they had never seen before, let alone make the logical leap from “extinct animal” to “plastic dinosaur.”
But robotics is now undergoing a major shift, driven by advances in large language models, the same kind of artificial intelligence (A.I.) systems that power popular chatbots like ChatGPT and Bard. Google has started integrating these language models into its robots, effectively giving them an “artificial brain.” The result is robots that are better at understanding instructions and solving problems.
Google’s newest robotics model, RT-2, showcased these capabilities in a private demonstration. Vincent Vanhoucke, Google DeepMind’s head of robotics, said the development forced a fundamental rethinking of the team’s research approach, with much of its previous work rendered obsolete by the progress this integration has brought.
However, this doesn’t mean that robots have achieved human-like dexterity; they still fail at some basic tasks. But the ability to reason and improvise that A.I. language models bring is a significant step forward for robotics, says Ken Goldberg, a robotics professor at the University of California, Berkeley.
For many years, training robots to perform tasks was laborious and rigid. Engineers programmed a robot with explicit instructions for a task, such as flipping a burger, then repeatedly tweaked those instructions until the robot performed the task correctly. This method worked for specific tasks, but it was time-consuming and demanded a vast amount of data from real-world tests. As a result, teaching a robot a new task, like flipping a pancake instead of a burger, meant reprogramming it from scratch.
But what if robots could learn new skills by themselves? Inspired by this idea, Google researchers began connecting language models with robots two years ago. Their first project, PaLM-SayCan, could generate step-by-step instructions for various tasks but couldn’t translate them into actions. The latest model, RT-2, goes further. It’s a vision-language-action model, meaning it can see and analyze its surroundings and tell a robot how to act.
RT-2 translates a robot’s movements into numbers, a process called tokenizing, and incorporates these tokens into the same training data as the language model. Once trained, RT-2 can predict how a robot’s arm should move to pick up a ball or toss a can into a recycling bin. As Karol Hausman, a Google research scientist, put it, “this model can learn to speak robot.”
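To make the idea of tokenizing movements concrete, here is a minimal sketch in Python. It is not RT-2’s actual scheme, which the article does not describe in detail. It assumes, purely for illustration, that each component of an arm command is a number between -1 and 1 and that each is rounded into one of 256 bins. The function names, the bin count, and the value range are all hypothetical.

```python
from typing import List, Sequence

# Assumptions for illustration only; none of these values come from the article.
NUM_BINS = 256          # hypothetical number of discrete bins per action dimension
LOW, HIGH = -1.0, 1.0   # hypothetical normalized range of each action dimension


def tokenize_action(action: Sequence[float]) -> List[int]:
    """Map each continuous action value to an integer bin index in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)        # keep the value inside the range
        fraction = (clipped - LOW) / (HIGH - LOW)   # rescale to [0, 1]
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def detokenize_action(tokens: Sequence[int]) -> List[float]:
    """Map bin indices back to the center value of each bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (token + 0.5) * width for token in tokens]


def to_text(tokens: Sequence[int]) -> str:
    """Render tokens as a space-separated string, the form a language model can read and write."""
    return " ".join(str(token) for token in tokens)


def from_text(text: str) -> List[int]:
    """Parse a space-separated token string back into integers."""
    return [int(piece) for piece in text.split()]


if __name__ == "__main__":
    arm_command = [-1.0, 0.0, 1.0]   # a made-up three-number movement
    tokens = tokenize_action(arm_command)
    print(to_text(tokens))           # 0 128 255
    print(detokenize_action(from_text(to_text(tokens))))
```

Once movements look like short strings of numbers, they can sit alongside ordinary text in training data, and a model’s numeric output can be converted back into an approximate arm command. The round trip loses a little precision, at most half a bin width per value.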
Though not flawless, RT-2 has shown that it can follow complex instructions, understand languages other than English, and make abstract connections between related concepts. While Google has no immediate plans to release RT-2 more widely, the company sees a future where these language-equipped machines could be used in warehouses, medicine, and even as household assistants.
However, introducing A.I. language models as the “brains” of robots does come with potential risks, considering their tendency to make mistakes or concoct nonsensical answers. Despite this, Google insists that RT-2 is loaded with safety features to prevent accidents or harmful actions. It can, for example, be trained not to pick up containers with water in them to avoid damaging its hardware.
This idea of robots that can reason, plan, and improvise on the fly might seem daunting given Hollywood’s doomsday portrayals. Yet at Google, it’s a reason for celebration. After a period of stagnation, hardware robots are making a comeback, thanks to their chatbot brains.