Teaching Robots to Learn

It’s one thing to create computer software that can automatically find patterns in large datasets. It’s another thing entirely to create an artificial brain for a robot. Even the simplest kinds of robot learning and control in the real world are hugely challenging. Want your mechanical friend to walk down a flight of stairs? Most robots would tumble to their destruction. Want it to play a game of table tennis? Most robots would be more likely to knock the table over than return a ball. What we need to solve these dynamic, fast-moving, real-world learning problems is to combine the efforts of cognitive scientists, machine learning specialists and mathematicians. This is the dream of one computer scientist, who has dedicated his career to this aim.

We see a large robot arm hanging from the ceiling. It dangles over one end of a table tennis table, and in its robotic hand it holds a table tennis bat. A researcher holds its hand, guiding the bat like a parent teaching a child. She shows the robot how to return the ball from different angles. Unlike most robots, this one does not just passively allow itself to be moved by its human guide. This robot learns. It’s not long before the robot has learned a series of different strokes from its teacher, and it begins to play on its own, learning through trial and error which stroke to use at which time. It’s even clever enough to combine its repertoire of strokes in new ways to create its own returns. The finale is an actual game of table tennis between the human teacher and the robot. Perhaps neither is the world’s most accomplished player, but on this occasion the robot seems just as able as the human to return the Ping-Pong ball. It’s a spooky sight seeing a large disembodied arm with the dexterity and poise needed to spot the moving ball and flick the bat in just the right way to return it across the table every time.

This is the work of Jan Peters, leader of the Robot Learning Laboratory at the Max Planck Institute for Intelligent Systems and professor at the Technical University of Darmstadt (one of Germany’s best technical universities). Although playing table tennis may seem like an unlikely goal for robotics researchers, the problems involved – recognition of objects, prediction of movement, fine control, fast reactions – are all similar to the problems faced by robots performing tasks outside the lab. Any robot clever enough to play table tennis should be clever enough to learn other skills that may be valuable in the real world.

Jan Peters was interested in robotics from an early age. “Already during high-school,” says Peters, “I discovered two passions: learning algorithms that make sense of complex data and robots that can sense, plan and act.” Peters combined both passions by studying four different courses at four different universities. He studied informatics, computer science, mechanical engineering and electrical engineering at the Technical University of Munich, the University of Hagen, the National University of Singapore and the University of Southern California, receiving four Master’s degrees, one in each discipline. For his Ph.D., Jan Peters joined Stefan Schaal at the University of Southern California (USC). His Ph.D. thesis “Machine Learning of Motor Skills for Robotics” received the Dick Volz Best US PhD Runner-Up award.

Peters joined the Max Planck Institute after graduation and set up the Robot Learning Laboratory. There he worked with experts in machine learning, such as Bernhard Schölkopf, and in neurorobotics, such as Stefan Schaal.

The researchers were all fascinated by the way we learn. For skilled activities, they believed that one reason an expert is so much better than a novice is that the expert has learned a repertoire of simpler behaviours that can be chosen and adapted very quickly. Apply this understanding to robots and you might be able to teach the robot a set of skills that it can then apply and adapt when the time is right.

Driving a car, searching for survivors in a disaster, or repairing equipment might all be examples of applications that such a skilled robot could handle, if only we could make a robot brain capable of learning. However, real-world applications such as these are dangerous; it is much safer to develop the learning methods using table tennis. The game is a particularly challenging task for robotics researchers, because robots have neither the speed of movement nor the sensors of their human opponents. Their large inertia and half-blind vision, plus an endless number of ways in which a bat can be used to hit a ball, make the task extremely difficult for computers to handle.

Jan Peters and his team solved the problem by breaking it into smaller steps. They recognised there are different stages to hitting a ball: the Awaiting Stage in which the opponent is observed, the Preparation Stage where the player moves into position for the stroke, the Hitting Stage in which the ball is intercepted, and the Finishing Stage, or the follow-through of the bat. During each stage, the robot needs different movement strategies. To provide them, a human guides the robot through each movement and a computer learns how to reproduce it, using a new method called Local Gaussian Process Regression, designed to work faster and more accurately than many existing machine learning methods. “Our robot can play bad table tennis from imitation,” says Peters. “A human table tennis teacher takes the robot by the hand and the robot learns: this is a backhand, this is a forehand.”
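For readers curious about what "local" Gaussian process regression might look like, here is a toy sketch. It is not the team's implementation: it assumes fixed region centres, a squared-exponential kernel and a one-dimensional curve instead of robot movement data, and the names `LocalGP` and `rbf` are hypothetical. The idea it illustrates is that many small models, each fitted to nearby data only, can be cheaper than one large model, with predictions blended according to how close the query lies to each region.

```python
import numpy as np


def rbf(a, b, length=0.5):
    # Squared-exponential kernel between the rows of a and the rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)


class LocalGP:
    """Toy local GP regression: one small GP per region, blended by distance."""

    def __init__(self, centers, length=0.5, noise=1e-2):
        # Assumption: region centres are fixed up front rather than learned.
        self.centers = np.asarray(centers, dtype=float)
        self.length = length
        self.noise = noise
        self.models = []

    def fit(self, X, y):
        # Assign every sample to its nearest centre.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        nearest = d2.argmin(axis=1)
        self.models = []
        for k in range(len(self.centers)):
            Xk, yk = X[nearest == k], y[nearest == k]
            if len(Xk) == 0:
                self.models.append(None)
                continue
            # Standard zero-mean GP fit, but only on this region's data.
            K = rbf(Xk, Xk, self.length) + self.noise * np.eye(len(Xk))
            self.models.append((Xk, np.linalg.solve(K, yk)))
        return self

    def predict(self, Xq):
        num = np.zeros(len(Xq))
        den = np.zeros(len(Xq))
        for c, model in zip(self.centers, self.models):
            if model is None:
                continue
            Xk, alpha = model
            # Weight each local prediction by closeness of the query to the centre.
            w = np.exp(-0.5 * ((Xq - c) ** 2).sum(-1) / self.length**2)
            num += w * (rbf(Xq, Xk, self.length) @ alpha)
            den += w
        return num / np.maximum(den, 1e-12)
```

Each local fit only inverts a small matrix, which is where the speed-up over a single large Gaussian process would come from in this simplified picture.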

Once the robot has learned a simple set of skills, it must then fine-tune them. “The robot self-improves at the speed of a human child by trial and error,” explains Peters, “receiving positive feedback for good actions and negative feedback for bad striking movements.” This was implemented by using reinforcement learning, which optimises behaviours to maximise a reward – in this case, hitting the ball successfully. The difficulty of the problem meant yet more innovation, this time in the form of the new Policy learning by Weighting Exploration with the Returns (PoWER) algorithm.
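The trial-and-error idea can be illustrated with a heavily simplified, episodic sketch in the spirit of PoWER. It is an assumption-laden toy, not the published algorithm: the `reward` function, the two-parameter "stroke" and the rule of keeping only the best few trials per round are all hypothetical choices made for illustration. The robot tries random variations of its current stroke parameters and then shifts the parameters towards the variations that earned the highest returns.

```python
import numpy as np


def reward(theta, target):
    # Hypothetical return in (0, 1]: highest when the stroke parameters hit the target.
    return float(np.exp(-np.sum((theta - target) ** 2)))


def power_style_search(target, iterations=100, rollouts=10, keep=5,
                       sigma=0.3, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(target, dtype=float)  # start from an untrained stroke
    for _ in range(iterations):
        # Explore: try several random perturbations of the current parameters.
        eps = rng.normal(0.0, sigma, size=(rollouts, theta.size))
        returns = np.array([reward(theta + e, target) for e in eps])
        # Keep the best trials and average their perturbations,
        # weighting each one by the return it achieved.
        best = np.argsort(returns)[-keep:]
        weights = returns[best]
        theta = theta + (weights[:, None] * eps[best]).sum(axis=0) / weights.sum()
    return theta


if __name__ == "__main__":
    goal = np.array([1.0, -0.5])
    learned = power_style_search(goal)
    print("learned parameters:", learned, "return:", reward(learned, goal))
```

Because the update is a return-weighted average of the explored variations, it needs no hand-tuned learning rate, which is part of the appeal of this family of methods.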

The team are still improving their techniques but already they have some impressive results. Their bulky robot arm can play table tennis better than any other robot so far; it can even learn to play the tricky Ball-in-a-Cup game. Its brain comprises a set of some of the fastest machine learning approaches for practical robotics to date.

The remarkable achievements of Peters were not without challenges. Coming from California after his doctorate, he was initially nearly a stranger to European research as he had been overseas for seven years. In his first week at the Max Planck Institute he was told about PASCAL – the network of machine learning specialists. He joined immediately. It became the launching pad of his career. “PASCAL has been extremely beneficial for me,” says Peters, “allowing me to connect to many important researchers from all major European countries as well as Israel.

“PASCAL was an exceptional opportunity as it allowed me to create a tight network of collaborators all over Europe within a short amount of time while allowing me to attend various workshops, summer schools and other research meetings.”

The story of Jan Peters provides an inspirational glimpse of the success of PASCAL2 and how its members have benefited. Peters has made some important contributions to the field, recently leading a PASCAL2 thematic programme on Cognitive Architecture and Robotics, which comprised many workshops and summer schools. A key aim was to build bridges between cognitive science and machine learning, and to bring robots out of the research labs.

“The core problem of cognitive sciences, robotics and machine learning is that the researchers in these different disciplines do not speak the same language,” says Peters, “nor would they meet at the conferences which they regularly attend. For any collaboration, you have to bring researchers from these fields into the same room and translate between them. Of course, this requires the presence of “mediators” (aka organizers) who will make sure that this process works well. The core service of the Thematic Programme was to create a pool of such mediators.”

The programme was very successful, with many workshops (such as the European Workshop on Reinforcement Learning) and summer schools (such as the Summer School on Machine Learning and Cognitive Sciences). “Many young researchers have entered the field because of the Thematic Programme on Cognitive Architecture and Robotics,” says Peters. “They profited tremendously from this platform.”

Jan Peters has also benefited. “It is unlikely that I would have become a full professor in Germany with my own institute at the age of 34 without having learned so much through PASCAL!”

Most recently, in 2013, Peters was awarded the IEEE Robotics & Automation Society’s Early Career Award. In the future he aims to develop even faster and better methods and to apply them to a wider range of situations. He has ambitious visions for these approaches. If the learning techniques were exploited for industrial robots as used in the car manufacturing industry, the result might be robots that could perform many tasks a few times, in contrast to today’s robots, which can only perform a few tasks many times. Perhaps the idea would overcome the prohibitive cost of robot programming for existing industrial robots. “Imagine if a robot could learn some of the production steps from a factory worker as fast as our table tennis robot,” says Peters.

He is also actively working to help people with disabilities. A current project investigates Brain-Robot Interfaces that learn to adapt themselves to the patients and enable improved stroke therapy.

“I still dream of a third helping hand,” says Peters, “which in symbiosis with a person could help him accomplish his tasks.”