How AI Happens

Google DeepMind Research Director Dr. Martin Riedmiller

Episode Summary

Martin is a former university professor and a renowned research scientist at Google DeepMind, whose work focuses on advancing artificial intelligence through deep reinforcement learning. His work continues to push the boundaries of AI capabilities, making him a leading figure in the quest to build AI systems that can learn and adapt to complex environments. In our conversation, we discuss what reinforcement learning does differently in executing complex tasks, how it overcomes the feedback loops that defeat hand-tuned controllers, the pitfalls of typical agent-modeling methods, and how winning at robotic soccer exposed the value of deep learning. We unpack the advantages of learned controllers over hand-modeled approaches, how a solution in one field can inspire a solution in an unrelated one, and why he is currently focusing on data efficiency. Gain insights into the trade-offs between exploration and exploitation, how Google DeepMind is leveraging large language models for data efficiency, the potential risks of using large language models, and much more. Tune in now!


Quotes:

“You really want to go all the way down to learn the direct connections to actions only via learning [for training AI].” — Martin Riedmiller [0:07:55]

“I think engineers often work with analogies or things that they have learned from different [projects].” — Martin Riedmiller [0:11:16]

“[With reinforcement learning], you are spending the precious real robot’s time only on things that you don’t know and not on the things you probably already know.” — Martin Riedmiller [0:17:04]

“We have not achieved AGI (Artificial General Intelligence) until we have removed the human completely out of the loop.” — Martin Riedmiller [0:21:42]

Links Mentioned in Today’s Episode:

Martin Riedmiller

Martin Riedmiller on LinkedIn

Google DeepMind

RoboCup

How AI Happens

Sama

Episode Transcription

Martin Riedmiller  0:00  

So ideally, in the long term, though I won't predict how long this will take in research terms, the human will get out of the loop, and the role of the engineer that set up the RL system gets less and less.

 

Rob Stevenson  0:13  

Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. Welcome back, everyone, to another instant classic installment of How AI Happens. I'm so sure already, and we haven't even begun, or we've only just begun, however you want to call it. But in any case, here with me today is a former computer science professor at the universities of Dortmund, Osnabrück, and Freiburg, a longtime leader in machine learning research, particularly in supervised and unsupervised learning, as well as self-learning agents, self-driving cars, robotics, nuclear fusion, you name it, really. Currently, he is over at Google DeepMind, where he serves as a research scientist and lead of the controls team, Dr. Martin Riedmiller. Martin, welcome to the podcast. How the heck are you today?

 

Martin Riedmiller  1:22  

Thanks for this kind introduction, Rob. I'm happy to be here.

 

Rob Stevenson  1:25  

How was the introduction? Did I cover it? Did I leave anything out? Any important curriculum vitae?  

 

Martin Riedmiller  1:30  

No, I think that you covered it very well.

 

Rob Stevenson  1:32  

No, great. How about that? We're off to a roaring start here. Martin, you have such a rich background and experience in this space, and so there's a lot of different directions we could go. And because it's a technical show, you know, at some point we're going to speak about magnetic control of tokamak plasmas through deep reinforcement learning. However, I kind of want to hit you with some hard-hitting questions right out of the gate. Is that okay?

 

Martin Riedmiller  1:56  

That's fine, absolutely.

 

Rob Stevenson  1:58  

So you worked on the five-time winner of the RoboCup World Championship, the robotic soccer team Brainstormers. I'll never have Jose Mourinho on my podcast, but this is maybe as close as I will get. Can you tell us about the experience of being a five-time robot World Cup winner?

 

Martin Riedmiller  2:16  

Yes, of course, I'm happy to talk about it. So it was a stony way. We were close to winning a couple of times, as the runner-up, and from that experience, I know how hard it is to really work hard and then not finally win, but only become second. Because everybody asks you, why didn't you win? Nobody asks you, how hard did you work to get there at all? So overall, these 10 years that I spent in RoboCup were a really good experience, because in science, very often you're judged by the work that you're doing scientifically, and your colleagues find it more or less interesting, and then you get in and out of conferences. With the competition, it's a very clear thing: either you win the game, you shoot more goals than your opponent, or you've lost. So it was a very interesting, different kind of competition than I was used to as a scientist. But still, we tried, of course, to bring our scientific methods in to win those competitions.

 

Rob Stevenson  3:10  

You know, I'm half English, and so as an England football fan, I'm well accustomed to coming in second, at least in the last few years, so I understand your pain a little bit there. When you were training the team here to play robot soccer, how much of it was just you creating some kind of efficient manner for them to play? Were you taking into account the opponents at all? I'm just so curious how you trained the team to be good at the sport, because my suspicion is that, you know, reinforcement learning is reinforcement learning, and the application to football is maybe not as important. Is that the case?

 

Martin Riedmiller  3:45  

I think for us, reinforcement learning was finally the key to success. So there are a lot of things in robotic soccer that are difficult to achieve with classical methods, like programming it out, because we were participating both in the simulation league, in pure simulation, and in the real world. For example, in simulation, it was very close to physics. To kick the ball hard, you could only give it impulses, and only kick it a couple of times to bring it in the right direction. Figuring this out was actually very difficult with classical methods, but reinforcement learning finally nailed it, and for some time, we had the strongest kick in the league, learned automatically by a method, one we couldn't come up with ourselves. The same thing was true in the real world with dribbling with the robots. It was very difficult to keep the ball close to the robot, and tons of students were trying to come up with a good classical routine, but the best routine, finally, was the reinforcement learning agent that learned to keep the ball very close and had very good turns, and that let us win the 2007 competition in the Middle Size League.  

 

Rob Stevenson  4:53  

Okay, so it sounds like the shot power and dribbling were sort of the key features that were unlocked with reinforcement learning versus, like, classical methods. Could you kind of explain why you suspect that's the case, why that was not possible previously?

 

Martin Riedmiller  5:07  

Both of them, the hard shooting in simulation and the dribbling in the real world, were a couple of feedback policies that needed a lot of fine-tuning. You have to really see what happens in the world, like how fast the ball has already accelerated in the shooting example, and then make the right contact at the right angle. I think humans are just not very good at figuring out this feedback loop by programming it out completely by hand. And the reinforcement learning agent was basically patient. It trained until it finally found the right control law to achieve its goal, and the goal it was given was to shoot as hard as possible in the pre-specified direction, and it just trained and trained and trained until we were satisfied. So that is an advantage over the hand-tuned methods.
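
To make that concrete, here is a minimal sketch of the kind of reward-driven training loop described above: the only task specification is a reward for ball speed along a chosen direction, and the agent simply keeps trying until the policy satisfies it. The environment interface and every name below are illustrative assumptions, not the actual Brainstormers code.

```python
import numpy as np

# Hypothetical environment: reset() -> obs, step(action) -> (obs, done).
# Observations carry the ball state; actions are kick impulses. All names
# here are illustrative, not from any real RoboCup codebase.

def kick_reward(ball_velocity, target_direction):
    """Reward: component of ball speed along the desired kick direction."""
    unit = target_direction / np.linalg.norm(target_direction)
    return float(np.dot(ball_velocity, unit))

def train(env, policy, update, episodes=10_000):
    """Generic episodic RL loop: act, observe reward, improve the policy."""
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = policy(obs)                    # current best guess
            next_obs, done = env.step(action)
            r = kick_reward(next_obs["ball_vel"], next_obs["target_dir"])
            update(obs, action, r, next_obs)        # e.g. a Q-learning step
            obs = next_obs
```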

 

Rob Stevenson  5:55  

There's too many variables when it comes to, like, when you say, for example, measuring the speed at which the ball is arriving at the receiving agent. Is that just one example of, okay, this is something where humans would need to do calculations for a ton of different speeds? Is the issue that it's too manual?

 

Martin Riedmiller  6:12  

Yeah, exactly. That would be it. So one thing in classical control theory is that you come up with a mathematical model of how this actually behaves, and then solve it with controller design methods. But there's a lot of effort going into that, in particular if the world is nonlinear, or if there are multiple inputs that you have to consider at the same time, like the speed, the position of the ball, probably the rotation of the ball. The advantage of a learning system is that you can just take the sensors, everything that you observe, and try to make the best out of this, extract the information that you need. Then you're in a much better and more comfortable position.  
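
The contrast with hand-derived control laws can be seen in a toy sketch: a learned controller is just a function from raw sensor readings to a motor command, with the physics absorbed into trained weights rather than written down as a model. The sizes, features, and random initialization here are made up for illustration.

```python
import numpy as np

def policy(obs, weights):
    """Tiny two-layer network: raw observations in, motor command out."""
    w1, b1, w2, b2 = weights
    hidden = np.tanh(obs @ w1 + b1)    # learned feature extraction
    return np.tanh(hidden @ w2 + b2)   # e.g. kick impulse or steering angle

# Raw inputs: ball position (x, y), velocity (vx, vy), spin. No hand-built
# model relates them; training would adjust the weights until behavior works.
rng = np.random.default_rng(0)
weights = (rng.normal(size=(5, 16)), np.zeros(16),
           rng.normal(size=(16, 2)), np.zeros(2))
action = policy(np.array([0.4, -0.1, 2.3, 0.7, 0.05]), weights)
```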

 

Rob Stevenson  6:51  

Got it, makes sense. So you won it five times, and then was it sort of like, okay, I'll move on to the next thing? What made you decide, okay, we've, like, been there, done that?

 

Martin Riedmiller  7:01  

Yeah, exactly, it was like you said. We won it five times. We finally proved that we could also win it and not only become runner-up. And we found a lot of interesting scientific questions that came from being part of that RoboCup community. So we decided to move on and look for new challenges in that field.

 

Rob Stevenson  7:20  

Pray tell, then, Martin, what are some of the new challenges?

 

Martin Riedmiller  7:22  

So one of the things that was fascinating at that time is that in RoboCup, we spent a lot of time having different components, like really understanding where the ball is, and going deeply into classical computer vision to find out where this orange part in the image is and to separate it. And at that time, deep learning became a thing: to really have a learning system that understands from raw signals what it has to do. So instead of also taking care of this perception part and putting a lot of effort into manually figuring out where the ball is, we were interested in ways of working more like a human. You only have your sensors; they give some raw measurements, and you really want to go all the way down to learn the direct connections to actions only via learning. This was an inspiration, clearly, from RoboCup, where we had all these individual systems, which was the state of the art at that time, and to go more to these deep learning systems where you want to learn much more like a human: you have some sensors as inputs, and you want to understand all these relationships internally by learning, instead of programming everything out.

 

Rob Stevenson  8:32  

Yeah, it's a really interesting analogy to stick on, like receiving a ball moving at a certain speed and then sending it on from you at another speed. An athlete, a professional football player, or any athlete really, is doing that calculation, like, instantaneously in their brain because of the million times they've done it before, in training, right, in actual practice, on the field. And it's not something you do consciously. It's just a reaction, but the reaction is based on data, frankly, right? But this is all happening subconsciously. So, I don't know, when we say, like, the way that humans learn, we are not aware of this fantastically complicated physics problem, right, essentially, that we are undertaking. But the machine is, obviously, or you are, in training the machine.  

 

Martin Riedmiller  9:18  

Absolutely, and you described it very nicely. That's exactly what also fascinates me in soccer: if you play a pass, you can just play it because you trained it so often, and what's needed, the angle of your foot, how fast the ball is approaching, how fast your teammate is accelerating at that time, you're just doing it subconsciously and seamlessly. Of course, there's a lot of training involved. You need a couple of years until all this motion system, this muscle system, is working, but at some point it's so mature that you can achieve basically any task that a human body can achieve, more or less seamlessly.

 

Rob Stevenson  9:52  

So were there particular use cases you wanted to pursue, given the inspiration here, or was the takeaway just more of, like, an advance in reinforcement learning writ large?

 

Martin Riedmiller  10:03  

So my idea was always, like, I was interested in this closed control loop, this sensorimotor control loop that's needed for passing in soccer, but that is also probably needed in autonomous cars, where you have cameras as inputs and you want to accelerate the motor and decide on the steering angle, for example, to solve a certain problem, or, like, balancing a pole on the hand. And I always had the dream of doing it more in a human-like manner: just having the goal given, and then making the system explore the possibilities and get better and better at it, so that finally it can come up with a solution that, most of the time, also surprises the engineer who set up the learning system.  

 

Rob Stevenson  10:46  

You know what's so interesting, Martin, is that you were solving for this closed system, which is the rules of the game of soccer, but it's sufficiently advanced that in solving for it, you create all these other cross-applicable things. That feels like the case in this space. You know, when I speak with people, it feels like whatever industry they are in, they have these similar challenges that they are facing, and that to unlock a solution in, like, med tech, for example, would be to unlock a solution in a completely unrelated industry. I don't really see that in other spaces beyond AI/ML. Do you feel like it's unique to the space?

 

Martin Riedmiller  11:23  

I'm not sure whether it's unique to the space. I think engineers often work with analogies or things that they have learned from different things. For example, to make an airplane fly, you probably have at least the inspiration from birds, and you understand this more and more. So I think it's probably a general principle. But what is probably true in machine learning is that we kind of believe that this is a very general tool. Learning is a very general thing, and you can, of course, by definition, apply it to a lot of different areas. So we're necessarily, as you said, looking for the commonalities between different domains while keeping the general solution method as constant, or as standardized, as possible.  

 

Rob Stevenson  12:07  

Sure, yeah. So, you know, that focus of yours, I wanted to ask you how it fits into DeepMind's purview, because, given that it is Google DeepMind, there are all these resources available to you. It strikes me that you could maybe choose to do research on just about anything. So how do you kind of decide where to spend your time and energy performing research?  

 

Martin Riedmiller  12:27  

Yeah, I think that's a very good question. So we have an overall mission at DeepMind to kind of understand intelligence and bring it to the world. And the particular mission of my team is to bring AI to these control problems, to the sensorimotor control problems, which is, if you want, the low-level control that's also built into humans. This is a mission that we decided upon from the beginning, and it's kind of our leading mission; that's always true. And we are in particular interested in looking for data-efficient methods, so agents that can learn from very little data, as we also see humans as being very data efficient. Once they have learned a certain task, they do not spend a lot of time repeating it over and over; they go to the next task and try to figure out their next capabilities. And that's something I finally want to see in these artificial agents as well. And then, how this breaks down into daily life: of course, there are a lot of fantastic colleagues at DeepMind. We discuss ideas of how to get there, we read papers, we go to conferences, we exchange with colleagues. So this is very much like the normal academic procedure of exchanging ideas, with probably a bit of a difference compared to academia: we have a lot of people who stay permanently, who are already past their postdoc era, and they don't leave the team after three or four years. So there's a bit more of a long-term perspective that you can achieve with your research.

 

Rob Stevenson  13:57  

Yeah. The other thing that's unique about this space is how common it is for people to work in the private sector as well as have one foot in academia so that they can conduct research. And it feels like you are already doing that, you know? Like you kind of live on both sides of the fence; your grass is always green. Is that the case?  

 

Martin Riedmiller  14:14  

I would say so. Yeah, for me, at least, the grass is very green. That is also always something: when I joined academia, I was very much interested in doing research, and doing research in a team, and then having an impact on the real world. I always wanted to see a reinforcement learning controller, sometime, somewhere, being applied in the real world and having an effect. And then, when I got the chance to join DeepMind, this dream basically came true: to actually pursue this general research question of how to make AI happen, with very ambitious people, in a very powerful environment. So, yeah, the grass is still green on my side.

 

Rob Stevenson  14:52  

I love it. It's exciting, and I love this for you, Martin. And I'm curious, you know, we've kind of weaved our way around here, but what are you working on right now? What's kind of taking up your time when you go to open your laptop and do the capital-W Work of this research?

 

Martin Riedmiller  15:05  

So currently, we still have this mission of data-efficient reinforcement learning, and a lot of the work that we are currently doing is about understanding how to actually do this. We have a large program of understanding how to make the best out of the data that we have available, like offline reinforcement learning, for example, and improving those methods. But what I'm currently really excited about is understanding, if you have the choice of how to collect this data, what experiment the robot should actually do next, so that when it gets this data, it will really make progress on its abilities. So being a bit more on the exploration part. And these large language models that have now been around for quite some time offer an interesting opportunity to bring this knowledge a bit more to the agent and to do this more autonomously. That's kind of the direction that we are currently exploring.
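
As a concrete picture of the offline setting mentioned here, below is a minimal sketch loosely in the spirit of fitted Q-iteration: a Q-function is improved purely from a fixed dataset of logged transitions, with no new robot time spent. The `fit` argument stands in for any supervised regressor, and all names are illustrative rather than DeepMind's actual methods.

```python
def fitted_q_iteration(dataset, actions, fit, iterations=50, gamma=0.99):
    """Learn a Q-function from fixed (state, action, reward, next_state)
    transitions only; no further interaction with the robot is needed."""
    q = lambda s, a: 0.0                            # start from zero
    for _ in range(iterations):
        targets = [(s, a, r + gamma * max(q(s2, a2) for a2 in actions))
                   for (s, a, r, s2) in dataset]    # Bellman targets
        q = fit(targets)                            # supervised regression
    return q
```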

 

Rob Stevenson  15:59  

So the idea would be for the tech to query an LLM itself in order to solicit more data.  

 

Martin Riedmiller  16:05  

Yeah. For example, classical reinforcement learning has more or less the paradigm that you apply the current policy that the agent has learned so far and then add a bit of noise to the policy to collect more data. And then the progress, sometimes it's faster, sometimes it's slower. But, for example, if you're already pretty good at doing some things, then repeating the same policy over and over is probably not the best thing to do. If you understand that, for example, for inserting an object into something, you do not need to learn again how to grasp it, because you already know how to do this, then it's really about the insertion: it's about bringing it there and then figuring out how to insert one object into another object. If you have this understanding from the outside, then data collection can be much more efficient, because you don't have to go back all the way and start learning to grasp that object, because you know you can already do this. And these large language models, with their understanding of the world, could be a way to kind of understand this and then also tell the agent where to explore next and where to collect the data, so that, if you're going to a real robot, you're spending the precious real robot's time only on things that you don't know and not on the things that you probably already know.
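
A rough sketch of the two data-collection regimes contrasted here: the default is the current policy plus noise everywhere, while the directed version spends exploration noise only on phases the agent has not yet mastered. The phase signal and the environment interface are assumptions made for illustration.

```python
import numpy as np

def collect_episode(env, policy, mastered, noise=0.1,
                    rng=np.random.default_rng()):
    """Run the current policy, but only perturb it in unmastered phases,
    so precious robot time goes to the parts the agent can't do yet."""
    (obs, phase), done = env.reset(), False
    while not done:
        action = policy(obs, phase)
        if phase not in mastered:          # explore only where needed
            action = action + rng.normal(scale=noise, size=action.shape)
        (obs, phase), done = env.step(action)
```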

 

Rob Stevenson  17:20  

So I would love it if you could explain more about that last part, about how you train it where to look and what to look for.

 

Martin Riedmiller  17:28  

So, for example, we had a paper a couple of years ago, about two years ago, about bringing an object, or stacking an object, onto another object. An LLM was asked how this is usually done, and it was pretty nice, or pretty good, at understanding that first it has to grasp the object, then it has to bring it closer to the other object, and then it has to let it loose. With that information, you can say, okay, the overall task of stacking two objects has three different parts. First, I probably concentrate on the first part, like learning to grasp objects. Once I'm done with that, I concentrate on the second part, and then on the final part. So instead of always applying the stacking policy over and over and starting from scratch all the time, you would say, I know how to grasp it, I know how to bring it to the other object, so I now only concentrate on the final part, on the stacking part, on letting the object go when I'm close to the target object. An LLM helped us understand these three parts, and with that idea, you can also design the curriculum better and more efficiently.
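
In code, the curriculum idea might look roughly like the sketch below: ask a language model to decompose a task into ordered steps, then focus training on the earliest step the agent has not mastered. The `ask_llm` function is a placeholder for whatever model is queried; nothing here reflects the actual implementation from the paper.

```python
def build_curriculum(task, ask_llm):
    """Ask a language model for an ordered list of subtasks."""
    prompt = f"List, one per line and in order, the steps a robot must perform to {task}."
    return [line.strip() for line in ask_llm(prompt).splitlines() if line.strip()]

def next_subtask(curriculum, mastered):
    """Train on the earliest step the agent cannot do yet."""
    for step in curriculum:
        if step not in mastered:
            return step
    return None  # whole task mastered

# For "stack the red block on the blue block", the model might return
# ["grasp the red block", "bring it above the blue block", "release it"],
# and training then targets only the unmastered step.
```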

 

Rob Stevenson  18:40  

So in this case, is it PaLM 2? What is the LLM?

 

Martin Riedmiller  18:43  

I don't know, actually, which VLM we were using concretely. It was one of the early models then.

 

Rob Stevenson  18:49  

Got it. My fear would be, okay, you are sort of placing some trust in the LLM, that it is collecting data in an accurate way, that it is presenting information in a meaningful way. So is that a concern for you? It's like, okay, even if you train it to seek out the right, correct kind of data, what is it finding?  

 

Martin Riedmiller  19:05  

So you mean that the LLM might be wrong in understanding the task, or...  

 

Rob Stevenson  19:10  

Sure, yeah, or the LLM is wrong, or it's just feeding itself bad data.  

 

Martin Riedmiller  19:14  

Yeah, I think setting up this connection between the lower level that actually learns the policy and this higher level that, once the policy is learned, has this higher understanding of how the world works, that is still an open question. So I think your concerns are totally valid. This is something that we definitely need to explore more. Once we have understood this one-directional way of going from an LLM to the lower level, we also have to understand how the experience that we then have is reflected back to the LLM. But this is future work to come.

 

Rob Stevenson  19:45  

Yeah, yeah. I would love to ask you a little bit about this future LLM work. It feels like the need with LLMs is going to be less of the first L, you know what I mean? Like, they're going to need to be more domain-specific, more, just, smaller, because the larger they are, the less control there is, the less understanding there is of the data that's there, and it also just could be inefficient, right? It's like, you don't need ChatGPT being trained on all of this stuff to give information on, you know, stacking objects, for example. Like, 99-point-something percent of that is not relevant. Do you see that as the case? Do you see LLMs getting smaller and more specific for a given use case?

 

Martin Riedmiller  20:19  

I think you have an absolutely valid argument. I actually hope that they are getting smaller and more specific, but I don't have a definite answer, because the other way could also be that, just by the fact that these LLMs are so large, they have a big understanding, and they have developed some very cool features in between, by learning about the world, by answering questions about arbitrary things, which are then very useful for solving robotics tasks, because they have a big understanding. So it might also be that the dream of having small and specific models is not possible, because they only achieve their power after they have reached a certain complexity, because that's important to have very good features. So I think there is definitely work going on in this direction, and this is absolutely needed: what is the right data curation so that we see this effect of having very good generalization and an understanding of the world, but still, as I said, being specific enough that you can bring in all the knowledge that you need for a certain domain, like robotics, and neglect all the other stuff that is probably even counterproductive to solving tasks in that domain.

 

Rob Stevenson  21:26  

As reinforcement learning advances, as in this example of the tool being able to seek out its own training data without so much human input, what do you think is the role for the AI practitioner? Do you see the role of the human shrinking? Like, what is the human-in-the-loop capacity of this as the tech advances?

 

Martin Riedmiller  21:49  

So I always keep saying, we haven't achieved AGI until we have removed the human completely out of the loop. Because when you look at humans, which are the prototype of an intelligent system, there's, of course, interaction with other humans, but finally, every single individual has to find their own way of learning and develop their own personality. And I think that is, for me, one of the ultimate goals for AGI: to not depend on a human in the loop who gives the right examples or who steers the curriculum, but really understanding the whole process of what is the next step that the artificial agent should take, what is the next ability it should acquire, and then building on those to master a certain domain that it is set up to master. So ideally, in the long term, though I won't predict how long this will take in research terms, the human will get out of the loop, and the role of the engineer that set up the RL system gets less and less.

 

Rob Stevenson  22:48  

As we crawl towards AGI, it feels like part of what's responsible for each crawling step is a higher-order sort of cognitive process that a machine is able to do. And I think the example of receiving a ball and shooting a ball with the soccer robot is a good one, or rather the training that's necessary for that, right? When you were saying, oh, there are just too many variables for humans to write it out in, you know, physical notation, right, as a math problem, but machines are very good at it. So I'm curious, in your research, what are you seeing in terms of the order of tasks that we can expect machines to continue to disrupt? I feel like there's this kind of baseline of very basic sort of administrative things machines are good at. What do you think is next on this ladder of higher-order cognitive processes that machines are going to disrupt?

 

Martin Riedmiller  23:37  

I'm very cautious with predictions. I think these large language models have shown that they are really good at some capabilities that we always thought were very difficult to achieve, like understanding language, or summarizing texts, or even writing texts, or generating images. I think this is really a very amazing capability, and I hadn't seen it coming, like, five years ago, that this would be possible in the near future. And therefore, I'm very cautious with predictions. What I hope, on my side, with the learning for control, is that we can push the boundaries to get to more complex systems, like having larger action spaces that we control, which are finally needed if you want a robot to actually do some manufacturing tasks; then you really need a very well-coordinated motion of a robot arm and the gripper to actually do these things. And I think there's still very interesting research lying in front of us to understand how we can extend the horizon of things that these robots can achieve, and also the degrees of freedom that these agents can actually handle to fulfill these tasks. But I hope that, now that we have something that has made a lot of advances on the higher cognitive levels, if we understand the relationships, connect these higher cognitive levels with the control level of cognition, and understand this relation a bit better, we can also make more progress on the lower, control side of things. That's at least what I hope to get out of this for my area of expertise and interest over the next couple of years.  

 

Rob Stevenson  25:12  

You're very wise not to make a prognostication, but the twinkle in your eye tells me that you may be closer than you're letting on. So here, as we creep up on optimal podcast length, Martin, I don't think we're gonna find a better way to end it than you kind of summing up what you feel may be next. So thank you so much for being here. This has been really, really fascinating, hearing about all of your work and what excites you. So thanks so much for being here and sharing your experience and wisdom with me.

 

Martin Riedmiller  25:33  

Yeah, thanks so much for these very interesting questions. I really enjoyed being on this podcast.

 

Rob Stevenson  25:41  

How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, medtech, robotics, and agriculture. For more information, head to sama.com.