How AI Happens

Meta VP of AI Research Joelle Pineau

Episode Summary

Joelle explains how she ended up at Meta and what her role as VP of AI Research entails before telling us what fascinated her most about AI robotics, why elegance is a key factor in her work, how asking the right questions opens the right doors, and the FRESCO philosophy that forms the foundation on which Joelle makes all her business decisions.

Episode Notes

Joelle further discusses the relationship between her work, AI, and the end users of her products, as well as her summation of information modalities, world models versus word models, and the role of responsibility in today's high-stakes technology development.

Key Points From This Episode:

Quotes:

“Perhaps the most important thing in research is asking the right question.” — @jpineau1 [0:05:10]

“My role isn't to set the problems for [the research team], it's to set the conditions for them to be successful.” — @jpineau1 [0:07:29]

“If we're going to push for state-of-the-art on the scientific and engineering aspects, we must push for state-of-the-art in terms of social responsibility.” — @jpineau1 [0:20:26]

Links Mentioned in Today’s Episode:

Joelle Pineau on LinkedIn

Joelle Pineau on X

Meta

How AI Happens

Sama

Episode Transcription

Joelle Pineau  0:00  

We've had to build up some of our own touch sensors to essentially access what I call the pixels of touch by either electromagnetic signals or pressure signals that we can use to build up that knowledge of the world.

 

Rob Stevenson  0:14  

Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. All right. Hello again, all of you wonderful folks out there in podcast land. Welcome back to How AI Happens. I have a wonderfully exciting guest on the show today. I know I say that every week, but this time I really mean it. I got to hear her speak a little bit at Meta's FAIR 10-year anniversary event back late last year. She had some amazing things to say, and I'm really excited to follow up on some of it today. She is the VP of AI Research over at Meta: Joelle Pineau. Welcome to the podcast. How are you today?

 

Joelle Pineau  1:09  

Hello, hello, great to be here. I'm doing well.

 

Rob Stevenson  1:12  

I am so pleased to have you. There's a million ways we could go, because your name is all over tons of research, and you have all these patents and all this amazing experience that we could get into. But I would hate to hamstring it by attempting to describe it myself. So I would love it if you wouldn't mind sharing a little bit about your background and how you wound up in this position at Meta.

 

Joelle Pineau  1:31  

Absolutely. You know, I'm a researcher at heart. I love to explore new questions. And so I sort of got into research first through building robots during my PhD days at Carnegie Mellon University in Pittsburgh. And this is the path that sort of took me into AI, and machine learning in particular. I was really motivated to figure out: how do we build decision-making algorithms for robots? And that opened up an incredibly rich world. I spent many years as a university professor at McGill University in Montreal, where I'm still based, but about seven years ago or so, I got very curious about figuring out how we could do large-scale machine learning, and do it in industry with all of the resources and the amazing talent that that provides. So I joined the FAIR team, the Fundamental AI Research team at Meta, where I've been now for almost seven years.

 

Rob Stevenson  2:25  

What was it about decision making inside robots that really piqued your curiosity?

 

Joelle Pineau  2:31  

In particular, it was that challenge of using information. So observations, sensor information, bringing that in to take good decisions. I think there was an elegance to search and planning algorithms that appealed to me; it seemed like such a universal problem to solve. You know, there's such a long tradition in games as well of using planning algorithms. And, you know, we have to think back to the early 2000s; at the time, we were really in the space of planning and thinking a lot less in terms of machine learning. But I got quite frustrated trying to do that in domains where we don't have a perfect model, and so we're not able to sort of roll out the set of actions that we need to take. And that's really what drew me into the world of machine learning: the need to learn the dynamics of the systems, the consequences of the actions, causal models, in order to have better planners. And that's where I sort of made the jump from planning to learning. And today, you know, we do try to do both, to sort of close the loop: build learned models that you can roll out forward so that you can actually do proper decision making and planning.
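
To make the loop Joelle describes concrete, here is a minimal Python sketch of planning with a learned model: a toy dynamics function stands in for a model learned from data, and a simple random-shooting planner rolls candidate action sequences forward through it. All names and numbers here are illustrative assumptions, not Meta's actual systems.

```python
import numpy as np

def learned_dynamics(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Stand-in for a learned transition model: next_state = f(state, action)."""
    return state + 0.1 * action  # toy linear dynamics, purely illustrative

def rollout_cost(state, actions, goal):
    """Roll a candidate action sequence forward and score the end state."""
    for a in actions:
        state = learned_dynamics(state, a)
    return float(np.linalg.norm(state - goal))  # distance to goal at the end

def plan(state, goal, horizon=5, n_candidates=256, rng=np.random.default_rng(0)):
    """Random-shooting planner: sample action sequences, keep the best one."""
    best_cost, best_actions = np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        cost = rollout_cost(state, actions, goal)
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions[0]  # execute the first action, then replan

state, goal = np.zeros(2), np.ones(2)
print(plan(state, goal))
```

The point of the sketch is the dependency she names: without a model (perfect or learned) to roll forward, the planner above has nothing to search over.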

 

Rob Stevenson  3:40  

I'm tickled, Joelle, that you use the word elegance, because that suggests that there's some sort of accuracy or harmony that you're moving towards with decision-making processes. It probably doesn't always feel elegant when you are at your keyboard hammering away with trial and error; sometimes it must feel like you're banging your head against the wall. But I was hoping you could speak more about it. You know, we're getting into the philosophical nature of this right at the top here, but why elegance?

 

Joelle Pineau  4:03  

That's an interesting one. I've never really thought in those terms formally, but there is something there, to some degree. I don't know how many people are familiar with Occam's razor as sort of a guiding principle: you know, when you can have both a simple system and a very complex system do the same thing, why bother yourself with all that complexity? The simpler system has an elegance. It may come from symmetry; it may come from other types of underlying structure. And I think that's what I mean by elegance: this notion that there's a structure to the problem that is sort of the essence, and so the minimum set of information that you need to capture.

 

Rob Stevenson  4:45  

Yeah, that's really well put. I've heard something similar from folks on the show, where they tell me about how much time they spend speaking about the problem and really honing in on it, making it as specific and simple, to put it in Occam's razor's terms, as you can, before you start throwing some of this fantastically complex technology and these algorithms at it. Is that kind of what you mean: honing in on a specific question before you go for a really long answer?

 

Joelle Pineau  5:11  

That is absolutely part of it. You know, I have often said that perhaps the most important thing in research is asking the right question: posing a question at the right time, posing the question in the right way, based on the knowledge that we have, based on the resources we have to answer the question. If you get that right, you're halfway there. If you pose the wrong problem, you are on a long road that can be very painful. So, you know, in my days of supporting and mentoring PhD students, that was always the turning point for them. Early on, I would really frame and pose the problem; in the later years, it was really up to them to frame and pose the right problem. And then, you know, I knew they were set.

 

Rob Stevenson  5:57  

How do you coach someone to hone in on the right problem?

 

Joelle Pineau  6:00  

It's difficult. And I think, you know, there's a good portion of intuition in research, but that intuition has to take root in knowledge and experience. So building up enough experience of the field, building up your knowledge of basic techniques, but also observing cases of what works and what doesn't work. And through that experience, I think you'll gain a lot. But at some point, you have to trust your intuition as well, because we are in a space where the space of possible hypotheses is huge. And if you're going to go through all the hypotheses sort of systematically, you're probably not going to get there first. And so using a mix of knowledge, expertise, and intuition is probably the best way to guide our discovery process.

 

Rob Stevenson  6:47  

Yeah, the scope of possible hypotheses being so vast is interesting. And it's something I wanted to ask you about, because you have this background in robotics, you have this PhD, you're a professor, and you have this role at Meta where you can kind of work on anything you want. That feels sort of like the mandate of FAIR: let's get some fantastically talented people in this room and say, you know, have at it. I'm sure there are more guidelines than that. So I'm curious, first, for you personally: how do you decide what to work on? Because in your position, I feel like you could maybe work on anything.

 

Joelle Pineau  7:20  

Well, at this stage, my role at FAIR is actually to support the lab as a whole. And so I have a few hundred researchers working in the lab, many of them senior, some of them extremely experienced. And so my role in that case isn't, of course, to set the problems for them; it's really to set the conditions for them to be successful. And so I do two things. One is, and this will seem a little bit odd, but I have a very strong commitment to a set of values. We have what we call our FRESCO: freedom, responsibility, excellence, scale, collaboration, and openness, six core values that underlie how we do our work. And in many ways, I'm the custodian of these values; when we preserve these conditions, you have an incredibly special place to do research that is quite unique. The second thing I do is try to really connect the work that we are doing to the long term. And so, you know, each researcher may have a project or a portfolio of projects, but I have to find a path for all of these pieces to fit together such that we are on our way to solving AI. And, you know, we are unapologetic about our ambition to really solve AI. And so I have to provide some guidance and little nudges and, you know, ask the probing questions, so that we make sure that we have all the right technical capabilities, and that these technical capabilities have a path to connect together, to fit the pieces of the puzzle together, such that we can really bring together this notion of general intelligence.

 

Rob Stevenson  9:01  

The obvious follow-up was going to be: where do those paths converge? As interviewer, you did my favorite thing, which is you answered the question before I could ask it. But it converges at solving AI, which, you know, like you say, is fantastically ambitious. What would that mean as output for Meta? Say, you know, let's assume you succeed: what does that look like to someone like me on the outside?

 

Joelle Pineau  9:21  

If I have to bring it down to one thing that I think is very important, I think it's this notion of building a world model. So think of it as a digital twin for both our digital world and our physical world. And that model is able to take in any type of information, whether it's text, speech, or images; is able to generate new information; and is able to take inputs so people can use that to control the behavior of the model. So it's a pretty general model. It's known under lots of other terms: you know, in control and robotics they may call it a state space model, or a transition model in reinforcement learning. But the essence is this notion of having a predictive model that takes in information and essentially can predict the future conditioned on inputs or actions. Once you have that world model, you can use it to do anything you want. You can generate data, you can make decisions, you can plan, you can predict. And so if I had to bring it down to one artifact that would encapsulate all of that, this is what it looks like. Now, that being said, you know, if you're talking about what success looks like for Meta, that's a little bit abstract for some of my colleagues who are looking out for the health of the business and, of course, the wonderful products that we are also building. And so I do partner very closely with applied research teams and with product teams in house, and we look for the best ways that we can take the technical components that we are building and bring them into those products. We're not quite at the stage of having rich world models. We're at the stage of word models, such as Llama, for example, which is a set of models that we built this year. And so we build these models, we partner with various teams across the company, we adapt them to their specific use cases in collaboration, and we're looking nowadays at more constrained pieces. But really, that's the trajectory: these models getting incredibly more general, as well as incredibly more controllable, on the path to general intelligence.
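
As a rough illustration of the interface she describes, here is a minimal sketch of a world model as a predictive object: given a state and an input or action, it predicts the future, and rolling it forward generates data you could plan or decide against. The linear update is a toy stand-in and every name below is an assumption for the example, not any model Meta has built.

```python
import numpy as np

class WorldModel:
    """Illustrative interface only: a predictive model that, given the
    current state and an input/action, returns a predicted next state."""

    def __init__(self, dim: int):
        self.A = np.eye(dim)        # toy state-transition matrix
        self.B = 0.1 * np.eye(dim)  # toy action-effect matrix

    def predict(self, state: np.ndarray, action: np.ndarray) -> np.ndarray:
        """Predict the future, conditioned on an input/action."""
        return self.A @ state + self.B @ action

    def generate(self, state: np.ndarray, actions) -> list:
        """Generate data: roll the model forward over an action sequence,
        which is also the primitive that planning and decision making
        would be built on."""
        trajectory = [state]
        for a in actions:
            state = self.predict(state, a)
            trajectory.append(state)
        return trajectory

model = WorldModel(dim=3)
trajectory = model.generate(np.zeros(3), [np.ones(3)] * 4)
print(trajectory[-1])
```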

 

Rob Stevenson  11:33  

Controllable by the user, or controllable in terms of oversight, accountability, both?

 

Joelle Pineau  11:38  

Both. Controllable by the user, controllable by communities of users, so that we are able to bring in the goals and the values of the ecosystem and the society in which they're deployed. And so sometimes it's one user, sometimes it's a whole community of users.

 

Rob Stevenson  11:55  

When you say world versus word models, I assume that means a preference, a move toward world models. And that would be to say non-language-based, like without text as sort of the foundation. Is that right?

 

Joelle Pineau  12:07  

It's not non-language, but it's beyond language. So right now, our models are very good at taking in language and outputting language. We're starting to see some cases that handle a few other modalities, usually images, but there's so much more information to digest as well as to produce. And so I expect language to be a big part of world models, but to open it up to many more types of information and modalities.

 

Rob Stevenson  12:33  

The many more types of information and modalities is an exciting place to spend some time, because when you think about how learning takes place, think of how much of it is not language-based. There's this monkey-see-monkey-do quality to a lot of learning, and you think of how children and infants learn without language, before they have the faculties of language, or even how any sort of animal learns without what we would consider language. There's some huge piece of learning missing here. Is that kind of what you're attacking when you think of more information and modalities, and what would they be?

 

Joelle Pineau  13:03  

That is definitely a part of what we are attacking. You know, for example, building models of video prediction is, I think, a really interesting problem right now. We have the ability to analyze images quite well, to understand the content of images, both to do the generation of new images or to do the segmentation of objects in images. But to do that over videos, I think, is quite challenging. There are challenges in terms of building up the right representation, and the right loss function, and so on and so forth. But we're still far, far away from what humans can do in terms of even just understanding information in video. So I think that's definitely one. There's also a lot of information that comes from the physical world that is different from the digital world. So most of our models today are trained from data that's acquired from the web. That's a particular distribution of data; whether it's video, text, images, sound, or music, you know, most of it is web-based data. But if you get data in the real world, you actually get a very different distribution of data. And so we've started some efforts collecting data from some people who are wearing smart glasses, and looking at, you know, what are the types of activities that people do in that setting? What are the types of scenes that they see? What are the types of information that we may want to predict? So the modality is about the bits and bytes that you're getting in, but it's also about what type of experiences you are capturing, and our experience in the digital world is different than our experience in the physical world.
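
For a concrete, if toy, framing of the video-prediction problem mentioned here, the sketch below treats it as supervised learning: predict frame t+1 from earlier frames and score the prediction with a pixel-wise loss. The naive extrapolation predictor and the MSE loss are illustrative assumptions; real systems learn far richer representations and losses.

```python
import numpy as np

def predict_next_frame(prev_frame: np.ndarray, curr_frame: np.ndarray) -> np.ndarray:
    """Naive linear extrapolation of pixel values, as a stand-in for a
    learned video-prediction model."""
    return np.clip(2 * curr_frame - prev_frame, 0.0, 1.0)

def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Pixel-wise mean squared error between prediction and true frame."""
    return float(np.mean((pred - target) ** 2))

# Synthetic stand-in for a video clip: 10 grayscale frames of 64x64 pixels.
video = np.random.default_rng(0).random((10, 64, 64))
pred = predict_next_frame(video[3], video[4])
print("loss vs. true frame 5:", mse_loss(pred, video[5]))
```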

 

Rob Stevenson  14:38  

Yeah, that's fascinating. So the idea is, there's the context of being in the physical world, being in three dimensions for one, and taking in modalities such as smell or taste, or even just the experience of being in a physical space, very different than reading on a web page. This is the idea, this is the other section, not the final frontier in learning but the additional frontier, yeah?

 

Joelle Pineau  14:59  

We're definitely pushing that frontier. And in some cases, you know, it's not obvious even how to record that information. So we've had to build up some of our own touch sensors, these DIGIT sensors that we've built, to essentially access what I call the pixels of touch: very high-resolution information that comes from signals, either electromagnetic signals or pressure signals, that we can use to build up that knowledge of the world.
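
A minimal sketch of the "pixels of touch" idea, under the assumption that one tactile reading can be treated like a small grayscale image, a 2D grid of pressure values. The synthetic data, grid size, and threshold are invented for illustration and are not the actual sensor's format.

```python
import numpy as np

# One synthetic tactile reading: a 16x16 grid of normalized pressure
# values, analogous to pixels in a tiny grayscale image.
rng = np.random.default_rng(0)
touch_frame = rng.random((16, 16))

contact_mask = touch_frame > 0.8  # hypothetical threshold for "in contact"
peak = np.unravel_index(np.argmax(touch_frame), touch_frame.shape)
print(f"contact pixels: {contact_mask.sum()}, peak pressure at {peak}")
```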

 

Rob Stevenson  15:27  

Pixels of touch, that's really poetic. That should be the title of your book; it's at least the title of this podcast episode. Isn't that information still recorded as language, though? Or maybe numbers, I guess?

 

Joelle Pineau  15:38  

Numbers, definitely, yes. You know, we do live in a world where, at this point, all of our machine learning algorithms are running on digital computers. So bits and bytes, absolutely, that is the language of all this machine learning, but not necessarily words in the sense of having a discrete semantic meaning for each token. So when we talk about language, you take a big document, you break it up into small pieces, usually words or the root of a word, which we call tokens. And this is how we record a lot of the text information. That's also how we record, for example, when we build representations of code. But it's not necessarily how we build up the representation for touch at this point.
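
To make the contrast concrete, here is a toy tokenizer: text is segmented into pieces that each carry discrete semantic meaning, then mapped to integer ids. The vocabulary and greedy longest-match rule are made-up simplifications of what real subword tokenizers (e.g., BPE) learn from data, whereas a touch reading, as in the earlier sketch, stays a grid of raw numbers with no such discrete vocabulary.

```python
# Toy vocabulary of word pieces, invented for the example.
vocab = {"un": 0, "break": 1, "able": 2, "the": 3, "code": 4}

def tokenize(text: str) -> list[int]:
    """Greedy longest-match segmentation against the toy vocabulary."""
    ids, i = [], 0
    text = text.lower().replace(" ", "")
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i:]!r}")
    return ids

print(tokenize("unbreakable"))  # [0, 1, 2] -> "un" + "break" + "able"
```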

 

Rob Stevenson  16:22  

That's sort of how our brain works, right? Like, there's language to describe non-language-based cognition or processes. We are destined to fall short when we try to use language to explain what's going on in our brains, right? It's the best approximation.

 

Joelle Pineau  16:38  

Yeah, it's a fascinating question, because to some degree there's so much that we don't know about how our brain represents information. We have some information about the perception layer, but then how to make that connection to the cognitive layer, and really the semantics of the information. We've done some work, in fact, trying to decode information at the neural level to give us some clues. The jury's still out on whether, because the human brain does it one way, the machines should do it the same way. We haven't eliminated that hypothesis, but that's not always the case. You know, we've built airplanes that don't exactly fly like birds fly, and that's okay. You know, we're happy to take the commercial airlines rather than, like, hook up to a flying bird. So there's going to be some amount of understanding of the human brain that will be useful. But I don't know that we want to necessarily put all of our eggs in that basket, in terms of building general intelligence.

 

Rob Stevenson  17:36  

Certainly, yeah. Makes sense. Well, Joelle, it sounds like part of why you got into this space is you had this experience being technical, or being curious, working in robotics. And maybe you knew this all along, but it feels to me like the opportunity of AI has exploded, and alongside it, the responsibility of people working within the space. So I would be curious to hear how you think about that. What is the responsibility of developing technology with perhaps higher stakes than we're used to seeing? And how do you think the people out there working in this space ought to consider their own responsibility?

 

Joelle Pineau  18:11  

That's a vast question, you know, one that will probably keep me reflecting for the rest of my career. I will say a couple of thoughts on this. First, certainly the size of the opportunity for AI wasn't obvious when I first started in the field. I was very much driven to it purely by curiosity. I was someone who enjoyed mathematics, and I found myself enjoying programming, and there were some real-world problems that you could solve once you had enough tools between mathematics and computing. But from the very earliest days, I always enjoyed going back and forth between the theory and the practice. So I love building mathematical models; for example, we talked a little bit about decision making, building new algorithms to take decisions at a very theoretical and mathematical level. But I enjoy taking these algorithms and putting them on robots, and seeing how well they perform in the real world. One of my projects during my PhD was to build a nursing assistant robot with a team of people, and I was in charge of the planning algorithm and the dialogue algorithm. And I had many wonderful colleagues who helped me build the whole system. But we took this robot into nursing homes to assist elderly users and have conversations with them. And so that social responsibility of the technology that you're bringing into people's lives is something that I started thinking about early on. And I was guided through that, of course, by my mentors and advisors at the time, who were very cognizant of the need to build a multidisciplinary, diverse team to do that work. And so that experience has stayed with me throughout my career. I've done a lot of work on applying AI for health care, working with practitioners who are dealing with people's real illnesses, people suffering, who genuinely want to build solutions that work not just in theory but in practice. And so that experience has been nourishing me, informing me. So today, when we build AI systems at scale, whether it's for a Meta product or whether it's to open source, there's always this view that we have to build technology, but our responsibility is not purely engineering and scientific; our responsibility is social. And if we're going to push the state of the art on the scientific and engineering aspects, we must push the state of the art in terms of the social responsibility. So we've innovated by building up new benchmarks for safety, by developing new watermarking methods, by digging into privacy considerations, which is absolutely necessary as we're progressing with these new large models.

 

Rob Stevenson  20:53  

It's a great answer at the end of an episode full of great answers, Joelle. You really set the standard here. This was exciting, getting to chat with you, so thank you for being here, and for your candor and all of your knowledge. It has been a true delight having you on the show today.

 

Joelle Pineau  21:06  

My pleasure. Thanks for having me.

 

Rob Stevenson  21:10  

How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, medtech, robotics, and agriculture. For more information, head to Sama.com.