How AI Happens

Declarative ML with Ludwig Creator & Predibase CEO & Co-Founder Piero Molino

Episode Summary

Piero Molino, the CEO and Co-Founder at Predibase and the creator of Ludwig.ai, discusses the various ways he applied AI at Uber, his background in machine learning, and leveraging AI to automate processes. We also unpack the benefits of configuration-based systems, democratizing data science tools, the ethical side of AI and data science, and why diverse teams are essential.

Episode Notes

Low-code platforms provide a powerful and efficient way to develop applications and drive digital transformation, and are becoming popular tools for organizations. In today’s episode, we are joined by Piero Molino, the CEO and Co-Founder at Predibase, a company revolutionizing the field of machine learning by pioneering a low-code declarative approach. Predibase empowers engineers and data scientists to effortlessly construct, enhance, and implement cutting-edge models, ranging from linear regressions to expansive language models, using a mere handful of lines of code. Piero is intrigued by the convergence of diverse cultural interests and finds great fascination in exploring the intricate ties between knowledge, language, and learning. His approach involves seeking unconventional solutions to problems and embracing a multidisciplinary approach that allows him to acquire novel and varied knowledge while gaining fresh experiences. In our conversation, we talk about his professional career journey, developing Ludwig, and how it eventually grew into Predibase.

Key Points From This Episode:

Tweetables:

“One thing that I am proud of is the fact that the architecture is very extensible and really easy to plug and play new data types or new models.” — @w4nderlus7 [0:14:02]

“We are doing a bunch of things at Predibase that build on top of Ludwig and make it available and easy to use for organizations in the cloud.” — @w4nderlus7 [0:19:23]

“I believe that in the teams that actually put machine learning into production, there should be a combination of different skill sets.” — @w4nderlus7 [0:23:04]

“What made it possible for me to do the things that I have done is constant curiosity.” — @w4nderlus7 [0:26:06]

Links Mentioned in Today’s Episode:

Piero Molino on LinkedIn

Piero Molino on Twitter

Predibase

Ludwig

Max-Planck-Institute

Loopr AI

Wittgenstein's Mistress

How AI Happens

Sama

Episode Transcription

Piero Molino  0:00  

I believe that in the teams that actually put machine learning in production, there should be also other skill sets that are even outside the realm of computer science. And then the important thing is: does the tool that I'm using support me in collaborating with people on understanding the model's performance? It's something that is becoming, in my mind, table stakes.

 

Rob Stevenson  0:24  

Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. Here with me today on How AI Happens is a man who has all kinds of experience in our space. He was a staff research scientist at Stanford, as well as a staff research scientist at Uber AI. Currently, he's the CEO and co-founder over at Predibase: Piero Molino. Welcome to the podcast! How the heck are you today?

 

Piero Molino  1:08  

Thank you very much for having me. It's gonna be like a fun, fun conversation. Looking forward to it.

 

Rob Stevenson  1:14  

Yeah, definitely. I've been looking forward to it all week, because there's so much to go into. Firstly, I want to make sure I didn't totally butcher your curriculum vitae there. Is there anything I left out as I was doing my intro?

 

Piero Molino  1:25  

There are a couple more things, but you got the most relevant ones. I will say, you know, I also worked with a smaller startup called Geometric Intelligence in the past, which was founded by a bunch of people, including Zoubin Ghahramani, for instance, and then was acquired by Uber to become Uber AI. I also spent a little time at IBM Watson, where I was doing question answering, and at Yahoo before, so a couple more experiences, I guess.

 

Rob Stevenson  1:48  

Right! That is worth calling out, that you were with Geometric until such time as they were scooped up by Uber, and then that was the opportunity; you were one of the founding engineers there for Uber AI. Is that correct?

 

Piero Molino  1:59  

Yeah, the team from Geometric Intelligence became the foundation team of Uber AI. And then we grew from there, from basically a 10-person team at the very beginning to a 100-person team by the end. So it was a fun ride.

 

Rob Stevenson  2:10  

You know, if we had had this recording six years ago, you probably couldn't tell me anything about it, but now maybe you can. What was kind of your directive coming into Uber? Basically, they were like, alright, Geometric Intelligence is now our AI department, have at it. What were you intended to do there?

 

Piero Molino  2:27  

Right, right. Some things were happening already at Uber when we joined. One thing that was already happening was the ATG team, which is the self-driving car team that, actually, I think in 2020 was sold to Aurora, if I remember correctly. I mean, that was like a 1,000-person organization, mostly down in Pittsburgh. And we were doing basically everything else: all sorts of AI and machine learning applications and research. And specifically, what I was working on was helping different product teams put the latest and greatest machine learning research into production for specific use cases that were particularly important for Uber.

 

Rob Stevenson  3:06  

Right, right. So it wasn't self-driving cars at the time; it was just sort of, can we operationalize AI for lots of different areas of the business?

 

Piero Molino  3:13  

Yes, yes, absolutely. And I can give you some examples. Many of these were collected in engineering blog posts around the time, between 2017 and 2020 more or less, so if people want to double down on these topics, you know, there are plenty of resources there. So one of the interesting applications was customer support. We put out this model, it was called COTA, for supporting customer support representatives: suggesting to them the classification of different tickets that were coming in, also what actions to take about them, and how to respond to these requests. That was really, really useful; it reduced the time that representatives would take to answer users and also improved customer satisfaction. So it was a really nice application. Another one was the recommender system of Uber Eats, where we collaborated to create a new and improved version that was using more recent techniques, in particular graph neural networks. There was another project that was about fraud prediction, in particular detecting collusion between drivers and riders. And then, you know, the bigger one that I contributed to was this dialogue system for Uber drivers that allowed them not to have to touch the phone while interacting with it on the ride, because there were some countries where that's actually illegal. And so we made it possible for the drivers to interact by voice.

 

Rob Stevenson  4:31  

Right, right. Illegal in some cases, and dangerous in any case, truly.

 

Piero Molino  4:35  

Dangerous in any case, yeah, definitely. Definitely. But again, you would expect them to ideally stop and answer, but if their livelihood depends on clicking fast, then obviously that does not happen, right? And so it was really important to make it so that they could do it by voice, without having to look at the phone or touch the phone.

 

Rob Stevenson  4:52  

Right, right. Let's incentivize these drivers to operate the phone safely anyway, right? Exactly. Yeah, it makes sense. Piero, part of the reason I was excited to have you on is because you have spent time in academia as well as in the private sector, and it's really common in this space, not guaranteed, but you come into it a good amount, having people with, you know, one foot in either camp. So after Uber, you moved on to Stanford, where you were doing some research in ML and NLP. I want to have you reflect on the nature of why that is in our industry, why people do both, but first I would just love to hear a little bit about your work at Stanford.

 

Piero Molino  5:24  

Yeah. So the work at Stanford, I would say, maybe to take one step back: I got introduced to Professor Chris Ré from Stanford, with whom I worked when I was there. It happened in a really fun way, I would say. He did a public shout-out about Ludwig, which is an open source library that I put out while I was working at Uber, during an event that was recorded at Stanford about the future of machine learning infrastructure. He was on a panel with Jeff Dean, and he was commenting about this new library called Ludwig from Uber that was super cool. And so I reached out to him and we started chatting; he invited me to give a talk at his lab at Stanford. And it was a really fun way to get to know each other, right? And then, when my time at Uber ended, I decided, well, let's start with Chris, because he also founded a bunch of other important startups in the machine learning space, some of them were sold, some of them are unicorns now, so he knows his way around. I decided to join him and say, well, let's think together: what is the next step here? What can we do together as a next step? And so I worked in his lab and supported the students; we put out a couple of papers, in particular about benchmarking, because I think it's a really important topic. But in the meantime, we were thinking about what we would do next. And so, the year after, we decided to start a company together to productionize these ideas about declarative machine learning that were already in my open source project Ludwig, but also in a project that he worked on while he was at Apple that was called Overton. We figured, these things worked at Uber, worked in open source, worked at Apple, so probably there was something valuable there to start a company around.

 

Rob Stevenson  6:56  

Yeah, it's maybe worth spending some time with Ludwig. I feel like that name will ring a bell for some of the folks out there listening. That kind of got you on Professor Ré's radar a little bit, and that was the seed from whence cometh Predibase. So for those for whom it maybe didn't ring a bell, would you mind sharing, just high level, what is Ludwig? And then we can get into how it became Predibase.

 

Piero Molino  7:14  

Yeah, absolutely. Super happy to talk about it. It's like my brainchild, right? So I'm always happy to talk about it. So basically, maybe to give a little bit of context for Ludwig: I was mentioning before some of these projects that I was working on when I was at Uber, right? And what I noticed is that in all of these projects, there were many, many things that were in common, and a few things that made them substantially different. The data, and the schema of the data, was one of the things that made them different. But the pre-processing was the same for all of them, the training of the models was the same for all of them, the architectures were more or less the same for all of them, the evaluations were all the same, right? And because I'm very lazy, and I don't want to do something more than twice, the first time I need to do something I usually try to automate it, for myself and for my mental health, honestly. And because of that, I looked at all these projects and said, what is in common among these? And I made it so that what was common became code, and what was not common became a configuration. So in the end, what became Ludwig was my own personal system where, instead of having to rewrite every single aspect of a machine learning project, both the infrastructure and all of these aspects that I mentioned, I made it so that I could specify what was the schema of the data and a few more details I wanted to have control over, like, you know, the training parameters, the specific architectures to use, and things like that, and I didn't have to implement anything else. And so for every new machine learning project I had, I had to write a 10-line configuration file instead of writing potentially thousands of lines of PyTorch or TensorFlow code. And then, basically, what happened in the company was that, first of all, it made it so that I could be substantially faster at starting a new project and getting to good results, very much faster than most other people at the company at the time. And so that caught the attention of other engineers, and they started to say, well, maybe if we can use the same tool, we can be as effective as you are. And so more people started using it in the company, people who were not applying it to the same tasks I was using it for, but to their own business cases, right? That convinced me that there was value there to release it as an open source project, because other people in the open source community would hopefully benefit from it, and also because it was built on top of open source technologies itself. So, you know, it was a way to give back to the community to begin with, right? And so that was the motivation behind it. What it is now, it's really a project that is backed by the Linux Foundation. And the way it works is, really, you provide the data that you want to train your machine learning model on and the configuration file, and you get out a trained model that you can serve with a single command and interact with. And this configuration file is really easy to write, in particular to get started with: it's just one line for each feature that you have in your data, and one line for each target that you have in your data, and their data types. That's all you need to get started.
But at the same time, there are more than a hundred different parameters you can change or modify, including architectures, pre-processing, training, optimizers; each of these aspects is just one entry in the configuration file. So you have full control over what you are training, right? And that's really the core of it. And the reason why we call it declarative machine learning is because these configurations are declarative in nature. You're not imperatively specifying the set of operations, the exact set of PyTorch instructions, that you need in order to create and train the model; you just declare the schema of your data and off you go, right? So it's much simpler.
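For a concrete sense of what that looks like in practice, here is a minimal sketch using Ludwig's Python API. The dataset, column names, and file path are hypothetical, and the configuration keys follow Ludwig's documented schema, which has shifted somewhat across versions, so treat this as illustrative rather than definitive.

    # Minimal declarative workflow: declare the data schema, and let Ludwig
    # handle preprocessing, architecture, training, and evaluation.
    import pandas as pd
    from ludwig.api import LudwigModel

    # One entry per input column and one per target, each with its data type.
    config = {
        "input_features": [
            {"name": "review_text", "type": "text"},
            {"name": "product_category", "type": "category"},
        ],
        "output_features": [
            {"name": "sentiment", "type": "category"},
        ],
    }

    df = pd.read_csv("reviews.csv")  # hypothetical dataset

    model = LudwigModel(config)
    train_stats, _, output_dir = model.train(dataset=df)  # stats plus output location
    predictions, _ = model.predict(dataset=df)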

 

Rob Stevenson  10:57  

Could you give an example of what declaring a data schema would look like?

 

Piero Molino  11:01  

Yeah. So for instance, to give an example, I was mentioning at the very beginning COTA, the customer support model. And so in that case, the data set looked like a table with one column that was the actual text that the user would send. And then we had some additional information about the user, like, for instance, what app was it coming from? Was it coming from the rider app, the driver app, or Uber Eats? And additional information about them, like, how long ago did they join, for instance. Another thing that was there was information about the trip, if it referred to a trip: how long was the trip? Was it paid or not? Was it canceled or not? And how expensive was it? And the goal was to predict what type of issue it was, what action among a set of possible actions to take to resolve this ticket, and also, among a set of templates, which one to use to answer. And the configuration for this task, it's a YAML file, and you specify a list of input features. And this list contains one entry with the name, which is the name of the column in the data that contains the specific information, and the type. So, for instance, in this case: request, type text; source app, type category; trip length, type number; and so on, right? And then a list of output features, where you say, well, these are basically the targets. And you can say, well, I want to predict the ticket type, type category; the action type, type category, again, because this was a multiple set of them. And you could either generate the text, so you'd say answer, type text, or answer, type category, if you have like a set of predefined answers, right? This is how the configuration looks to begin with, and it's very simple. But then you can also say, oh, I want to train it using, for the text, BERT for encoding the text, or I want to use some other architecture, and I want to use the one that's pre-trained or the one that's not pre-trained, I want to fine-tune it, I don't want to fine-tune it. For the output, I want to use, like, cross-entropy loss, right, or maybe I want to have a weighted cross-entropy loss that increases the weights for some classes. And so all these things, you can define them through the configuration. And then also training: you can say, I want to train for this number of epochs with this learning rate, and I want to use AdamW as the optimizer because it's particularly good for fine-tuning of pre-trained models. And all of these aspects are just entries in this configuration file.
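Assembled from that description, the COTA-style configuration might look something like the sketch below, written as the Python dict that Ludwig's API accepts (the same structure as the YAML file Piero mentions). The column names, class weights, and hyperparameter values are illustrative, and the exact keys vary across Ludwig versions.

    # Declarative spec for a multi-input, multi-output ticket model.
    config = {
        "input_features": [
            # text encoders can be swapped declaratively, e.g. a pre-trained
            # BERT that is fine-tuned during training
            {"name": "request", "type": "text",
             "encoder": {"type": "bert", "trainable": True}},
            {"name": "source_app", "type": "category"},  # rider, driver, or Uber Eats
            {"name": "trip_length", "type": "number"},
            {"name": "trip_canceled", "type": "binary"},
        ],
        "output_features": [
            # weighted cross-entropy to up-weight rarer ticket classes
            {"name": "ticket_type", "type": "category",
             "loss": {"type": "softmax_cross_entropy",
                      "class_weights": [1.0, 2.5, 1.2]}},
            {"name": "action_type", "type": "category"},
            {"name": "answer", "type": "category"},  # or type "text" to generate it
        ],
        "trainer": {
            "epochs": 10,
            "learning_rate": 0.00002,
            "optimizer": {"type": "adamw"},  # suited to fine-tuning pre-trained encoders
        },
    }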

 

Rob Stevenson  13:15  

A minute ago, I wanted to call out quickly that when you were talking about developing it yourself, you shared what I think is the Occam's razor of software development, which is: everything the same became code, and everything different became configuration. Like, that's just every single software product, right? It's a fantastically clean way of talking about it, so I wanted to call that out; I was tickled by that. But of course, Ludwig started as this open source project, and as with any good open source project, it starts as one thing, you open it up to the community, and it kind of blossoms into this other thing. So I'm curious, as you watched other people kind of put their fingerprints on it, were there ways the community used Ludwig that surprised you?

 

Piero Molino  13:54  

Yeah. So one way in which the project grew after I made it open source is that, at the beginning, the supported data types were relatively limited. Now there are many more. There are, for instance, images and audio; geospatial data can also be used. So there are many more things that I was not directly adding at the very beginning, because they were not used in my specific use cases. But I will say one thing that I am proud of is the fact that the architecture is very extensible, and it's really easy to plug and play new data types, for instance, or new models. It's just really lightweight wrappers and off you go, right? And so that made it possible for people to add their own stuff, and also to use it for applications that I was not even thinking about. A couple of them are particularly interesting and, again, really far from what I would have expected people to use the tool for. One was a research paper that was actually published in Science by a researcher from the Max Planck Institute who was using Ludwig for analyzing biological images of worms. And don't ask me more about it, because I don't know the details of it, but I know that he was not a machine learning person, not even a computer scientist; he was a biologist. And still, the configuration file was so simple as a mechanism that he was able to do something like this without being an expert, right? And so I'm proud of the fact that a tool like this enabled people who are not machine learning experts to do machine learning.

 

Rob Stevenson  15:27  

Is that kind of an accidental effect of the declarative approach? Or was that intentional, to put it in the hands of people who didn't have the technical background?

 

Piero Molino  15:35  

Right. So it was not really intentional, in the sense that when I thought about it, I thought of it for myself, for making my life easier, right? Then how easy it is, accidentally, also makes that possible. I will say, I wouldn't expect many people who are not developers at all to be comfortable with it; that specific person, that specific user, was, and was super happy about it, but we didn't expect it to be that way for most people. But I would say most developers that are maybe familiar with other configuration-based systems, like, I don't know, Terraform for infrastructure, for instance, right? I think those kinds of people definitely would be able to use Ludwig fruitfully, at the very least to get started. Maybe they will not know the specific value that they should set for, like, the dropout of the first whatever layer in the network, but they can for sure get something out that works; they can definitely get that value out of the platform.

 

Rob Stevenson  16:28  

Certainly, yeah. So, fast-forwarding then, what role did Ludwig play in you founding Predibase? At what point did you realize, okay, we've got something here, we should maybe incorporate around this activity?

 

Piero Molino  16:38  

Yeah, it was a combination of things, I would say. On the one hand, the positive reaction in the open source community was already something that convinced me that it was, to begin with, something valuable, because until you put stuff out, it's difficult to assess, particularly when you're putting stuff out not as a product but as a project, right? That's how it started. Then the second piece was, as I was mentioning before with Chris, the fact that there was someone else, someone really smart, who had a really similar idea to mine, and that we could work on it together; that was really, really convincing. And then also seeing other companies building on top of these ideas: there's this system internal at Meta called Looper, and in their paper they cite our work, for instance, and describe a system that has some differences, but in spirit and in approach is really, really similar to Ludwig, too. And so, again, all of these things together, plus also having done some preliminary work interviewing some of the heaviest Ludwig users and the companies that were using Ludwig. You know, all these things together convinced me that it was worth doubling down on it.

 

Rob Stevenson  17:42  

Gotcha. Just a piece of trivia, I'm curious: why Ludwig? Why the name?

 

Piero Molino  17:48  

It's a secret and I will never tell.

 

Rob Stevenson  17:51  

It was your dog's name, admit it!

 

Piero Molino  17:54  

Really, it's a longer story. It's based on Ludwig Wittgenstein, who was a philosopher of the first half of the 20th century. And I have a little presentation on YouTube where I basically connect what I believe are the dots between his work on structuralism, then work on, in particular, natural language processing and embeddings, and now language models. There are a few threads in my mind that connect all these things, right? And because Ludwig, in particular at the beginning, was particularly focused on NLP tasks, and it still is, but now there are also other modalities, I felt that it was a nice connection. But it's a bit of an obscure one that I'm sure not everybody will agree with, I guess.

 

Rob Stevenson  18:35  

Yeah, well, hey, I mean, if the name stuck and it means something to you, what else really matters? I don't know if you're familiar; my only familiarity with Wittgenstein comes from a novel called Wittgenstein's Mistress. Did you ever read that one? You haven't? Oh, it's fascinating. It's sort of an experimental, weird novel, and it's about this woman who thinks she might be the last one on Earth, but she's not sure, so it's sort of the unreliable narrator. For someone like you, who knows Wittgenstein a little bit better than I do, probably you could point out where it's kind of weaving in Wittgenstein's approach to life. But anyway, that's a book rec for the people out there.

 

Piero Molino  19:09  

I would definitely check it out. Thank you for this suggestion. Yeah, do so.

 

Rob Stevenson  19:13  

In any case, that's where Ludwig came from; thanks for sharing. And now we have Predibase. It became clear to you, from the response of the community, that you had something on your hands here. Is it fair to call Predibase a low-code platform?

 

Piero Molino  19:26  

I would say so, yeah. When I say declarative machine learning, for instance, or we're talking about a configuration-based system, people really think about low code: oh yeah, is it a low-code thing? Sure, yeah, it is. And so that's why we started using that as a way to describe the company too, right, as a low-code AI platform. We are doing a bunch of things at Predibase that build on top of Ludwig, right, and make it available and easy to use for organizations in the cloud. So we make it possible for them to connect with data from databases, data warehouses, or object stores, all these sorts of things. We make it possible for teams of people to collaborate on developing Ludwig models and modifying configurations and lineages of configurations, and comparing the performance of the models trained with these configurations. And then, finally, to productionize them by making them available both as real-time applications through REST endpoints, or for batch prediction, like on top of large amounts of data, optimized for throughput instead of latency. And all of this without having to think about the complexity of provisioning the infrastructure and the machines, because it's kind of transparent to the user the way the machines are collected and used and spawned, and then turned off the moment the tasks are completed, right? That's really the core of what we're building right now.

 

Rob Stevenson  20:46  

I have a sociological question for you, in the spirit of Ludwig Wittgenstein, and it is this: this idea of low-code platforms democratizing access to some of these technologies, I think, at first blush, is good. You don't want users to need to be privileged or have some sort of access to advanced upper echelons of academia in order to use some of this technology, which in the right hands is a magic wand, right? So you don't want to limit access in that way. However, should there be some kind of barrier to entry, so someone needs to have sufficient knowledge of statistics or data science in order to play with this tool in a way that we can ensure is bias-free, is ethical, et cetera?

 

Piero Molino  21:27  

It's a really good point. And I will say there are multiple aspects to it that I believe are important, right? On one hand, it's a matter of education, right? So I think, in a world where these tools become more and more available and more and more valuable, it is starting to become very important for everybody, for every developer at the very least, to have the basics: what's a validation set, what's a training set, what is statistical significance, how do we evaluate performance, how do we evaluate for bias, and all these things, right? So there's an education aspect. Then there is a tooling aspect. I think tools should all provide the mechanisms to, at the very least, investigate all these aspects, right? So, for instance, I can tell you one of the things that we are doing, which is not the only one possible: we make it very, very easy for people to analyze performance on slices of the data. So we have a mechanism that we call PQL, which is kind of like an extension of SQL with predictive predicates, where you can say, select the data and run predictions on that selection of the data, right? And if you have something like this, then it's very easy, if you have a data set with some sensitive information, like, for instance, gender information, or maybe the zip code of potential customers, things like that, to run evaluations and look at the performance on the different slices, because that is going to tell you a lot about the bias of the model. And also, it's important to make it so that there's an understanding of what are the factors that really impact the predictions of the models, in order to be able to mitigate them. Within the platform we have mechanisms to produce aggregate, and also per-data-point, attributions for features. There are many more things that can be done, but I think all these capabilities now need to be table stakes, particularly if we want to put these tools in the hands of people that are not trained as data scientists. The final aspect, in my mind, with regards to this is that I believe that in the teams that actually put machine learning in production, there should be a combination of different skill sets. So there should be data science skill sets, and development and, you know, engineering skill sets, and, depending on the type of company and the type of model, also other skill sets that are even outside the realm of computer science. And so then the important thing is: does the tool that I'm using support me in collaborating with people on understanding the model's performance and the model's behavior? And so I believe that tools should support teams in that, right? We take that to heart specifically, but I think it's something that is becoming, in my mind, table stakes; not having it will mean that the models that you put in production are not the ones that are going to do the best for your company, for your customers, right?
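To illustrate the idea behind evaluating on slices, here is a sketch in plain pandas rather than in Predibase's own query language; the file and column names (zip_code, label, prediction) are hypothetical.

    # Compare overall accuracy to per-slice accuracy on a sensitive column.
    import pandas as pd

    df = pd.read_csv("predictions.csv")  # hypothetical: one row per scored example

    overall = (df["prediction"] == df["label"]).mean()
    by_slice = (
        df.assign(correct=df["prediction"] == df["label"])
          .groupby("zip_code")["correct"]
          .agg(["mean", "size"])
          .rename(columns={"mean": "accuracy", "size": "n"})
    )
    print(f"overall accuracy: {overall:.3f}")
    print(by_slice.sort_values("accuracy"))  # large gaps across slices flag potential bias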

 

Rob Stevenson  24:12  

Who do you think ought to be at the table outside the discipline of computer science?

 

Piero Molino  24:17  

That's interesting. I think it really depends on the specifics, right? Just to make an example: if you're talking about house pricing, maybe urban planners should be at the table. Or maybe, if you're talking about CV classification or curriculum filtering, probably people from HR should be at the table. So I think it really depends. But the important thing, I will say, is at the very least to have a set of multiple stakeholders that doesn't involve only developers. That's going to be key to have these models doing what we want them to do, and not just what we are capable of analyzing the performance of, right? It's not always the case that the raw performance, accuracy or whatever, F1 or AUC, whatever it is, aligns with the actual thing that we want the model to do and the business value that we want the model to have. And so the presence of these people will make sure, and 'make sure' is a big word, I will say it will make it much more likely, that there is alignment.

 

Rob Stevenson  25:16  

Right, right. So it would just be someone with expertise, or at the very least lived experience, in the area or the domain you intend to impact, right, that sort of thing. Which is, you know, outside the realm of AI and machine learning, really just the argument for building a diverse workforce, right? Having multiple perspectives at the table means that you can serve a broader swath of the population in a way that doesn't discriminate against people based on background or various other social indicators. So it's not just AI and machine learning; this is just general company building in a way, right?

 

Piero Molino  25:49  

I completely agree with that. It's general software building, it's general company building, it's just representation of society at large also.

 

Rob Stevenson  25:57  

Well, Piero, we are creeping up on optimal podcast length here, but before I let you go, I want you to speak to all the machine learning engineers out there. What can people do to look around their company, make sure they're having an impact on the ML side, and put themselves in the best position to advance in their careers?

 

Piero Molino  26:13  

That's an interesting question. I have a hard time making a general statement out of it, but I can speak to my personal experience, right? And in my personal experience, what made it possible for me to do the things that I have done is constant curiosity, first of all, about any technology, any new thing that comes up, and not being scared or threatened by whatever new technology comes up. I think a really good example here is language models: actually embrace them and try to understand them deeply. So my personal asset has always been to go deep and get a deep understanding of the things that I was doing and that I was using. With that deep understanding, you can reuse it across multiple projects, multiple tasks, multiple domains, and multiple steps in your career. So: be curious, get a really deep understanding, and apply what you learn constantly. Do little projects; even if you're not attached to them, try to put what you learn into practice. That's what has worked for me, at the very least.

 

Rob Stevenson  27:18  

That's great advice. Piero, thank you so much for being here and for walking me through all this stuff. I've loved chatting with you today.

 

Piero Molino  27:23  

Thank you very much for having me, Rob. This was a fun conversation.

 

Rob Stevenson  27:28  

How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, medtech, robotics, and agriculture. For more information, head to sama.com.