How AI Happens

AMD Senior Director of AI Software Ian Ferreira

Episode Summary

Ian shares how AMD partnerships are making powerful models available to the general public and large tech companies alike. Also, will LLMs disrupt search? Clippy is back, and he's BAD.

Episode Notes

Sama 2023 ML Pulse Report

ML Pulse Report: How AI Happens Live Webinar

AMD's Advancing AI Event

Our guest today is Ian Ferreira, who was the Chief Product Officer for Artificial Intelligence at Core Scientific until the company was acquired by his current employer, Advanced Micro Devices (AMD), where he is now the Senior Director of AI Software. In our conversation, we talk about when in his career he shifted his focus to AI, his thoughts on the nobility of ChatGPT and applications beyond advertising for AI, and the scary aspects of Large Language Models (LLMs). We explore the possibility of replacing our standard conceptions of search and how he conceptualizes his role at AMD, and Ian shares his insights on the “arms race for GPUs”. Be sure not to miss out on this episode as Ian shares valuable insights from his perspective as the Senior Director of AI Software at AMD.

Key Points From This Episode:

Quotes:

“It’s just remarkable, the potential of AI —and now I’m fully in it and I think it’s a game-changer.” — @Ianfe [0:03:41]

“There are significantly more noble applications than advertising for AI and ChatGPT was great in that it put a face on AI for a lot of people who couldn’t really get their heads wrapped around [AI].” — @Ianfe [0:04:25]

“An LLM allows you to have a natural conversation with the search agent, so to speak.” — @Ianfe [0:09:21]

“All our stuff is open-sourced. AMD has a strong ethos, both in open-source and in partnerships. We don’t compete with our customers, and so being open allows you to go and look at all our code and make sure that whatever you are going to deploy is something you’ve looked at.” — @Ianfe [0:12:15]

Links Mentioned in Today’s Episode:

Advancing AI Event

Ian Ferreira on LinkedIn

Ian Ferreira on X

AMD

AMD Software Stack

Hugging Face

Allen Institute

OpenAI

How AI Happens

Sama

Episode Transcription

Rob Stevenson  00:00

It's sort of just like the next generation of Microsoft Clippy.

 

Ian Ferreira  00:04

As somebody that spent a while at Microsoft, I have to say yes, it probably is. Clippy is coming back, and he's gonna be bad.

 

Rob Stevenson  00:14

Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. Hello again, all of you wonderful How AI Happens listeners out there in podcast land. It's me, your buddy Rob, here with another classic installment of the show. But before we get into today's episode, some quick housekeeping to share with all of you, two pieces of news really. Piece of news number one is that our friends over at Sama recently published the 2023 edition of the ML Pulse Report, where they surveyed thousands of practitioners in our space about the role of generative AI in their work, how they're measuring model effectiveness, and how they expect it to impact their work in computer vision. The report is a great way to benchmark your own experience, maybe a little bit of a gut check to know that my peers are facing some similar challenges to mine when it comes to measuring success, and also, here's how they plan on dealing with it. Definitely check it out; there's a link to the report in the show notes. But that's not all. Piece of news number two is related, and that is that we are hosting a live webinar edition of How AI Happens on Tuesday, November 14, to unpack the report. We have assembled a panel including the Chief Science Officer over at Voxel51, Jason Corso, who also, by the way, happens to be a professor of robotics and of electrical engineering and computer science at the University of Michigan. Go Blue. And we're also going to be joined by the SVP of AI Product and Technology at Sama, one Duncan Curtis. They're both really smart, really technical guys, and they're going to solve some of the problems surfaced by folks in the report. They're going to share how they anticipate generative AI is going to impact our space, and then also take all y'all's burning questions. So if you get a chance to check out the report beforehand, note anything that sticks out to you; we're going to make sure we save some time for y'all to interact with the speakers. It would be a really good chance to get some advice from the folks out there who are really doing this at a very high, very technical level. If you have enjoyed what we're doing here on the podcast, the webinar next week is going to be a great way to engage, like I said, to have some live back-and-forth with the pros, and of course hang out with me. We're going to pack a ton of information into the webinar, and we're going to have some fun doing it. So the link to sign up is, again, in the show notes. It is completely free, obviously, so I'll see you there. Okay, now on with the show. Today's guest on How AI Happens is someone I am really excited about. He has been around the block a few times in our space. He was the Chief Product Officer for Artificial Intelligence over at Core Scientific until such a time as, I believe, they were purchased by his current employer, AMD, where he is now the Senior Director of AI Software. Ian Ferreira, welcome to the podcast. How are you today?

 

Ian Ferreira  03:35

Good, good, good. Good to be here.

 

Rob Stevenson  03:37

I'm so pleased you're here, and that you went the distance, going so far as to steal your son's video game headset, because that was the best audio option. You had to snatch it out of his hands and say, no Gran Turismo, buddy.

 

Ian Ferreira  03:50

That's exactly right. The agony on his face, because he was in the middle of a game. But you know, we do what we do.

 

Rob Stevenson  03:58

Sorry, but if you want me to keep paying for that PS Plus, I gotcha too. Good. Shout out to your son for being understanding. And anyway, I'm really, really glad to have you here. Did I do your curriculum vitae justice, or is there anything else from your background you can maybe add to sort of set some context here?

 

Ian Ferreira  04:18

Absolutely. So, as you mentioned, I joined AMD just over a year ago as part of an acquisition, and I've been in the ML infrastructure space, in the startup circuit specifically, for about eight years. I spent just a couple of months shy of a decade at Microsoft before that. I'm originally from South Africa, so that explains the weird accent. And right now, I lead the AI Software Solutions team at AMD, and our primary goal really is working with customers to onboard their workloads on AMD Instinct GPUs.

 

Rob Stevenson  04:55

So your career has kind of taken a somewhat familiar trajectory in terms of how you wound up getting to AI: you had been a software engineer sort of climbing the ladder, program manager, data science, all these various skill sets. At what point in your career did AI become your focus?

 

Ian Ferreira  05:15

That's a great question. I, like actually a lot of people in this space, started in the machine learning area, specifically in ad tech. So you would notice a lot of the people in AI have an ad tech background. And arguably, that was the first large machine learning production workload in search, which was click prediction. A lot of the big data technologies came out of ad tech. So I've been adjacent to it pretty much my entire career. It was used for something much less noble than ChatGPT, in serving ads, but we at least developed some of the technology. I think my focus specifically on what today is called AI came when I left Microsoft and joined a framework team that was training large models on an accelerated software stack. The pitch that guy gave me when he lured me away from Microsoft was that something that used to take 24 hours in a data center can now be trained on your laptop. It sounded very compelling. It turns out it was just an early version of Spark, but you know, it got me going. And it's just remarkable, the potential of AI, and now I'm fully in it. I think it's a game-changer.

 

Rob Stevenson  06:29

A moment ago, you said that ad tech is much less noble than ChatGPT, which for me begs the question: is ChatGPT noble?

 

Ian Ferreira  06:37

A good one. Well, maybe not ChatGPT, I mean, let's put that aside. I think the idea of serving ads, and the amount of technology that goes into just picking the right ad, is almost embarrassing to a certain extent. If I could do things over, I probably would have switched careers sooner. But ad tech was great for me. I rode the ad tech wagon all the way from South Africa, through New York, into Microsoft, in and around Bing. But you know, there are significantly more noble applications than advertising for AI, and ChatGPT was great in that it put a face on AI for a lot of people that couldn't really get their heads wrapped around, okay, so what is this AI, and how is it real? Actually having a conversation with something that's not a person, and passing the Turing test, I think that changed the game. So I think from here on, we're going to see a lot of changes in how we work and live based on large language models.

 

Rob Stevenson  07:38

Do you think that sort of demonstration to the masses, let's call it, of what AI kind of is, something that people can wrap their heads around, do you think that is perhaps more meaningful than just the possibilities with LLMs or ChatGPT specifically? Or is it, you know, just the different sorts of applications?

 

Ian Ferreira  07:56

No, I think it was the hero app, for sure, for AI to date, and that created a lot of investment. That, combined with the study that Microsoft did around Copilot and 50% productivity. So imagine if you're a CTO or CIO, you're thinking, wow, I can shave 50% off my R&D budget by using code gen, or Copilot. I think that's a game-changer. I think the actual use cases of LLMs will develop and evolve. I used ChatGPT a lot more in the beginning. I think it will weave its way into products more seamlessly, instead of being a destination where you say, okay, I'm going to go to ChatGPT, I'm going to ask it to do something. It'll be woven into your email client and into your presentation client, and so I think that makes it more natural than a destination where you have to go and do things. But at the same time, you know, we're at, what was it, 10 trillion parameters; the next one will probably be 100 trillion. And I think that's when you'll start seeing really scary reasoning capabilities coming out of these large language models. Scary how? I think just scary, unpredictable. I'm still blown away that, you know, if you think about the model architecture behind transformers and ChatGPT, it's, you know, arguably a very simple model. And the idea that something that essentially builds a massive graph between words and sequences can actually semi-reason and sound intelligent, it's quite remarkable. I don't know if I should be depressed as a human that I can be mirrored so easily, or impressed by the capabilities, but I think it's definitely mind-boggling, the capabilities that come out of that transformer network architecture. Are we really just a bunch of connections, right?

 

Rob Stevenson  09:50

Yeah, I mean, a bunch is like a lot, but a finite number of connections, right? Right, right, right. Exactly. So the copilot trend, I think, is pretty well established; we're going to see, like, LLM sorts of assistants in lots of different sorts of software. What about the possibility for ChatGPT, or whatever its next iteration or next competitor is, to replace our standard conceptions of search?

 

Ian Ferreira  10:17

That's a good point. And so, you know, if you look back at some of the tech-defining moments of our civilization, and I don't want to get too philosophical, but the industrial age, the invention of machines, changed a lot of blue-collar lives, right, and how things are made in factories. We're now in the digital transformation era, and AI is in the middle of that. And I think it's going to be impacting the white-collar workplace as much as the industrial age impacted the blue-collar one. I think the typical knowledge worker careers are going to be very impacted, and I think it's going to start with augmentation. A lot of what we do as humans is pattern recognition, is researching. So if you kind of build that forward to search, I don't think that the search index is going to go away. I think that there's still a place for structured data, and not everything has to be weights in a large transformer network or a mixture-of-experts network. You know, I use the analogy: even as a person, I can give you a document to read, and then I can ask you questions after that, and you can reason on it. So in a way, it's like pre-prompting, or RAG, retrieval-augmented generation, versus having that trained into the model a priori. But I think there's definitely going to be a lot of use cases of augmenting enterprise search, or taking data that's locked in relational database managers and other enterprise search products, and just using an LLM to put almost a friendly veneer around it, so that I can reason on data where before I needed to know how to write a SQL query or how to work SharePoint search. And, you know, a lot of skill with search is really typing the right keywords, but an LLM allows you to have a natural conversation with the search agent, so to speak.
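To make that pattern concrete, here is a minimal sketch in Python of the retrieval-augmented flow Ian describes. The `search_index` and `llm` objects are hypothetical stand-ins for an enterprise search backend and a language model client, not any real API; the point is only the shape of the flow, where the structured index stays in place and the LLM reasons over whatever it retrieves.

```python
# A minimal RAG sketch. `search_index` and `llm` are hypothetical
# stand-ins for a real search backend and LLM client.

def answer_with_rag(question: str, search_index, llm, k: int = 3) -> str:
    # Step 1: classic retrieval. The structured search index does not go away.
    documents = search_index.top_k(question, k=k)
    context = "\n\n".join(doc.text for doc in documents)

    # Step 2: "pre-prompting". Hand the retrieved text to the model at query
    # time instead of training it into the weights a priori.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm.generate(prompt)
```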

 

Rob Stevenson  12:08

Yeah, definitely. It's sort of just like the next generation of Microsoft Clippy.

 

Ian Ferreira  12:14

As somebody that spent a while at Microsoft, I have to say yes, it probably is. Clippy is coming back, and he's gonna be bad.

 

Rob Stevenson  12:21

Don't, don't, don't call it a comeback. Exactly. Clippy's origin story is you swearing at him and acting out every time you open up Microsoft. Did you work on Clippy at all, Ian? I'd be so tickled if you did.

 

Ian Ferreira  12:37

No, it was after Clippy, but I did use it as an end user. I remember that every time you print something, its eyes would go into a little box, like a really cool animation. I was really rooting for Cortana, personally. I wish Microsoft would bring Cortana back as the face of GPT instead of Clippy. But in any event, it was a good time. I had a lot of fun working there.

 

Rob Stevenson  13:02

Yeah, good, good. Well, we sort of jumped in at the deep end here, Ian, because, you know, we got a little carried away. But there's nothing wrong with that. However, I do want to make sure that we learn a little bit about what you're working on right now. I guess, first of all, when I think of AMD, I think of hardware, I think of GPUs. You're the Senior Director of AI Software, so I guess I would just love to know a little bit about how you conceptualize your role.

 

Ian Ferreira  13:26

Yeah, so my role is really working with customers on moving their workloads onto AMD infrastructure, and so that's a hardware and a software story. I think we have a very differentiated product in both categories. If you look at the hardware side, AMD has a broad portfolio. You know, we have really killer CPUs with Ryzen and EPYC. We have GPUs with Instinct. We have gaming GPUs with Radeon. And then, with the acquisition of Xilinx, we have FPGAs. So it's a broad sweep of products from cloud to edge, or edge to cloud, whichever way you want to go, and that makes a lot of end-to-end workloads and scenarios possible. But what really makes the difference, speaking as a software engineer, is the software stack. That's ultimately where the rubber hits the road; where hardware becomes something useful is in the software layer. And I'm not just saying that as a software guy, but that's definitely the milestone that you want to achieve. With the AMD software stack, the team did a great job in making it very interoperable with existing workloads. So instead of saying, hey, we're going to make it different, it's very similar to how data scientists might use their PyTorch, their TensorFlow, their JAX today, so there's no additional work. There's no, oh, now I have to change it to run on AMD, or now I have to change it to run on the other guys. And that makes a huge difference. The other big differentiator is that it's open. So all our stuff is open source. AMD has a strong ethos, both in open source and in partnerships. We don't compete with our customers, and so being open allows you to go and look at all our code and make sure that whatever you're going to deploy is something that you've looked at. We talked about how easy it is to take workloads that you've run on PyTorch, or TensorFlow, or JAX, and just run them on AMD. That makes a big difference, especially if you're the IT manager and you're thinking about adding AMD GPUs to your cluster. You don't want your customers to yell at you and say, oh, my stuff's not running, blah, blah, blah. So from that point of view, the team did a great job. A little higher up the stack, we talked about the frameworks, TensorFlow, PyTorch, JAX; we also have a great partnership with Hugging Face. Hugging Face did a phenomenal job in democratizing transformers and diffusion models, and so we've integrated with them: over 60,000 models in our CI/CD, basically validated and QA'd every night. So that's a super big set of their models; of the 62,000, 60,000 are running on AMD. And that makes it possible for somebody to go into Hugging Face, copy-paste the code example, and plug it into a notebook that's running on AMD, and it'll just work. And I have evidence of this, because I showed a salesperson how to do that, and they were able to do it. So there you go. And then we also recently did the acquisition of Nod.ai. Really smart team, a lot of experience in low-level graph compilers, and so we expect them to use their expertise to make our software story even more performant, to make it even more adaptable to our different hardware endpoints. So lots of good stuff happening in the software space, and, you know, I think that's going to be important for AMD going forward.
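As a rough illustration of that copy-paste workflow, here is a minimal sketch using the Hugging Face Transformers pipeline API. ROCm builds of PyTorch expose AMD GPUs through the standard `torch.cuda` interface, so a snippet like this runs the same way on Instinct hardware as anywhere else; the model name is just an example.

```python
# Minimal sketch: a Hugging Face pipeline copy-pasted into a notebook.
# On a ROCm build of PyTorch, AMD GPUs show up through the usual
# torch.cuda API, so no AMD-specific changes are needed.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # GPU 0 if present, else CPU
generator = pipeline("text-generation", model="gpt2", device=device)

result = generator("Open software stacks matter because", max_new_tokens=30)
print(result[0]["generated_text"])
```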

 

Rob Stevenson  16:51

It seems to me that GPUs, having them at all, right, and then being able to kind of rent access to them, is sort of an arms race. “Arms race” has sort of a negative connotation, but it does feel like that's kind of what's going on. Would you agree with that characterization?

 

Ian Ferreira  17:08

Yeah, I think this is one of the things that happened with LLMs, right? So the amount of GPUs needed pre the LLM phase was hundreds, or tens, of GPUs, and you would be like the king of the hill if you had 16 GPUs. And then LLMs came out, and all of a sudden, the scale went to thousands and tens of thousands of GPUs. So that put a lot of pressure on the supply chain. You know, people are buying enormous amounts of GPUs to train these models, so I think that made a big difference. I'm still blown away at the size of some of these clusters that are used to train these models. You know, it's just mind-boggling how big these clusters are, the amount of networking technology that's involved to get these, you know, thousands of GPUs to work together. It's a pretty amazing feat.

 

Rob Stevenson  18:00

Is that strictly necessary? Because what I tend to hear, particularly from data scientists, is: use the right data. It doesn't necessarily need to be, like, the more the better; let's try and be a little more efficient. When you see some of these huge processes, right, with all these clusters utilizing however many hundreds or thousands of GPUs, is that always necessary? Is there a bloated nature to some of that?

 

Ian Ferreira  18:23

I think it's more a case of separating the workloads and use cases for the end customer. So if you're in the foundational model game, right, if you're one of these few companies that train these large models from scratch, you actually don't have a choice. You're stuck with having to have clusters of tens of thousands of GPUs, especially when you're going north of the 100 billion mark. But if you're an average enterprise and you're looking to what's called fine-tune models, you take one of these foundational models and then you tune it for your specific domain, then yes, of course, you don't need the same size as the pre-training asks. But there's a paper that shows that, provided you have enough data, right, because you can't just have compute without data, if you have enough data, you can keep growing the model and it will get better. And so that's where there are, quote-unquote, arms races, which is: who's going to be the first person to do a 10 trillion and then a 100 trillion model? Because we really don't know what's going to happen in terms of reasoning capabilities at that next tranche, and at the next tranche. So I think those types of customers will need GPUs in the tens of thousands. The average customer that's doing fine-tuning in their enterprise, or doing enterprise search, will probably have a different scale point.

 

Rob Stevenson  19:42

Are you allowed to say which customers those are that are doing those huge processes? Or how about just, like, an industry?

 

Ian Ferreira  19:48

The foundational model experts, right? So OpenAI, Microsoft, everybody that you see talking about creating their own LLM. So Mosaic, for example, created an LLM at 7 billion. We're actually working on a project with the Allen Institute; they're creating a model. Right now they're doing a 7 billion parameter model on one of our partners' supercomputers in Finland, called LUMI, and the next tranche is a 70 billion parameter model. But again, this is still small when you think about what Microsoft did, or OpenAI did, with their 1 trillion parameter model. And so, I don't know, maybe ChatGPT 5 is 10 trillion; maybe ChatGPT 6 is 100 trillion. I don't know, but it gets up there, and it's going to be interesting to see the capabilities of these models once you get to that many nodes.

 

Rob Stevenson  20:39

I'm glad you brought up the Allen Institute, because you have this partnership where you are sort of providing your own open-source LLM, where you are taking advantage of AMD's open-source approach. You mentioned that a little earlier too. I would love to hear you share a little bit more about that: what it entails, how customers are using it, and why that decision was even made.

 

Ian Ferreira  21:01

Right. So we typically have a history of working with researchers in the scientific domain, both, you know, on the traditional HPC side with the national labs, et cetera. And so this was a partnership with the Allen Institute, and the key differentiator is everything is open, right? So the licensing of the model is open; it's not restricted. The data that was used to train the model, usually that's kept proprietary, that's also open. It's called Dolma, D-O-L-M-A, and the model is called OLMo, O-L-M-O, and so both of these are open. The source code, actually the code to train the model, is also open source. So I think the intent is just to give the scientific community a fully open-source solution, trained on AMD, with some models that they can start with and then fine-tune. So if you want to start with the 7B model and fine-tune it, or, early next year, the 70B model and fine-tune it, you can do that, and you don't have to worry about some licensing clause buried somewhere in the T's and C's in how you use it. The datasets they're using are 3 trillion tokens, so it's pretty sizable. And it's funny, when we started this project, we had scoped it to a specific size, but the industry is moving so fast. It's like, well, 2 trillion tokens isn't that cool anymore; you need 3 trillion tokens. And so as we're going along training these models, we have to kind of pivot and adjust to make sure that it's still state of the art when we come out the other end. But what's remarkable is it's all happening on AMD infrastructure.

 

Rob Stevenson  22:35

When you talk about the need for trillions of tokens, or thousands, hundreds of thousands of GPUs, it makes me worried that there's sort of this privilege to be able to use that many GPUs, like you have to have some pretty serious resourcing to be able to do that sort of processing. Is there any mind being paid to the folks who certainly cannot afford, you know, to rent that many GPUs or to use that sort of infrastructure, but still have worthwhile projects?

 

Ian Ferreira  23:04

Yeah, so the OLMo project is one attempt at this, to make a pretty sizable and pretty powerful model available to the general public, or whoever wants to use it. But I think the meta point is true: there is a haves and have-nots of hundreds of thousands of GPUs. And to make it even more complicated, there's this aspect of countries wanting their own sovereign models, right? So we're seeing a lot of countries in Europe wanting to train their own models in their own country, and so now they're trying to find out, well, where am I going to train these large models? Where do I have GPUs in the tens of thousands? And I think that's going to continue, because if this all pans out and you end up with something that's so powerful, do you really want to have a dependency on another country, or one specific company, to provide that service? So we're seeing a lot of sovereign language models spin up. There's obviously the case of models trained on specific languages. So, for example, in Finland, on that same LUMI supercomputer, the TurkuNLP group trained a Finnish model that was 7 billion parameters. But that's just one example, right? You're not going to find a Finnish model unless you're in Finland, most likely, and you're not going to find Japanese models unless you're training them for the Japanese market. And so just purely by language, you're going to get some segmentation. And then, because of the sovereign risks, I think you're going to have models being trained in country.

 

Rob Stevenson  24:29

So governments will have to fire up their own, like, native versions of AMD, basically, right?

 

Ian Ferreira  24:35

They have to find some infrastructure. And so, in Europe, they're basically using the supercomputing centers to train sovereign models right now, and I think that will continue.

 

Rob Stevenson  24:44

Can you explain what you mean by sovereign model?

 

Ian Ferreira  24:47

It's basically a model that's trained in country, for that country, and using the country's data, so there's no import or IP issues, if you will, as the IP moves across country borders. That's data governance, if you will, could be another way to describe it.

 

Rob Stevenson  25:05

Gotcha. Isn't that the case, too, with any, like, sufficiently large organization now, that they would be sort of defensive and a little bit hesitant about handing over data? Exactly, exactly. I guess it's higher stakes, though, if you're a government, I suppose.

 

Ian Ferreira  25:19

Exactly. And, you know, one of the funny things, which is why it's important to see the data, is: what data goes into this reasoning engine, and do the people that created that data have any claims to the capability of that reasoning engine? There's a lot of those conversations happening around generative AI as well. So knowing what data was used to train the model makes that a non-issue.

 

Rob Stevenson  25:44

Right, right. That's explainable AI, you know? Exactly, yeah. Well, Ian, we are creeping up on optimal podcast length here. This has been a blast. Before I let you go, though, I wanted to ask about this event that AMD has coming up here in the Bay Area in early December. Would you mind sharing a bit about that?

 

Ian Ferreira  26:00

Yes. Dr. Lisa Su is going to be talking about some new product announcements. It's called the Advancing AI event; it's December 6 in the Bay Area. I'm really excited, not just as an AMD employee, but as an AI practitioner. I think the partners and customers that the world will see will be impressive, and I really, really look forward to it. I think it's going to be phenomenal. I'm super excited about it.

 

Rob Stevenson  26:26

So I'm sure you can't spoil too much. But is there anything you can share about the nature of these announcements?

 

Ian Ferreira  26:34

No, I'm trying. I think the attendees and audience will be impressed with the progress AMD has made. Unfortunately, a lot of the progress we've made was in forums that we couldn't talk about, and so being able to actually come out and show and talk about the work we've done to date, and, you know, be able to openly talk about it, I think that, for me, is the biggest excitement.

 

Rob Stevenson  27:00

Yeah, and just from what I can tell from the website and signup page, it looks like you're going to be telling some stories about some of the cool things customers are doing, and sort of getting under the hood and seeing some of the cool work going on over there. So yeah, I will be tuning in for sure. I believe there's a livestream, so I'll try and put some kind of link in the show notes so people can bookmark it and come back when it's time in early December for that announcement. Until then, Ian, I gotta say, this was really interesting. So thanks for coming on and chatting with me and getting a little philosophical. I really do appreciate that. Great having you here.

 

Ian Ferreira  27:30

Fantastic. Thanks so much, Rob.

 

Rob Stevenson  27:33

How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, medtech, robotics, and agriculture. For more information, head to Sama.com.