How AI Happens

LiveX Chief AI Officer, President, & Co-Founder Jia Li

Episode Summary

Jia shares her background as an adjunct professor at Stanford University, her journey to co-founding and serving as President & Chief AI Officer at LiveX, why she is so excited about the field of AI agents, and the crucial importance of a multimodal approach.

Episode Notes

Jia shares the kinds of AI courses she teaches at Stanford, how students are receiving machine learning education, and the impact of AI agents, as well as understanding technical boundaries, being realistic about the limitations of AI agents, and the importance of interdisciplinary collaboration. We also delve into how Jia prioritizes latency at LiveX before finding out how machine learning has changed the way people interact with agents, both human and AI.

Key Points From This Episode:

Quotes:

“[The field of AI] is advancing so fast every day.” — Jia Li [0:03:05]

“It is very important to have more sharing and collaboration within the [AI field].” — Jia Li [0:12:40]

“Having an efficient algorithm [and] having efficient hardware and software optimization is really valuable.” — Jia Li [0:14:42]

Links Mentioned in Today’s Episode:

Jia Li on LinkedIn

LiveX AI

How AI Happens

Sama

Episode Transcription

Jia Li  0:00  

Once you can close the loop of multimodal AI agents, the type of data you generate may be different.

 

Rob Stevenson  0:10  

Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. Oh right, hello, all of you wonderful folks out there in podcast land. It's me, Rob, back again with another instant classic installment of How AI Happens, I'm so sure. Where do I even begin with my awesome guest today? She has such an amazing curriculum vitae. She was the Head of Research at Snap, she was the founding Global Head of R&D of Cloud AI and ML at Google, a Chief AI Fellow at Accenture, and she co-instructed the inaugural course on generative AI in medicine at Stanford University. Currently, she is serving as the Co-Founder, Chief AI Officer, and President of LiveX AI: Jia Li. Welcome to the podcast. I'm so pleased you're here today. How are you?

 

Jia Li  1:24  

Thank you, Rob. It's an honor to be on this podcast and share my perspective about AI.

 

Rob Stevenson  1:31  

I'm not being polite when I say the honor is truly all mine. I'm a clown with a microphone, and you have this amazing, rich background. But yeah, would you mind sharing a little bit about the courses you periodically duck into Stanford to teach?  

 

Jia Li  1:43  

So of course, everything is about AI, right? So I started to teach at Stanford back in 2018, when I was leading Google's enterprise AI, and I got super fascinated by how AI is making an impact in different verticals and in our everyday life. I started to teach several classes in AI as an adjunct professor. The most recent was obviously about Gen AI; it is the most difficult class to teach, mostly because this field is advancing so fast every day. Yeah. So nowadays, I'll be invited to go to events at Stanford or teach one class to share what is the latest in Gen AI. And hopefully the experience of practicing AI and innovating AI in real-world scenarios can inspire more students at the university.

 

Rob Stevenson  2:51  

Yeah, that's interesting that, obviously, the narrative moves so quickly, and it's as much the case for the eager young minds of tomorrow as it is for the professionals in the space. So you said you kind of go in there and you can tell them what is the latest. Is it about updating them on what is coming out? Are you also trying to give them the tools to sort of stay abreast of the space?

 

Jia Li  3:11  

Mostly about sharing perspectives. Obviously, the students are super hands-on themselves; even those from the business school and medical school, everybody is embracing Gen AI and everybody is getting that hands-on wow moment with it.

 

Rob Stevenson  3:33  

That's great. So it's not just the STEM students; it's a wider swath of people.

 

Jia Li  3:38  

That was a surprise to me, actually.

 

Rob Stevenson  3:41  

Yeah. Kids these days and their generative AI, am I right?

 

Jia Li  3:44  

Kids, as well as experienced entrepreneurs and business leaders. That was really, really impressive.

 

Rob Stevenson  3:51  

Yeah, of course. Could you give us like a quick taste, a quick sample? Like the most recent time you went over and taught a class, what were you trying to cover?

 

Jia Li  4:00  

Well, the most recent ones are obviously about Gen AI, right? So actually, I have an upcoming one where I'm going to share what the future of AI agents is.

 

Rob Stevenson  4:12  

That's a tough corner to look around, Jia. What is your perspective? Where do you come down on what the future of the agent is gonna look like?

 

Jia Li  4:19  

Yeah, I think it is essentially a hope to demystify agents a little bit. Oftentimes, we see a lot of fancy demos, right? On the other hand, there are the types of companies and startups practicing how to put that into the real world. I think it's always great to learn what the gap is and how to bridge the gap, what the tech boundary is, and how to define a product with the understanding that there is a technical limitation. I think that's very, very important. And personally, I'm very, very excited about AI agents. Essentially, if we think of emerging AI, there are two most prominent advantages: one is how it can help to interact almost like a human; second, when there is a massive amount of information, how to connect the dots. So agents really help to do both, and really help us to transform how we interact with products, how we interact with services, and how we interact with hardware devices.

 

Rob Stevenson  5:44  

So when you say agents, are you speaking about a co-pilot, a chatbot, an LLM trained on a company's proprietary data, that sort of use case?

 

Jia Li  5:54  

Those are a subset of what agents are capable of. But as we have probably seen or heard from many technical leaders, one of the top applications of AI agents is customer experience: customer support, or, you know, sales agents, or product education, et cetera, to meet the customers where they are, right? So it's just the beginning, right? So there is a lot of potential impact down the road. Hopefully, ultimately, every one of us will have a super AI agent that understands us and can help us to handle everything in everyday living, in our work, and in education; pretty much everything.

 

Rob Stevenson  6:52  

I'm glad to hear you're prioritizing this mission of demystifying AI, that you're putting together talks around it. I know you have some conference appearances coming up where you're going to be speaking on this topic. And the reason that I'm pleased about it is that, on the one hand, it would be very easy for someone with your background to decide that that was beneath you, frankly. To be like, I am not going to explain to people why this is so exciting; I'm going to work on the exciting parts. But you don't have that ego about it, which I think really speaks to your character and your background. And also why it's so important is because, like, the people listening to this, I'm assuming we're all AI practitioners here, not including myself, they all understand the opportunity. They all understand what is possible and why this is so exciting. And it feels like other areas of the business maybe don't understand all the specifics; they know that there's hype around it, and they think that it's just this thing they can implement to solve other problems. And so there is a little bit of education and a reality check necessary. And so what I wanted to ask you, Jia, is, when you see some of these agent use cases, right now there is this willingness on the part of the non-technical business leaders to try, to invest. Do you think that the tools and tech are going to be sufficiently advanced and helpful for these folks, for them to continue investing in them?

 

Jia Li  8:08  

It is a very good question. And really, for everyone, there might be a personalized way to do it, right? So what I observe is that, first, AI agents are not as magic as all the hype describes, right? So mainly, like, fancy demos, or AGI, et cetera: we're not there yet. But on the other side, there is another extreme of thought: okay, Gen AI makes mistakes, that's why it's useless. I think it's neither of these. It's actually something in the middle for people who can understand the technical boundary and the real-world challenges it can solve, and who can also define the product very well to tackle such problems, right? So that is why I'm always excited to discuss and share with people who are interested in AI agents, and in Gen AI in general. I feel down the road, or even right now, it really requires interdisciplinary collaboration: people who are domain experts who understand the real-world challenge, and people who understand both the technology and the real-world challenge and can talk with the tech side as well as the product side as well as the domain side, right? So all these people should come together, or the thoughts should come together, in order to build the most out of it. And that is just one aspect of it. In addition, there are a lot of painful lessons that I myself and our team have gone through. I hope to share these with the rest of the community so that they don't have to go through all the same repeated mistakes, right? So typically for AI agents, that goes back to, okay, what kind of company are you, and what is your best advantage as an expert, whether in product, in development, or on the business side, right? So I think that's very important to understand.
And it's very important to collaborate, not only as individuals but also as teams, across teams within companies, or even partnering with different companies, right? Think of, you know, Apple and Samsung. They're very advanced in their own way, in their own domain already, and they are collaborating with Google, right? For myself, I feel there are big tech companies that have a lot of resources and a lot of talent. But there are so many SMBs, small and midsize businesses; how can I myself, or how can our company, help all these companies so that they can be equipped with Gen AI and benefit from AI agents? That's why I feel it is very important to really have more sharing and collaboration within the field. Yeah, particularly for AI agents. So the two biggest challenges we learned: one is latency, and second is accuracy, right? So especially in everyday living, you need to solve a lot of mission-critical tasks. If something is broken, you want to figure out how you can fix it, right? Instead of chit-chat, okay, keeping you company, right? So there are a lot of opportunities in that direction, too. But I feel for the mission-critical tasks, in education, in healthcare, in everyday living, there are so many problems that require really highly accurate approaches to handle them. So that's often associated with the advancement of the algorithm, sometimes the model size, and the vast amount of data that can represent the problem you're dealing with, right? So it needs a lot of effort into, really, how do you come up with the most accurate algorithm or agent to solve the problem? And that leads to the first challenge I mentioned, right?
So oftentimes, in an agent, you need to have multiple steps, especially in a complex scenario. Think about, oh, how do I plan to solve this problem? And once you have different steps in your planning and you execute each of these steps, that will create a long latency in total. So having an efficient algorithm, having efficient hardware and software optimization, is really, really valuable. That's why for our company, LiveX AI, while we are empowering SMB companies for their customer experience, we actually do collaborate with advanced tech companies like NVIDIA or Google, so that we can leverage the best hardware and the cloud and make it scalable, so that we can empower many of the businesses who care about their customer experience.
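Jia's point about multi-step planning compounding latency can be sketched numerically. The toy model below is purely illustrative (the step names and per-step timings are invented, not LiveX figures): when an agent executes plan, retrieve, generate, and verify steps one after another, the latencies add up, and folding some steps into a single optimized call recovers most of that time.

```python
# Hypothetical per-step latencies (seconds) for a multi-step agent
# pipeline. The names and numbers are made up for illustration.
STEP_LATENCIES = {"plan": 0.4, "retrieve": 0.3, "generate": 1.2, "verify": 0.5}

def sequential_latency(steps):
    """Each step waits on the previous one, so latencies simply add up."""
    return sum(steps.values())

def merged_latency(steps, merged=("plan", "verify"), overhead=0.1):
    """Folding some steps into one optimized call removes their separate
    round trips, at a small fixed merging overhead."""
    kept = sum(v for name, v in steps.items() if name not in merged)
    return kept + overhead

print(f"sequential: {sequential_latency(STEP_LATENCIES):.1f}s")  # 2.4s
print(f"merged:     {merged_latency(STEP_LATENCIES):.1f}s")      # 1.6s
```

The arithmetic is trivial, but it is the whole argument: every extra planning or verification round trip is paid for in user-visible wait time, which is why algorithm, hardware, and software optimization all matter at once.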

 

Rob Stevenson  14:22  

Yeah, it makes sense. It was interesting to hear that you prioritize latency alongside so much accuracy; obviously, it needs to be right. I was surprised to hear you call out latency just because it felt like a foregone conclusion, like, well, yeah, of course it should be fast, we all know that, right? But I assume there's more to it than that. Would you mind extrapolating a little bit on why latency is so crucial?

 

Jia Li  14:44  

Totally. If we think Gen AI is gradually transforming how we are interacting with products, devices, and services, right? So in the past, the expectation, if you were interacting with, for example, customer support, was that you file a ticket, you send an email, and you're willing to wait 24 hours to get the answer, right? And nowadays, because of the advancement of Gen AI, there are AI agents that can hear, chat, see, and show all the aspects of the product for you. Then the customer's expectation is very different, right? So when they come, obviously, they want to hear the answer immediately; they have seen the magic of chatting with, you know, ChatGPT or Gemini, et cetera. All right, so now the user behaviors are gradually changing, so that when they visit a brand's product webpages, or when they make a phone call, they want to have an immediate response, in an accurate way, right? So that is a very, very different expectation than the pre-Gen-AI one.

 

Rob Stevenson  16:11  

Yes, certainly. Isn't it interesting that we don't extend the same courtesy to these really advanced technologies that we might extend to even a faceless, anonymous human being?  

 

Jia Li  16:23  

Yeah, of course, because as humans, we are so busy, especially on the customer experience side, right? So in the call center, every agent is so busy handling such large call volumes. And you always have, I'm not so sure if you have had that experience, I had to call multiple departments and got transferred from one department to another and had to repeat my identity essentials multiple times, right? And many of these are repetitive efforts on both sides, right? So there are a lot of efficiency and productivity improvements that Gen AI can bring, even to assist the customer experience agent.

 

Rob Stevenson  17:19  

Certainly. And I guess in the case of customer support, I shouldn't say that people extend more empathy to human beings than they do to a machine. I'm sure customer support people are treated very briskly and rudely, because they're speaking with people who are very frustrated and who want their problem solved right away. But in the case where that is replaced by generative AI, I mean, shouldn't we extend that empathy? Isn't the thing going to learn from the way we treat it?

 

Jia Li  17:42  

I feel it's a transformation that will make humans much more productive, doing more complex or interesting work that really can leverage the human empathy aspect, right? So I feel it is more about how people are leveraging Gen AI so that they can be more productive and can offer better experiences than those without, and the human-in-the-loop aspect really gives a lot of value, right? So for example, you know, instead of having a doctor doing all the typing, I'd rather the doctor spend more time with the patient, listening to them, understanding their pain points, and really helping them to solve their problems, instead of doing all this typing and repetitive but labor-intensive types of work.

 

Rob Stevenson  18:46  

Yeah, certainly. So it's clear why there's so much chatter about AI agents, right? Because every business needs it, right? Every business has this need to interface with its customer base, and they need to do so at scale; the larger you are, the harder that becomes, and the more inaccurate and unreliable it gets from the perspective of the user, right? And so I can see why this comes up on the show a lot: the chatbots, the LLMs, copilots, et cetera. You mentioned that those are only a small subset of the use cases. I'm curious if you could rattle off, as we go upward in that hierarchy past that subset of chatbots and copilots, what are some of those, I guess, more executive-function types of tasks that you anticipate AI agents can disrupt?

 

Jia Li  19:29  

I'm really hoping that in the future, AI agents can be personalized, can really understand people's pain points, and will offer much more accurate and also personalized care to people. Also, right now, you probably see a lot of tools that are chat-oriented, right? Though as humans, we interact with our five senses. With our perception, we can see, right? And we can talk, and from the voice tone, you know whether I'm happy or not, right? Also, we can type. And there is even more advanced research in the research field. But I hope in the future there could be a multimodal AI agent that is personalized for everyone, which is going to be much more convenient and can also help much more from the privacy and security perspective, and even from the everyday living perspective.

 

Rob Stevenson  20:43  

So that does feel inevitable, doesn't it, that it will be multimodal, that it won't merely be text-based or even audio-based. So even like, you know, we have the voice assistants, whose name I will not say because she's listening and I don't want her to trip up on our episode, but that is an example of a multimodal interface, right? And it's not merely because it is easier than typing with your thumbs; I do think the idea is that it is more natural, and perhaps easier because it's more natural. But that does feel inevitable, right, that we will wish to interact with our technology the same way we interact with other members of our species and with the world around us, with just, like, our five senses or modalities. Are we talking about the metaverse here, Jia?

 

Jia Li  21:20  

I think it's definitely a critical component of it, right? So there will be the remote metaverse to connect with each other and even have a community around it. But from the human nature perspective, there are a lot of things that we handle in the real world, right? So even in the physical world, nowadays there are a lot of really innovative approaches thinking about robotics and humanoids, et cetera. So I bet multimodality is one of the most critical aspects of that, too.

 

Rob Stevenson  22:00  

Yeah, this is the fun stuff. So this is what I wanted to ask you about as well, Jia. As we said, the use case for AI agents is well established, and every business needs it, and that makes a ton of sense. But because of your background, and, I suspect, because of your curiosity, you kind of have your finger on the pulse of a lot going on in this industry. So I'm just curious what your personal opinion is: when you think of the really, really exciting areas of research and development within AI and ML, what makes you excited? What stokes that wonder in you?

 

Jia Li  22:31  

I'm still not satisfied with where we are with the multimodal AI agent aspect. If you look at the most advanced multimodal models out there, even on the public benchmarks, which are fairly limited, we get around 60% understanding of visual content and video content, et cetera, right? So there is a lot of promising progress already, but I feel like we still need some time to solve that problem, and that is mainly from the understanding perspective, right? What about context? How do we encode the long-term memory and the short-term memory, or contexts, when we are thinking about a problem, when we are doing planning, when we are trying to do reasoning? There is just so much that we still need to do, and we're far from being there yet.

 

Rob Stevenson  23:36  

This is the really, really interesting part of multimodality, which is, like you just said, how do you encode that data that is maybe not text? As humans, there's tons of information that we get via text, tons, but so much of what we do and see, and how we learn, happens in subconscious processes in our brain. We don't fully understand them; maybe they aren't binary, and they definitely aren't written as text. So do you have any notion of what that might look like if we were to move beyond training these tools with text and images, where images are then encoded into text, which feeds it, right? Videos too. How do we do that? How do we encode multimodal experiences without text?

 

Jia Li  24:19  

Right now, we have already seen a lot of joint multimodal training approaches, and so far they really show the potential. For example, in voice and speech recognition, think of the past: you had to transcribe the audio with speech recognition, then you do language understanding, and then you synthesize the voice so that you can get the voice output, right? But by understanding the audio and language jointly, one can easily predict, hey, when this person is going to stop or pause, right? So you can actually do the prediction, instead of, as in the past, doing recognition and then multiple steps in order to get the voice coming out. Right. So I feel it is similar for visual, video, voice, and language: jointly, you're seeing a lot of new signals and new insights coming out of that. So I feel that is really the holy grail, that people can not only do understanding, but can understand and predict at the same time, right? Anticipation, prediction, and understanding, and reacting, responding to it. So I feel that is very promising, that someday we'll get to it. Right now, there's still a lot of work to be handled, right? So that's why I feel, for AI agents, in order to reach there, one of the biggest challenges, besides the model, algorithm, and compute perspective, is data, right? Think of, in the past, as I mentioned, the way we interacted with AI, be it chat in the last generation, or email, filing a ticket, or, you know, a phone call: the data you have is very different, and the human behavior interacting with it is very different. And that's why, you know, once you can close the loop of multimodal AI agents, the type of data you generate may be different.
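The cascaded-versus-joint contrast Jia describes can be put in toy form. Everything below is a hypothetical illustration with made-up millisecond figures, not a real speech stack: a cascaded pipeline can only begin replying after recognition, understanding, and synthesis have each finished in turn, while a joint audio-language model that anticipates the end of the user's turn can start decoding its reply early.

```python
def cascaded_response_ms(utterance_ms, asr_ms=300, nlu_ms=200, tts_ms=400):
    """Cascaded pipeline: the reply can only start once the user has
    finished speaking AND every stage (ASR -> NLU -> TTS) has run."""
    return utterance_ms + asr_ms + nlu_ms + tts_ms

def joint_response_ms(utterance_ms, lookahead_ms=200, decode_ms=250):
    """Joint model: by predicting the upcoming pause from the audio
    itself, decoding overlaps the tail of the user's turn."""
    return max(utterance_ms - lookahead_ms, 0) + decode_ms

utterance = 2000  # ms of user speech, an arbitrary example value
print(cascaded_response_ms(utterance))  # 2900
print(joint_response_ms(utterance))     # 2050
```

The numbers are invented, but the structural point holds: the cascade pays each stage's latency after the turn ends, while joint prediction converts some of that serial cost into overlap with the user's own speech.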

 

Rob Stevenson  26:46  

Yeah, yeah. When you said it's the holy grail, you kind of explained the process of, like, okay, you have multimodality, and you're able to take in all the information and synthesize it and make a judgment, right? That's the holy grail. It's not merely how we represent taste, for example, in data; it needs to also be fed into something that can make a judgment. Now we're approximating AGI; now we're approximating something that can take all of the inputs that humans take in and then come up with, right, like, some kind of insight, some kind of intuition, some sort of judgment and action to take, which is exciting. I gotta say, you really just unlocked something for me, Jia. Because I was feeling a little, I'm going to confess, I was feeling a little negative about the AI agent space before this conversation. And part of it is because, you know, I'm interviewing someone next week. He's a Google DeepMind researcher, and he has published a paper about, you know, he's figured out how to magnetically control tokamak plasmas with deep reinforcement learning in order to make nuclear fusion more reliable and easier and faster and cheaper, blah, blah, exciting. It's very exciting, right? And so I was like, man, that's pretty awesome; he's working on using deep reinforcement learning to create nuclear fusion. But then there's all this money and all these people who are making better chatbots, and in my mind, that was, like, not so exciting. But then the way you explained it, where it's like, look, the AI agents, as they get better and better, what are we doing here? We're approximating being able to speak to a human and it giving you a reasonable answer back. Like, today, chatbot. Tomorrow, AGI, right?

 

Jia Li  28:16  

Hopefully. Probably not tomorrow. I think we will gradually experience, hey, it is getting better, it is getting better, and it can solve a lot of problems, and more problems. I think that's how we are going to approach the ultimate goal. But right now, we are definitely several tomorrows, many, many tomorrows, away.

 

Rob Stevenson  28:42  

Several tomorrows, at least a week away. Jia, this has been really fun speaking with you. Thank you for walking me through all of this and for giving me a new perspective, and I hope the folks out there listening had that experience too. This was a delight; you're welcome back any time, I mean that. You're doing amazing work. So as we creep up on optimal podcast length here, I'll just say thank you very, very much for coming here and for sharing all of your experience and thoughts with us. It's been a delight for me.

 

Jia Li  29:08  

Thank you. My pleasure. And thank you, everybody.

 

Rob Stevenson  29:13  

How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, medtech, robotics, and agriculture. For more information, head to Sama.com.