How AI Happens

Lemurian Labs CEO Jay Dawani

Episode Summary

The future of AI depends on overcoming its biggest challenges, chief among them being hardware limitations. Joining us today is Jay Dawani, Founder and CEO of Lemurian Labs, to share how his company is addressing this central issue by building a platform designed to make AI development more efficient, affordable, and environmentally friendly.

Episode Notes

Jay breaks down the critical role of software optimizations and how they drive performance gains in AI, highlighting the importance of reducing inefficiencies in hardware. He also discusses the long-term vision for Lemurian Labs and the broader future of AI, pointing to the potential breakthroughs that could redefine industries and accelerate innovation, plus a whole lot more. 

Key Points From This Episode:

Quotes:

“Every single problem I've tried to pick up has been one that most people have considered as being almost impossible. There’s something appealing about that.” — Jay Dawani [0:02:58]

“No matter how good of an idea you put out into the world, most people don't have the motivation to go and solve it. You have to have an insane amount of belief and optimism that this problem is solvable, regardless of how much time it's going to take.” — Jay Dawani [0:07:14]

“If the world's just betting on one company, then the amount of compute you can have available is pretty limited. But if there's a lot of different kinds of compute that are slightly optimized with different resources, making them accessible allows us to get there faster.” — Jay Dawani [0:19:36]

“Basically what we're trying to do [at Lemurian Labs] is make it easy for programmers to get [the best] performance out of any hardware.” — Jay Dawani [0:20:57]

Links Mentioned in Today’s Episode:

Jay Dawani on LinkedIn

Lemurian Labs

How AI Happens

Sama

Episode Transcription

Jay Dawani  0:00  

I'm imagining, you know, multiple special agents that are, you know, super good and knowledgeable about certain verticals. Eventually we get to a place where you don't need to be a knowledge worker. You just need to be a person that really understands the problem well enough, and you can define it and give it to them and then keep iterating.

 

Rob Stevenson  0:16  

Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. Okay, hello all of you, wonderful machine learning engineers, data scientists, researchers, academics, AI practitioners of every ilk and persuasion. Welcome back to the podcast. It's me, your host, Rob. I have a fantastic guest for you today, and it's already my favorite kind of guest, because we are now, like, 12 minutes past the time we were supposed to start recording, and we were just having so much fun chopping it up that I had to, like, abruptly interrupt our conversation and say, listen, we have got to start the show. So I know that this is going to be a fun conversation, because it already has been, the part that you out there in podcast land were not privy to. But anyway, my guest today is an advisor for NASA and data speckle. He served as a director of artificial intelligence over at the Geometric Energy Corporation. For some street cred: way back when, he was an intern for OpenAI, before it was cool, and he won a ton of awards in math and as an Olympian swimmer representing the nation of Pakistan. I could go on and on about this guy's accolades, but let's just bring him in. Jay Dawani, welcome to the podcast. How are you today?

 

Jay Dawani  1:46  

Doing well. Thank you for having me.

 

Rob Stevenson  1:47  

I guess I should have mentioned in that hackneyed stumble through your LinkedIn profile that you are also the co-founder and CEO of Lemurian Labs. So that's where you are now. But yeah, I'm so glad to have you. And you were calling in from a recording studio that has housed such storied names as Prince, David Bowie, and others, and so there is some sort of, like, soulful, artistic, arcane energy imbued all around you, Jay. Hoping that's gonna translate to a good episode here today.

 

Jay Dawani  2:14  

Let's hope so. I mean, I'll do my best to not break out into song. That's the best I can do.

 

Rob Stevenson  2:19  

If you ever feel compelled to break out into song, Jay, I would simply love it. That would be a first for this podcast. Anyway, speaking of which: you're also a musician, I think, right, in addition to all those things I listed?

 

Jay Dawani  2:30  

Yeah, I did play in a band for a little while back in high school.

 

Rob Stevenson  2:33  

Okay, but you're being humble. It was like a garage band. But you still get to tinker a little bit. You still get to shred in your home for your own meditation.

 

Jay Dawani  2:38  

I feel compelled to break out my Les Paul and start shredding, yeah.

 

Rob Stevenson  2:43  

I love it. You're an interesting guy, Jay. You have a ton of different interests here. You contain multitudes, it seems. So I'm just curious: what do you think it is about you that, you know, leads you to have all these different interests and be working in AI, working in robotics, founding AI companies? Is there a common thread in all of the things that you've participated in?

 

Jay Dawani  3:00  

I am very, very curious, restless, and I like hard problems. I think that's what it comes down to. At least if you look at the last 8-10 years, every single problem I've tried to pick up has been one that probably most people have considered as being almost impossible. And there's just something appealing about that. There's a lot of folks working on the easy problems. There's very few people working on the harder problems. And it just feels like there's more opportunity there. And there's one thing that Astro Teller always used to say: it's much easier to get 10x than 10%, because most of the time, people trying to squeeze out 10% is mostly like squeezing blood out of a stone. But when you're trying to get 10x, it usually forces you to think in a way that nobody has before, for the most part, and that's where the true gains are.

 

Rob Stevenson  3:43  

What are some of those unsolvable problems that you were attracted to?

 

Jay Dawani  3:48  

Self-driving, for one. General purpose autonomy, that's somewhat related. Was looking at how to use AI for multiphysics simulation, so you can create different kinds of engines and different kinds of materials faster. Worked on creating a brain-computer interface that could control a drone swarm at one point. Now trying to solve, well, create a unified software stack for heterogeneous computing at scale, so more hardware is more accessible and cheaper, so that we can continue progress in AI and finally realize it in a more meaningful way.

 

Rob Stevenson  4:18  

Do you ever miss robotics? Seems like that was a little bit of an origin for you.

 

Jay Dawani  4:21  

Yeah. I mean, sure, I do. I think there's unfinished business there.

 

Rob Stevenson  4:27  

I'm waiting for the sequel: Jay Dawani Robotics, part two. This time it's war.

 

Jay Dawani  4:31  

Well, weirdly enough, robotics got me to what I'm currently working on. We started more or less in robotics. So, I mean, if you want, I can take you through the history of how Lemurian actually happened.

 

Rob Stevenson  4:31  

Sure, I would love to know how robotics led to this company.  

 

Jay Dawani  4:42  

So January 2018, I'd gotten convinced that we have all the pieces necessary to make general purpose autonomy with a foundation model for autonomy. And I'd started having, you know, ideas about what an architecture could look like for an agent that could navigate the world, learn in it, and could then be fine-tuned and generalized for different kinds of tasks, assuming you have a little bit of data to help address that. And it was fundamentally a world model which was part generative. It had transformers, it had convolutions, it had things I'd borrowed from a clockwork RNN, fast weights, and reinforcement learning. And I was trying to, like, unroll this over time and then resample it, so you can actually arrange tasks based on orderings and how you sample the world. And based on the difference between expectation and reality at each sequence, or each state-action pair, you backpropagate the difference so that your model is learning as it's going and as it's collecting data. And then there was a separate idea for building a simulation engine where you could train that, and then address the sim-to-real gap a bit better for different kinds of environments. I had a cluster of 512 GPUs, V100s at this time, because that's what we had, and I exhausted that in a couple of weeks. So it was like a 2 billion parameter network back then, and my estimate was I needed to make it at least 100 times bigger if I wanted to fully generalize it. I think I underestimated the size and the compute required, but my estimate back then was I needed about 400,000 GPUs, which is a lot. No one gave it to me. I tried really, really hard to get it. I'm like, well, assuming all the companies working on building accelerators and GPUs and everything today kind of hold their promise of continued gains, it feels like training may not be a bottleneck for a whole lot longer, but the bottleneck will come down to: how do I run this in the wild, right? Because ultimately, the value is not just in keeping a model training in a lab. It has to be deployed in the real world to create value. So I started thinking about what that needs to look like, and there was no chip to run it on. You need something that's going to be doing roughly around six petaflops in 70 watts, running multiple channels at 60 frames a second with 10-millisecond latencies, and that just hadn't been built. And no one was thinking about it. So I started thinking about how you design chips to do that, and then basically have the whole model and the chip for it, and make that available to people within a whole environment to go train your own model and put it on that chip.
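
For readers who want to see the shape of that learn-as-you-go idea, here is a minimal, hypothetical sketch in Python/PyTorch. Everything in it (the TinyWorldModel module, the dimensions, the MSE loss) is invented for illustration; this is not Lemurian's or Jay's actual architecture, just the generic predict-compare-backpropagate loop he describes.

```python
# Hypothetical sketch of the predict-compare-update loop described above:
# a world model predicts the next state, and the gap between expectation
# and reality is backpropagated so the model learns as it acts.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    """Stand-in for the part-generative world model (illustrative only)."""
    def __init__(self, state_dim=32, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128),
            nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

model = TinyWorldModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def step(state, action, observed_next_state):
    # Expectation: what the model thinks the world will do next.
    predicted = model(state, action)
    # Reality-versus-expectation gap, backpropagated per state-action pair.
    loss = nn.functional.mse_loss(predicted, observed_next_state)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```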

 

Rob Stevenson  7:09  

That's why you say there was unfinished business, because you reached this point in your development where you're like, oh, this thing doesn't exist. So, was founding a company easier than, you know, I don't know, the alternative: waiting for someone else to do it, or building it on the side in service of this project?

 

Jay Dawani  7:24  

I have found that no matter how good of an idea you put out into the world, most people don't actually have the motivation to go and solve it. You actually have to have an insane amount of belief and optimism that this problem is solvable, regardless of how much time it's going to take, but that it's worth doing. And there's very few people that kind of do that, and that's why I think I love talking to other founders, because they're this special breed of human that is insanely optimistic: everything is solvable, all you need is money and tech. But for the most part, yeah, I still believe in that future. It's just there's a lot of problems that have to get solved along the way in order for us to actually get there. And obviously, once we were starting to design the chip, you know, ChatGPT happened. The conversation changed. Nobody cared about the edge; the data center was all that mattered. We are believers in general purpose acceleration, so we went towards building a data center accelerator for training and inference. Started looking at that, and the problems involved in getting a chip out there were all software-related. So we said, you know what, a startup can only do one thing well at a time, so let's go solve the software problem. But yeah, I mean, there is unfinished business for sure, and my hope is someday I get to go back and actually see the whole thing through. But for today, you know, our focus is software.

 

Rob Stevenson  7:24  

You will solve that problem, and then you will launch a new arm of the Lemurian Labs business, perhaps, to solve that problem.

 

Jay Dawani  7:39  

Well, the name Labs is there for a reason, right? We can constantly solve new problems. You can do anything.

 

Rob Stevenson  8:53  

Yeah, you know, when you shared that notion of the founder as someone who is insanely optimistic about solving a really difficult problem, that definitely shines through in what you've shared with me so far about yourself. But when I hear that, I'm like, why did this guy ever have a full-time job where he wasn't a founder? Like, can you share a little about your, just, like, career journey? If you have that curiosity and that need to work on the problems you think are cool, how does that translate to, you know, having a boss and working in an existing company?

 

Jay Dawani  9:22  

Yeah, it's a weird thing. I only ever saw myself as an engineer, really. I never thought of myself as somebody who's a business person or somebody who should be starting a company, because there's better people who are more experienced than me and probably better at doing that. So what gives me the right to go do that, right? Every single person at some point is like, it'd be great to have your own company. Feels like it's a lot of independence and you get to do whatever you want, and no one tells you what time you need to wake up and show up at the office. But it's not entirely true. You're still accountable to a lot of people, right? I don't work for myself. I work for my people, and I work for all the shareholders and all the investors and, ultimately, the customers. So you report to a lot of people. It's hard, and that's why I think most people don't do it. But for me, it was more of a question of: am I a CEO, or am I an engineer, and can those be two different people, or can they be the same? I think it took me a long time to realize that, you know what, when you're a founder, you don't have to be the greatest business person in the world, and you don't have to be great at everything most people would expect you to be great at. You just have to continue to be really good at the things you're already good at, because your value comes from being exceptional at those things; that's how you come up with the problems and how you solve them. The amount of time it would take me to become as good in another field probably wouldn't result in the same kind of end result.

 

Rob Stevenson  10:48  

Why, then, is CEO the best use of your abilities?

 

Jay Dawani  10:53  

So I think if I look at people who've done it well, and I take the example of Steve Jobs, Jensen, Elon, Eric Schmidt, many others: they're all engineers for the most part, right? And engineering isn't the degree you have, it's the mindset you have. If you're a good problem solver, you know how to get to the truth, and you know how to actually do the right constraint solving and get to a solution, you're fundamentally an engineer. They were able to do it, and part of it is you have to be deeply technical enough to run a technical company. Otherwise you don't know how to actually think through the strategy and the product, and talk to customers and ask the right questions. In the early days, and maybe even longer term depending on the company, the founder and CEO ultimately has to also be the chief product officer. It doesn't work otherwise; it has to be one consistent vision for one kind of company solving one problem for one set of customers. I met a lot of folks who kind of helped me have the confidence that I was capable of doing it. So I decided to give it a shot.

 

Rob Stevenson  11:50  

And here we are, having given it a shot, continuing to give it shots. You know, not to be a naughty interviewer and go back a minute here, but I'm going to: you mentioned a minute ago that, like, okay, no one cares about edge computing. I was just hoping you could kind of finish that thought or expound on it. Like, is that true? Is it all cloud these days? Like, is no one ever going to care about running a tiny algorithm on their toaster?

 

Jay Dawani  12:11  

Well, there's already tiny algorithms running on toasters, right? But in terms of AI, I think we will see just about every single machine out there that's embedded in our lives in some way have some form of intelligence. Like, industrial IoT and IoT have been part of the conversation for over a decade or two, right? So it's been there. I think there's a lot of problems to solve before we can actually realize it in the way we thought about it, and in some cases, I think they may not even have as much value as people thought, which is why they never took off. In terms of the edge, I think there's an immense amount of opportunity. I think we want to try and bring more capabilities closer to the people that are going to benefit from them, because the models are so large, right? And if you look at the traffic to and from a data center, the amount of latency that you actually incur doing that, some people aren't going to tolerate that, especially with certain kinds of applications, right? You think about factory robots, you think about autonomous cars, you think about drones and all these things. They operate in a lot of denied environments. And, you know, the switching between different kinds of, what's the word I'm looking for, zones for coverage, right? You actually have to switch your IP addresses and things a lot of the time, and that can cause data loss in some cases. So you want to push those out to the edge, but there's others where you can tolerate it or not. So I think you'll see a confluence of both. The numbers that I have seen, and I haven't dug into them personally, so I can't verify it, say that the edge is somewhere between an order and an order and a half of magnitude bigger than the data center, which, if true, means there's a lot of opportunity. But it takes a long time for things to trickle down from the data center to the edge, normally, and that's usually how things flow.

 

Rob Stevenson  13:50  

So you think the use cases are devices where any amount of latency is not acceptable?

 

Jay Dawani  13:56  

Yeah, well, not any. There's always latency, right? I think there's a minimum amount of latency that you can tolerate. Anything beyond what you won't tolerate, that has to go to the edge. Anything that is acceptable will stay in the data center.

 

Rob Stevenson  14:07  

I see, yeah, that's much more efficient. And, okay, we can send out some processes, especially with the limited hardware in IoT, right?

 

Jay Dawani  14:15  

And there's also the issue of models, right? If you look at it from a compute perspective, and you look at model sizes and how they like to execute, there's a certain amount of performance that is necessary, and that doesn't fit within edge use cases today. So you need to have innovations in model architectures, or pick entirely new kinds of things, or distill them down to smaller models and be very aggressive with pruning and quantization and sparsification, and leverage that if you want to get those things to run. If you can't, then data center it is.
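
To make the sizing pressure Jay describes concrete, here is a back-of-envelope sketch in Python. The 7B parameter count and the 8 GB edge memory budget are assumptions chosen for illustration, not figures from the episode.

```python
# Back-of-envelope check of whether a model's weights fit an edge device,
# the kind of sizing that forces the pruning/quantization Jay mentions.

def weight_bytes(params: float, bits_per_weight: int) -> float:
    """Bytes needed to store the weights at a given precision."""
    return params * bits_per_weight / 8

params = 7e9            # a 7B-parameter model (assumed)
edge_budget_gb = 8      # hypothetical edge accelerator memory

for bits in (16, 8, 4):
    gb = weight_bytes(params, bits) / 1e9
    fits = "fits" if gb <= edge_budget_gb else "does not fit"
    print(f"{bits}-bit weights: {gb:.1f} GB -> {fits} in {edge_budget_gb} GB")
# 16-bit weights alone blow the budget; 8-bit or 4-bit quantization is
# what makes the same model even a candidate for the edge.
```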

 

Rob Stevenson  14:43  

Yeah, makes sense. Okay, Jay, enough screwing around here. We've got to start the show. So we heard a little bit from you about the challenges you were facing that led you to want to found a company focused on making compute more efficient. So here we are, Lemurian Labs working on that. I'm just curious if you could share how you are attacking this problem, and what are some of the levers that can be pulled to make compute more efficient, more affordable and, I guess, environmentally friendly? Let's lump that one in there too.

 

Jay Dawani  15:09  

Yeah. I mean, ultimately, things come down to energy efficiency, right? So how much energy does it cost to move something? How much energy does it cost to manipulate something? Storage, manipulation, and movement are really the three things that make up a computer, right? So you have ALUs, that's your transformation; you have your memory, that's your storage; and then you have your interconnects and so on that do the transferring. Each of them has a different cost, and you want to try and minimize the distance that anything has to travel, right? Because that's the most costly part. Math at this point is very cheap, to the point where I don't even consider it as part of the calculus, almost. And with number systems getting smaller and smaller, that's going to continue to be the case, so you're dominated by movement, and you want to push things closer and closer. But the challenge becomes: workloads are changing. They have interesting execution patterns. You have different instructions operating, so you have out-of-order machines and so on and so forth. And then, you know, you have CPUs, you have GPUs, you have NPUs and other kinds of things all talking together now, and each one is optimized for a slightly different case, right? So one cares a lot about throughput, one cares a lot about latency, one cares a lot about something else, or somewhere in between. Or, like, if you're super heavy on convolutions, right, you're gonna have a convolution accelerator on the side. But that creates a lot of problems in terms of how well I can utilize these resources, right? I want to place the right task and the right data on the right machine to get the most out of it. And ultimately, as a software engineer, if I care a lot about performance, my goal is to saturate those resources, right? That's where I get the most. And there's times where I don't care about energy as much, because I know I'll make that up if I get enough performance. Like today, if you look at GPUs, I've seen cases where people are paying 90% of the energy budget of the GPU, but they're only using 12% of the resources. That should not be the case. People should be using at least 60-70% of that and paying like 80%.
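
A toy calculation of the movement-versus-math point. The picojoule figures below are rough, order-of-magnitude estimates of the kind often quoted for modern process nodes, not measurements of any particular chip; only the ratios matter here.

```python
# Toy illustration of why data movement, not math, dominates energy:
# compare a fused multiply-add against the cost of moving its operands.
# Values are rough public order-of-magnitude estimates (assumptions).

ENERGY_PJ = {
    "fp16_fma": 1,        # doing the math itself
    "sram_access": 10,    # fetching an operand from on-chip SRAM
    "dram_access": 1000,  # fetching an operand from off-chip DRAM
}

def op_energy(reads_from: str) -> float:
    # One FMA needs roughly 3 operand accesses plus the arithmetic.
    return ENERGY_PJ["fp16_fma"] + 3 * ENERGY_PJ[reads_from]

print("FMA fed from SRAM:", op_energy("sram_access"), "pJ")
print("FMA fed from DRAM:", op_energy("dram_access"), "pJ")
# DRAM-fed math costs ~100x more energy, which is exactly why you want
# to minimize the distance data travels before it reaches the ALUs.
```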

 

Rob Stevenson  16:58  

What's responsible for that disparity? Bad code?

 

Jay Dawani  17:03  

To get performance out of these kinds of accelerators, you actually have to understand the hardware. You have to understand the workload. You have to understand how kernels map down to hardware. You have to think about the operations themselves, and how do you fuse them together? A lot of the time, the libraries that people write and invoke when they're doing things have these artificial memory barriers between different kinds of operations, which force you to write back to memory when you shouldn't have to, and those cause penalties. Poor placement of work can, you know, lead to cache misses in the memory hierarchy and having to go to DRAM. There's a lot of little things in there that people don't think about, right? And then collisions between work; there's a lot, right? GPUs are complicated machines. They're not easy to program, but if you're good at it, you can get a lot of performance out.
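
Here is a small NumPy sketch of the memory-barrier problem Jay describes: chaining separate library calls materializes full-size intermediates, while a fused pass reuses one buffer, which is the effect a kernel-fusing compiler achieves on a GPU. The functions and sizes are illustrative, not from any particular library.

```python
import numpy as np

x = np.random.rand(10_000_000).astype(np.float32)

# Unfused: each call reads and writes the whole array, like separate GPU
# kernels bouncing intermediates off DRAM between operations.
def unfused(x):
    a = x * 2.0                 # intermediate 1 written to memory
    b = a + 1.0                 # intermediate 2 written to memory
    return np.maximum(b, 0.0)   # final result written to memory

# "Fused" stand-in: reuse a single buffer so the intermediates a fusing
# compiler would eliminate are never separately materialized.
def fused(x):
    out = np.multiply(x, 2.0)
    np.add(out, 1.0, out=out)
    return np.maximum(out, 0.0, out=out)

assert np.allclose(unfused(x), fused(x))
```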

 

Rob Stevenson  17:48  

So this is why you believe that software is the opportunity for really accelerating AI, as opposed to improved hardware?

 

Jay Dawani  17:56  

Yeah. I mean, it's not so much even the fact that GPUs aren't easy to program, right, or that they're super powerful and can give you all the performance, or that they're underutilized. Those are all true. But look at AI today, the size of the workloads, the volumes at which they're operating, right? So if I assume people are operating at, like, a GPT-5-ish scale, whatever that is, and then I look at the vertical industries and the smaller companies, they all have somewhere between 7 to 70 billion parameters, and then 135 to, like, 500 billion parameters, and then trillion-plus, right? If I look at how all of those execute in, like, a single day, when I look at wide adoption, let's say a billion people using them on a daily basis 10 years from now, do you have an estimate of how much compute that requires?

 

Rob Stevenson  18:45  

I wouldn't even begin to guess. So I'm hoping you're going to tell me.

 

Jay Dawani  18:48  

So it's somewhere between 10 to the 30 and 10 to the 31 flops a day, right? And that's just for serving the model, right? I'm not even looking at training or anything else right now, and there's a lot of other operations that have to happen around AI to make it work, but that's purely just deep learning flops. That is roughly around one to ten yottaflops, right? That is more compute than we have actually seen in our entire lifetimes in aggregate, right? But it's now compressing into a single day. To actually run that amount of compute, you need to perfectly saturate about a trillion H100s. Oh, no, R100s, the Rubins that come afterwards, after Blackwell. So that's a lot of GPUs. Like, the fab from TSMC can only really produce about, I think, 30,000 wafers a month, and that's assuming 50% yield. So your overall chips in a year end up being close to 40 million. But not all those 40 million are going to one company, right? They're distributed across multiple companies that have allocation. So if the world's just betting on one company, then, you know, the amount of compute that you can actually have available is pretty limited. But if there's a lot of different kinds of compute that are slightly optimized for different resources, making them accessible allows us to get there faster.
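
Jay's numbers, reconstructed as toy arithmetic. The per-chip sustained rate and the dies-per-wafer count below are assumptions (the episode doesn't give them), and the answers swing by orders of magnitude with them; the point is the shape of the gap, not the exact figures.

```python
# Back-of-envelope version of the demand-vs-supply estimate above.
serving_flops_per_day = 1e30             # low end of the 1e30-1e31 range
flops_per_second = serving_flops_per_day / 86_400
print(f"sustained rate: {flops_per_second:.1e} FLOP/s")   # ~1.2e25

chip_flops = 1e16     # hypothetical ~10 PFLOP/s sustained per accelerator
print(f"chips needed at perfect saturation: "
      f"{flops_per_second / chip_flops:.1e}")             # ~1e9

wafers_per_month = 30_000   # Jay's rough TSMC figure
dies_per_wafer = 220        # assumed; chosen so the total lands near ~40M
yield_rate = 0.5
chips_per_year = wafers_per_month * 12 * dies_per_wafer * yield_rate
print(f"chips per year: {chips_per_year / 1e6:.0f}M")     # ~40M
```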

 

Rob Stevenson  20:01  

Okay, interesting. So it's not merely about, oh, we are better at developing hardware than software. It is that the need for hardware is so great that we don't have a choice but to make software more efficient if we ever hope to run some of these processes. It cannot be just a hardware problem. Is that what you're saying?

 

Jay Dawani  20:19  

Exactly right. So heterogeneous computing is a real thing, right? If you ever go into the supercomputer world, it is CPUs and GPUs and really fancy interconnects and systems built out that you have to program, right? And every time the configuration of hardware changes, your software has to change, right? And rewriting code, and finding people that know those machines well and can do performance work, is very, very expensive, and those people are rare. There's less than 4,000 people in the world that can actually do really good CUDA work. There's a fraction of that for any other hardware, right? And most of those people work at the companies designing the hardware, so it's harder for developers to use. So you need to rethink software from the point of view of heterogeneity and scale, to run massive workloads everywhere and get performance out of them without having to get to the low-level parts of hardware, right? So that's basically what we're trying to do. Let's make it easy for programmers to get performance out of any hardware.

 

Rob Stevenson  21:13  

As a quick aside, if you wanted to train to be one of those 4,000, if you wanted to be able to write that kind of code, how would you prepare yourself for that sort of role?

 

Jay Dawani  21:20  

Oh, boy. Understand workloads, understand the core operators behind those workloads. There's people that have actually specialized, or spent decades of their life, just learning how to write super fast matmuls to beat all matmuls on any kind of hardware. And at that point, you have to understand parallelism, you have to understand concurrency, you have to understand the hardware. You have to understand threads and thread affinity, and how do you actually leverage the memory hierarchy and where things sit, and then what number systems to leverage, right? So most people don't think about mixed precision in that form; they think about it from a memory-savings perspective. But if you want to actually get performance out, you have to, kind of, like, cast things into multiple different formats and then schedule them really, really well, and that's where more performance comes from. It's not even the implementation of the kernel. It's how it's scheduled and then what it's dependent on, so the tail or the head of that operation and what ties well to it. That'll prevent you from writing to memory, and get performance out and saturate, or at least come close to saturating, those ALUs. So there's a lot. You really have to understand hardware and how compilers optimize and so on and so forth.
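
A quick sketch of why mixed precision is a bandwidth lever and not just a capacity one, as Jay says: halving the bytes per element roughly halves the time a bandwidth-bound kernel spends moving data. The 1,000 GB/s bandwidth figure is an assumed accelerator number, not one from the episode.

```python
import numpy as np

n = 50_000_000
bandwidth_gb_s = 1000   # assumed accelerator memory bandwidth

for dtype in (np.float32, np.float16):
    # A bandwidth-bound elementwise kernel: one read plus one write per element.
    bytes_moved = 2 * n * np.dtype(dtype).itemsize
    t_ms = bytes_moved / (bandwidth_gb_s * 1e9) * 1e3
    print(f"{np.dtype(dtype).name}: {bytes_moved / 1e6:.0f} MB moved, "
          f"~{t_ms:.2f} ms at {bandwidth_gb_s} GB/s")
# fp16 moves half the bytes of fp32, so the memory-bound time halves,
# which is the performance (not just capacity) argument for mixed precision.
```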

 

Rob Stevenson  22:26  

You know, if this podcasting thing doesn't work out, it feels like it could be a lucrative opportunity for me. Although, you know, I can print hello world; you know, I'm dangerous. No, that's probably not in my future. But for someone within reach of our voices, you know, if you're looking for a career path, that feels like something highly in demand.

 

Jay Dawani  22:41  

It is. I mean, I know performance engineers right now that, you know, are making comfortably seven figures.

 

Rob Stevenson  22:48  

Yeah, there's a few of them. Yeah, that makes sense. Thanks for that quick aside. So you mentioned a moment ago that what's responsible for inefficiency is merely bad code. That's, you know, one case. And I'm curious if you could just kind of rattle off some of the other common inefficiencies in development here and then maybe prescribe some solutions. I would love it, in the interest of maybe identifying a problem that people listening could see happening at their own companies.

 

Jay Dawani  23:14  

Yeah. I mean, there's a lot of ways to think about all of this, right? So one is choice of hardware. Pick, obviously, the latest hardware if you can. Then, if you write your workload so that it maps well to that hardware, you have a pretty good chance of getting performance out, right, because ultimately, again, it's about saturation. Model architectures as well are evolving, right? And part of the reason certain architectures take off is because of how well they map to hardware, right? So if you look at transformers, everyone's talking about transformers and how great they are, and they're fundamentally general purpose: as long as all that data can be tokenized, they can process it, and they can produce more tokens, and so on and so forth. But what made transformers so powerful was the fact that they were about two orders of magnitude more efficient than RNNs, because RNNs were not good at mapping to the kind of architectures we had, so we'd have to scale them up much bigger, and then you have different kinds of problems in trying to scale them up. So you need an inductive bias that is powerful, generalizable, and scalable; that creates performance. And then there's the implementation of that, and a good example would be flash attention. Flash attention is more of an implementation, or optimization, of an attention mechanism around a kind of hardware, to make sure it's leveraging the memory hierarchy for that hardware, like FlashAttention-2 for the A100 and FlashAttention-3 for the H100, and so on and so forth. So those are areas of performance. Then you have your quantization, you have your sparsity, you have your pruning, you have number systems for quantization and doing that well, and then trying to optimize specific parts of the kernels. That's where performance comes from. In terms of levers that we have to pull: better number systems. That was one thing that we actually explored at Lemurian as well: how do you create better number formats as an alternative to floating point? It's doable, but it's really hard to get adoption for new number systems, and that's a much longer term effort. In the meantime, we're using that more from a compression point of view, because if you can store the same representation in fewer bits, you save on bandwidth, and if you save on bandwidth, you can get more work into the system. Then you can leverage a better scheduler, and that helps you get more performance out. There's better interconnects. If we can solve the traffic problem between chips and going off chip, that helps a lot as well, because otherwise systems are starved. Ultimately, it mostly comes down to memory and actual scheduling and the granularity of tasks that you're working with, because realistically, it is impossible to saturate an ALU because it runs so fast; the limitation is always movement of data. So mapping your workload to what the hardware is capable of, and thinking about how to actually schedule it well, that's where you'll get performance.
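
To illustrate the compression-for-bandwidth idea Jay lands on, here is a minimal symmetric int8 quantization sketch: store values in a quarter of the bytes, expand on the fly. Real schemes add per-channel scales, calibration, and so on; this is the bare mechanism only.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric quantization: map floats to int8 with one global scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"4x fewer bytes than fp32, max abs error {err:.4f}")
# The same representation in a quarter of the bits means a quarter of the
# bandwidth spent moving it, which is the lever described above.
```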

 

Rob Stevenson  25:48  

To move upstream a little bit from the specific inefficiencies, how would you even diagnose them at your organization?

 

Jay Dawani  25:56  

Well, there's a lot of benchmarking tools that people use, and this is where you hire people who you pay a lot of money to; their job isn't to figure out the problem, it is to know where to look, and that's what experience counts for. There's general heuristics. I mean, if you're doing benchmarking and you're like, oh, this thing is only taking so many cycles, and I'm expecting it to take this much, something's wrong in the implementation. Maybe somebody just called, like, a dozen different libraries or frameworks and things like that, and each one has its own version of how they thought it was going to work, or they thought about memory layout in a certain way, and that worked for the architecture they were building on initially, but it doesn't work for the current one. That's going to cause performance loss, and you have to go and dig into that and figure out, okay, maybe I just need to reimplement this, or I need to cast this in a different way. Yeah, ultimately, if you're going through this problem, find somebody who understands performance and have them dig in.
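
The "expected versus measured" heuristic Jay mentions, sketched as code: time a kernel, compare against a simple expectation, and flag a large gap. The assumed peak rate here is a placeholder guess you would replace with your hardware's actual number.

```python
# Measured-vs-expected check: a big gap flags a bad implementation,
# a memory-layout problem, or the wrong library under the hood.
import time
import numpy as np

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

start = time.perf_counter()
_ = a @ b
elapsed = time.perf_counter() - start

flops = 2 * n**3          # multiply-adds in an n x n matmul
assumed_peak = 200e9      # ~200 GFLOP/s assumed for this machine (a guess)
expected = flops / assumed_peak

print(f"measured {elapsed * 1e3:.1f} ms, expected ~{expected * 1e3:.1f} ms")
if elapsed > 3 * expected:
    print("large gap -> dig into layout, libraries, or scheduling")
```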

 

Rob Stevenson  26:48  

Yeah, makes sense. Jay, I want to end with a teaser trailer for the sequel, right? You said you had unfinished business. Let's fast forward. Let's assume Lemurian has solved this problem, or at least made sufficient progress, so that you may return to the inciting incident here. What is next for you when you return to that problem? How are you going to attack it?

 

Jay Dawani  27:10  

I don't know that I have an answer to that, really, because I think the problems that we have at Lemurian are going to take a long-ass time, and there's more than enough problems in the world, and there's a lot of smart people. So I'm hoping I can create the tools and capabilities for other smart people to go and pick up and solve the problems where I can't. That's my fundamental goal. It's to create massive unlock, right? And that's partly also why I believe in AI. It's not so much that AI is going to solve all of our problems. It's going to help us solve them a lot faster and more efficiently, because fundamentally it's a cognitive revolution, right? Part of the reason I'm excited about it is because of the future of agents, right? I'm imagining, you know, multiple special agents that are, you know, super good and knowledgeable about certain verticals. And if you can call on them, they can work together on a problem, and they can simulate multiple possible scenarios and come back to you after pruning down that entire search space, like, okay, now this is what you need to focus on. The amount of time you just saved is ridiculous, right? You still need people who are knowledgeable in that world, but that's a huge unlock. And eventually we get to a place where you don't need to be a knowledge worker. You just need to be a person that really understands the problem well enough, and you can define it and give it to them and then keep iterating. So ultimately, Lemurian should be the infrastructure to allow all of that to happen, right? And if you do that, I think there's a lot of smart people with a lot of great ideas that will be able to solve those hard problems, right? And that way I don't have to, and I can go sit on the beach somewhere.

 

Rob Stevenson  28:35  

You can pull the Les Paul back out. Yeah, my hair's getting gray enough already. Jay, man, this has been fascinating, speaking to you. We are creeping up on optimal podcast length here, so here at the end of the show, I'll just say: man, thank you for coming on and sharing your background. You're an interesting guy working on interesting problems. So best of luck to you at Lemurian, and I mean this: I would have you back on any time to do a part two.

 

Jay Dawani  28:53  

Absolutely. Thank you so much. Looking forward to the next one.

 

Rob Stevenson  28:57  

How AI Happens is brought to you by Sama. Sama's agile data labeling and model evaluation solutions help enterprise companies maximize the return on investment for generative AI, LLM, and computer vision models across retail, finance, automotive, and many other industries. For more information, head to sama.com.