Jason & Duncan discuss the results of Sama's ML Pulse Report, focusing on how to measure model effectiveness, how to increase confidence when moving models into production, and whether they believe generative AI is worth the hype.
Joining us today are our panelists, Duncan Curtis, SVP of AI products and technology at Sama, and Jason Corso, a professor of robotics, electrical engineering, and computer science at the University of Michigan. Jason is also the chief science officer at Voxel51, an AI software company specializing in developer tools for machine learning. We use today’s conversation to discuss the findings of the latest Machine Learning (ML) Pulse report, published each year by our friends at Sama. This year’s report focused on the role of generative AI by surveying thousands of practitioners in this space. Its findings include feedback on how respondents are measuring their model’s effectiveness, how confident they feel that their models will survive production, and whether they believe generative AI is worth the hype. Tuning in you’ll hear our panelists’ thoughts on key questions in the report and its findings, along with their suggested solutions for some of the biggest challenges faced by professionals in the AI space today. We also get into a bunch of fascinating topics like the opportunities presented by synthetic data, the latent space in language processing approaches, the iterative nature of model development, and much more. Be sure to tune in for all the latest insights on the ML Pulse Report!
Key Points From This Episode:
Quotes:
“It's really hard to know how well your model is going to do.” — Jason Corso [0:27:10]
“With debugging and detecting errors in your data, I would definitely say look at some of the tooling that can enable you to move more quickly and understand your data better.” — Duncan Curtis [0:33:55]
“Work with experts – there's no replacement for good experience when it comes to actually boxing in a problem, especially in AI.” — Jason Corso [0:35:37]
“It's not just about how your model performs. It's how your model performs when it's interacting with the end user.” — Duncan Curtis [0:41:11]
“Remember, what we do in this field, and in all fields really, is by humans, for humans, and with humans. And I think if you miss that idea [then] you will not achieve – either your own potential, the group you're working with, or the tool.” — Jason Corso [0:48:20]
Links Mentioned in Today’s Episode:
Duncan Curtis on LinkedIn
Jason Corso
Rob Stevenson 0:00
Hey, hello, hello, everyone, and welcome to a very special live edition of the How AI Happens podcast. We are here on Zoom, on video, live, which begs the question: is it really even a podcast at all? Is it a webinar or a webcast, perhaps a "podinar"? I don't know, whatever the parlance is. I am your host, Rob Stevenson, and we're going to have some fun here today. We have assembled a panel of experts with just a ton of experience in our space and the technical know-how that comes with it, and we are going to be chatting with them all about the ML Pulse Report. What is the ML Pulse Report? Great question, imaginary podinar listener. I will tell you: the Machine Learning Pulse Report is published each year by my friends at Sama, and it is the result of surveying thousands of practitioners in our space. This year, the report focuses on the role of generative in those professionals' work, how they are measuring their model effectiveness, and how they expect it to impact their work in computer vision. So today, we're going to go through some of the findings of the report. We're going to have our panel react and offer some solutions to the common problems folks are finding. And then at the end, we will, of course, take your burning questions. So at any point during the proceedings, go ahead and use that Q&A button at the bottom, or wherever you keep your toolbar, really. Put your questions in there, and we will make sure to save some time at the end to have some back and forth with our panelists. So we know what the ML Pulse Report is. What the heck is How AI Happens? Another great question. How AI Happens is a podcast featuring experts in AI, ML, and data science discussing the challenges and techniques they're using to bring exciting new AI tech into the world. So every week I host VPs of AI, directors of data science, professors, and founders of exciting new AI companies, and we get technical about what they're working on.
So there are plenty of podcasts out there that are for the AI consumer, you know, where they will do things like pause to define LIDAR. We don't do that here. We expect a base level of literacy on the part of the listener. So that's a little bit about the show. It is available anywhere fine podcasts are streamed. So enough of the shameless plug for the podcast. Let's get to the goods here and meet our panel. First up is a professor of robotics, electrical engineering, and computer science over at the University of Michigan. Go Blue! He is also the chief science officer at Voxel51. Jason Corso, welcome aboard.
Jason Corso 2:22
Thanks, Rob. It's a pleasure to be here.
Rob Stevenson 2:23
Really glad to have you. Did I do your curriculum vitae justice there? Is there anything else you'd like to add so we can get to know you a little bit?
Jason Corso 2:31
Certainly, yeah. So I guess one bit of background would be that I'm kind of here by accident. As an undergrad, I had a few different types of jobs: database programmer, web designer, researcher, and so on. And I just kind of fell into the idea of academic research, because I like to build things. And ultimately, after 10, 15 years in that, I realized, oh, I can start a company. And that's Voxel51. And it's been a good journey so far. So I'm glad to be here to talk about that journey, and then the Gen AI revolution.
Rob Stevenson 3:03
You have this very technical background, but it predates AI in the current way we, I'm sure, think about it. Was there a moment for you when this technology kind of turned your head and you were like, "This is where I want to apply my expertise"?
Jason Corso 3:17
Oh, that's a good point. I've been working with computers since I was, like, six years old. Like, the moment that I could make a little sprite bounce across my Commodore 64 screen, I was hooked forever, no matter whether it was hard-coded or AI. You know, I just love the interaction between physical systems, computational systems, and humans alike. I think it's a great intersection.
Rob Stevenson 3:42
I've also been playing with computers since I was six. But I was not printing "Hello, world." I think I was playing Math Blaster on Microsoft 95, or whatever it was, Windows 95. So kind of different backgrounds. But here we are talking about all this great stuff in AI and computer vision. Jason, thanks so much for being here. Also waiting in the wings is our other panelist, who is the SVP of AI products and technology over at Sama, Duncan Curtis. Duncan, welcome to you as well. How are you today?
Duncan Curtis 4:06
Good, thanks, Rob. And yourself?
Rob Stevenson 4:08
Really well, thanks for asking. I'm doing great and excited to be doing this live podcast. And I would just love to know a little bit more about you. I kind of just told folks about your current role, but I would love some background so the folks out there can get to know you as well.
Duncan Curtis 4:20
Sure. Engineer by training. I was most recently head of product at Zoox, an autonomous vehicle startup that was acquired by Amazon. Before that, I worked at Google. I love my technology and loved being there, having a huge impact. Before that, the rest of my career was based on video games, so I'm partially to blame if you or a loved one have lost time swiping fruit in Fruit Ninja.
Rob Stevenson 4:41
Oh, no kidding. Just thousands and thousands of hours poured into Fruit Ninja, slicing watermelon and whatnot. I have you to thank for that. We can talk later about all the years you stole from me. But in any case, here we are just talking about generative, and that is kind of what the ML Pulse Report was focused on: the role of generative in the workplace and how people anticipate it impacting them. So let's pull up our presentation here just so we can have something to look at besides my mug and background. Before we get into some of the takeaways, maybe it would be worthwhile to sort of limit the conversation. Generative is everywhere. The hype is real. But it can refer to lots of different things. So maybe we can point in one direction. Duncan, would you mind outlining for us, when we just throw the word generative out, for our purposes, what are we talking about here today?
Duncan Curtis 5:27
Absolutely. So obviously, there are a lot of different technologies, similar technologies, technologies close to each other. And so what we've done for ease of communication is that we typically group them: whether you're talking about LLMs, generative CV, or foundation models, we kind of lump them into generative AI just so that we can have a more easy-flowing conversation. Obviously, I think Jason and I will call out specific subsections as it makes sense, but that's generally where we go with it.
Rob Stevenson 5:54
Gotcha. Okay, that's helpful. So this first slide and this first response, I guess, shouldn't surprise anyone. 74% of respondents say Gen AI is worth the hype. Yeah, it's everywhere. Again, I guess maybe we could have expected that. I guess first, maybe, Jason, how would you answer this question? Would you agree with the 74%? Or where do you come down?
Jason Corso 6:13
Yeah, I mean, it's hard to answer with one clear answer. So I think the respondents must have felt the pain that I feel. Because on the one hand, sure, right? I mean, we're seeing capabilities that are easily available through simple APIs, accessible to startup teams, to enterprise teams, to students, to hobbyists. I'm sure there will be new advances. Even if we just pause quickly and think about code copilots, that alone is great, like, fabulous advances. However, I think there's also a good enough hesitation to wonder, what is the limitation of Gen AI, right? You know, in computer graphics, there was so much hype 20, 30 years ago about being able to generate really photorealistic images, video frames, and so on. And then we learned about this notion of an uncanny valley, right? We can get so good, just so good, that we know that it's still a cartoon or computer generated, and then to get to actual photorealism, we don't really know how to cross that chasm. Right? So in, like, special effects movies, that gets hidden quickly in fast-moving sequences and so on, so you can't actually focus on it. I don't think we have a good sense for Gen AI of where that boundary is. And I don't know when we'll find that out. And in fact, I don't know what you think, Duncan, but that's, like, my biggest hesitation about it.
Duncan Curtis 7:40
No, I completely agree. I would agree with you, especially on the "is it worth the hype?" Yeah, I mean, there are definitely some transformative technologies that have come out of it that are already here, and they're now part of our ecosystem. And you touched on this briefly, Jason: I think one of the biggest advantages, like, advances that we had with Gen AI is the accessibility. We haven't had that level of accessibility for new technologies, certainly ones that are highly advanced, in a very long time. I remember my head of user research, CODEL, her grandmother asked her about generative AI and started using it. And that's when she was like, "Oh, wow, that's so accessible that my grandma, who has trouble with social media and 'Where do I click the buttons?', was deep into conversations with ChatGPT and learning about it in those ways." So I think that hype has been really valued and really valuable. The question is the hype about what else we can apply it to, and in what ways, and when is it going to have real business value and business impact? That's like days, and I think that there are some pretty big questions there. One of the ways I think about it is, how much of what we're going to see out of Gen AI are we already seeing? And so how much further do we have to push it? Obviously, adapting it to new use cases is great, more distribution of an existing technology, as opposed to, like, "Well, what else are we going to be doing?" Especially as Jason pointed out, with that uncanny valley element, we've also got the question of what's the business impact of a hallucination. Maybe it's fine if you're just asking ChatGPT about, you know, how to make a recipe and the recipe goes wrong. Not really a massive impact for you.
But if you're relying on it as a core differentiator for your products and your business, what's the user impact of those kinds of elements?
Rob Stevenson 9:32
It is curious that no one in the survey outright said no. No one was like, "No, it's all a bunch of hooey." Which I guess speaks to the overwhelming, you know, the majority here being on board with it. But for the folks who are in the "not sure" camp, I want to hear from y'all. Put yourself in their shoes. What sort of hesitation do you think one could have if they're like, "I'm not totally sold yet"?
Jason Corso 9:55
Yeah, I mean, if I were to think about that angle, I'd probably pin it on, like, general versus specific, versus sort of personal access, right? I mean, I could use GPT or, you know, Bard or whatever to go interact with it and ask it questions about research or general topics. Like, you know, my kids are in high school history or whatever, right? We can go and talk about certain things like that. However, it can't help me figure out the best way to approach, I don't know, like, I have a plumbing leak or whatever. That's not specific enough to my own house or my own need or something like that. Maybe there's a little bit of hesitation around, like, "What about me? And how can it be personalized for my own needs?" Perhaps that's one angle that comes to mind.
Duncan Curtis 10:40
Yeah, I would build upon that and even take the business angle and say there are use cases that people are developing traditional AI for where generative AI isn't necessarily useful. You know, if I think about, "Oh, great, your generative AI could create interesting text and do research, or it could create interesting images," that doesn't drive a car. And so if you look at those kinds of use cases that we see within AI, it's not necessarily a good fit for those cases. And I would also argue, when you talk about the hype level, that the pressure to have generative AI in everything is huge, is absolutely phenomenal, because everyone understands it is the hotness right now. Which is great when the use case makes sense for it and you can drive real business value. But then you also have the other side of it: with generative AI, unless you own your instance, or you're actually doing, like, a fine-tuned model, which means that you've got to create a fair swath of data in order to do it, is it any different than what your competitors can do? And you're in things like no code, or coding assistance, like goodbye, those can be really useful just to get that acceleration of development speed. But it's not necessarily a massive differentiator for you.
Jason Corso 11:56
You know, actually, on that point, Duncan, I had in my notes that I definitely wanted to talk about cloud versus local, because I think this is related to the potential value, and it's the same thing, in some sense, as general versus specific. In order to get specific, I guess you can deploy that in the cloud now with OpenAI's new, like, custom model, model customization. But so, just a concrete example: I work on a project that's funded by DARPA at the university. And it's all about building AI assistants that can kind of see the world that you're seeing. Like, you wear a head-mounted display, and it can guide you through steps, like if you're cooking a dish or whatever, or fixing an engine, which is one of the more realistic use cases. Like, you know, you're about to use the wrong tool. Well, the AI assistant can tell you, "Pause, wait a second, Jason, you know, you're supposed to use the pliers here, not the screwdriver," or whatever. And so we had this big demo last week on site. And we used one of the public APIs for LLMs for this, mostly as help with reasoning, right? We kind of do some inferences about what's happening in the scene, and then go and ask the LLM for, like, general reasoning capabilities to really understand what's happening and whether or not we should interact with the user. And the API was down the whole time, like, the whole morning, right? And we didn't have enough time to build the full model locally, just in terms of training. So I think maybe there's that angle as well, right? Like, how do I customize one of the commercially available ones? Or should I go use one of the open source ones, right, like LLaVA or GPT or whatever? I think those are big questions that maybe don't talk too much about the hype, because I think people are still on board, like, "Okay, we're going to move in this direction."
But the practicality of it, I think, is still a big unknown right now.
Duncan Curtis 13:38
I would agree. And I remember, even in the early days of generative AI, just a few months ago, we were starting to have real questions around, "Well, who owns what's happening?" Like, we saw everyone wanting to use ChatGPT or Bing or different coding assistants. And the question started coming up, like, "Wait, who owns the code that's written with this assistance?" And you start to get into all these interesting business use cases where you're like, "Hey, it's great that it helps. But is this something we can deploy at scale for customers? Is it something that we own? Is it part of our core IP as a business?" Which I think is also something that we're still very much working out. Really interesting space, lots of really fun, interesting problems to get your head around, like, what's the right business model to get adoption? For example, Ironclad is a really good example, for legal document processing. They've basically got their own instance where you pay a fee because that way your data is now being used to train models, and you own the IP at the end. That's basically a service that you're now paying for. But it's another one of those practicality examples that you're mentioning.
Rob Stevenson 14:41
What are some of the other use cases you've seen? Jason, maybe we'll start with you here, particularly if you were to keep it in the field of computer vision for Gen AI.
Jason Corso 14:49
So let's see. The way I think about computer vision, and probably a lot of AI system development, and I like the phrase "AI system" to be sort of the collective, right, like the code, the model architecture, the data, is that when you're engineering or developing these systems, it's kind of like a loop. You do some work with data, you do some work with models and with code, and then you do some analysis. And the faster you can iterate through that loop, the better off you are, right? In some sense, at Voxel, my company, Voxel51, we align ourselves towards infrastructure to help iterate through that loop as fast as possible. How can Gen AI feed into that loop? Well, on the data side, there's the obvious value of actually generating samples that are useful. But the key is, how do you know what samples will be useful, right? Without the model work, and then some analysis you do after that, you're just walking in a dark hallway. You're just kind of using DALL-E 3 or something like that to generate images that are like some other ones you have. But if you know, and you can kind of prompt current and future Gen AI systems based on embeddings, "Okay, I want you to generate me distributions of samples that are like this distribution in the embedding space," that would greatly reduce the burden on the data engineer as part of this data, model, analysis loop. That's a surefire win that goes there. And maybe one other, and then I'll hand it over to Duncan, perhaps on the model and the analysis side. I was invited to the MIDAS center at Michigan a few weeks ago to participate in this, like, "How is Gen AI revolutionizing experimentation?" I unfortunately couldn't give a talk because I was busy at the time.
But all these talks were about how you basically set up in-context learning with pre-trained, kind of general, you know, LLMs to propose new experimentation setups, based on questions you have that are written at a high, natural-language level, and yet rendered down into ideas that are asking questions at a scientific analysis level that maybe you didn't consider. And the fact that it can kind of output your Python code or your Rust code to go do that for you, at least a first iteration of that, really speeds up the work.
Duncan Curtis 17:06
Absolutely. I was actually going to give a shout-out, also on the Voxel side: being able to use it to create synthetic data and to augment your datasets is amazing. But you do need those other elements. So whether you're calculating embeddings or looking at the distribution in the embedding space, you also need to understand where your model is performing right now. And so, you know, one of the things we do with clients is looking at, given your model predictions, where are you right, and where are you wrong? And then you can also dig in from a data engineer perspective and look at, "Okay, well, this is more of what I need in my dataset in order to, you know, try to correct where my model may not be performing yet." But once you've got that insight, yeah, being able to create synthetic data is a fantastic way to broaden the distribution of your data. It's also really useful for edge cases where you're like, "Hey, I haven't got a lot of instances in my dataset, but I've got a couple, and I can describe in a natural language way the other things that I'm looking for." And so that can be a really interesting way to approach it.
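The "where are you right, and where are you wrong?" analysis Duncan describes can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Sama's or Voxel51's tooling: the record fields (`slice`, `label`, `prediction`), the function names, and the thresholds are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Return {slice_name: (accuracy, sample_count)}.

    Each record is a dict with hypothetical keys:
    'slice' (e.g. a lighting condition), 'label', and 'prediction'.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        totals[record["slice"]] += 1
        if record["label"] == record["prediction"]:
            correct[record["slice"]] += 1
    return {name: (correct[name] / totals[name], totals[name]) for name in totals}


def slices_needing_data(records, min_accuracy=0.8, min_count=50):
    """List slices that are inaccurate or underrepresented, worst accuracy first.

    The thresholds are illustrative defaults, not recommended values.
    """
    flagged = [
        (name, accuracy, count)
        for name, (accuracy, count) in accuracy_by_slice(records).items()
        if accuracy < min_accuracy or count < min_count
    ]
    return sorted(flagged, key=lambda item: (item[1], item[2]))


# Toy predictions: the model is right in daylight and wrong half the time at night.
records = [
    {"slice": "daylight", "label": "car", "prediction": "car"},
    {"slice": "daylight", "label": "person", "prediction": "person"},
    {"slice": "night", "label": "car", "prediction": "truck"},
    {"slice": "night", "label": "person", "prediction": "person"},
]

for name, accuracy, count in slices_needing_data(records, min_accuracy=0.8, min_count=2):
    print(f"{name}: accuracy={accuracy:.2f} over {count} samples")
```

The flagged slices are candidates for targeted collection or synthetic generation; the generated samples would still need a human spot check before they go into training.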
Rob Stevenson 18:06
Yeah, synthetic data feels like a huge opportunity. I spoke with the founder of a company called accuracy. And the reason they call in Herbert, and they're using computer vision to train to identify specific types of crops. So they input their training on actual imagery, like photos that have been taken for, like, the human eye. It's data that was collected for humans. Machines don't see the same as humans. So the input is like a photo you or I might take, but then the output is like a beautiful, multicolor, tie-dye sort of thing. It doesn't look like a plant at all to me, but to a machine it does. And that felt like a huge breakthrough: "Oh, well, let's make data that considers how a machine sees, pixel by pixel, all the way across an image, as opposed to how you or I might, aesthetically, holistically, in terms of a photo." And so that is all using computer vision: training with images, and the computer vision output is this synthetic data. A fantastically brilliant use case I came across recently.
Duncan Curtis 19:03
So I will mention, as we were talking about before, the idea here is that it's great as a synthetic data generation tool, but you still need to check that it's producing what you asked it to, and that it is the right kind of thing. So having that feedback loop, that human-in-the-loop sort of feedback loop, is one of the areas that we've seen is still very much needed, to at least sample the synthetic data that's created and make sure that you're not going off on a different path.
Rob Stevenson 19:25
Yeah,
Jason Corso 19:25
I mean, actually, I'll chime in on that point as well, and take that as a segue to another thing I wanted to mention. So indeed, right, it generates output that's interpretable by humans. Not all of it, not always, but for many of these, the notion of Gen expression is that. But I would think of that, even, let's abstract a little bit more: it almost has a democratizing effect on how and who can do this type of work, right? If I could be so bold as to make a prediction: in robotics, a field that I have been in for a decade or so now, one big problem is semantic mapping, right? You have robots that can move around in environments, and the robot knows about the metric space. It's using some sort of occupancy grid or some other representation of the environment. And it doesn't sort of speak language. It speaks metric. It speaks what it can see and how it can render that into a free space to move. Same thing with autonomous vehicles, for example. Humans don't speak that language at all, really, not even robotics engineers, right? They're always trying to figure out, "What are ways in which I can get the robot to do what I want it to do? But I don't really know if I have the right factor graph or whatever to do that." Right now, how that bridge is made is through semantic categories like door, and barrier, and garbage can. You know, this is kind of like programming with a 64K memory buffer like we were 30 years ago, right? Not really that effective. What is effective is natural language, right?
Like how I talk to my kids, or how I, frankly, even talk to my dog, who's probably smarter than most of those semantic mapping robots, right? So I would probably argue that one of the next waves we'll start to see is the ability to use language generation, and even non-language but semantically meaningful generation, like imagery or whatever, as a mechanism to translate between different languages, like robotic metric mapping and human speak. It's at a beachhead stage right now, just the beginning, I would argue.
Rob Stevenson 21:25
Is there a version of language in the middle of those two things, of human language and the way that a machine would understand language? For example, when I think of the way you write a Google search, "restaurant near me," right, Google knows what I want, and it gives me information. I've learned to speak to that machine in a way that, if I marched up to you and said that, you'd be like, "Are you okay, Rob? Are you having a stroke? Do you smell toast?" So is there, like, a middle of the road, a pidgin sort of version of language? Or do you suspect, Jason, that with large language models it would just be natural language processing, and I can speak to it like I would anyone and not have to adjust my parlance at all?
Jason Corso 22:01
Yeah, I would probably deflect it to say that, in most of these approaches, there's some notion of a latent space, right, a space that we don't directly observe. It's not the language itself, and it's not the sensorial data that we input. I called it embeddings before, but it doesn't have to be concrete vector embeddings. But there is some latent space that is present. Whether or not there's one, like the, capital T-H-E, the latent space, I don't know. I don't think anyone really knows. But it's common. And that's why we've seen an explosion of vector DB companies in the last, I don't know, five, six months, right? And the way I would think of it, at least in the short term, would be that as long as you can sample that latent space, you can take projections of it and map that to whatever other space you want, right, like human language, or robot metric space, or whatever. So I'd probably argue that I don't think it's a pidgin language that we would need to understand or whatever. But there's definitely some representation that is the translation representation. And I think it's probably the embedding space or the latent space.
Duncan Curtis 23:06
And then for more, like, just general chat interfaces, whether you have the ChatGPTs or Bard, one of the things that's interesting is that you gave the example of, like, "Oh, I would Google, you know, 'restaurant near me.'" And the funny thing is that we're already seeing the prompt engineering where people are actually like, "So talk to me like you're an X, like you're an expert scientist, or like you're a roboticist, and then go do these other things," so that it has context for how it wants to respond to you. So I would argue that while there is a really nice layer that it's giving us per NLP, so you can naturally talk and naturally understand things, I think the way humans are going to interact with it in a lot of different cases is still going to have this different version of the pidgin-speak equivalent, where you're like, "Hey, this is how you should interact with it to get what you want."
Rob Stevenson 23:53
Gentlemen, I could speak about latent space with you until the cows come home. But at a certain point here, we should keep going with the report, because there's lots more to get to. But I do appreciate your satisfying my curiosity on that point. We have another takeaway here I wanted to share, which is all about some of the challenges that people are facing, particularly in computer vision, for machine learning engineers over the next year. That stat on the left there is sort of separate from the question on the right, but first of all: 65% of respondents are not confident their models will survive in production. I was struck by this. It seemed high to me. But is that just engineering? Is that just like, "Oh, let's see if it works"? Like, at a certain point, you have to go from training into production and see what happens. Or is that number kind of consistent with what y'all see out there in the market? Or is this particularly high given the industry we're in?
Jason Corso 24:38
I love this number. I think this is honesty at its core. I'm really glad that number is not like 10% or 5% Because most models don't work, especially in our first iterations of models. Again, going back to this like data model analysis loop, right, like, first of all models, don't get trained ones get deployed and then live forever and ever and ever. They have to go and get adapted on the fly and updated over time. So like, it's a little unclear. This sentence answers almost like saying to me, is that the initial one? That's not going to work? Definitely not. That should be 99% that because not know, initially, it was, Is it the one when the engineers say, it's ready to work? Probably that'd be like 80%, then or is it the one when it's like, the tension between the business and the marketing team? And the engineers come together? Like, okay, look, guys, we got to the point now, and that that might be the right number there, right? Mostly because like, it's hard to build datasets, like, it's one thing to build models, right? Like models aren't we have a lot of algorithms, a lot of machinery, that's what we're trained in school. But there's, like limited training about how to build a good data set to go and train that model, right? Like, it's really like, I don't think of it as a chain, right? Like it's not go do your data work, then go to your model work, and then go do you have to deploy your model, it's really like this a loop or like a random walk, I wrote a blog a few, maybe a couple months back, like the machine learning random walk, where it's really like this CO development of your data set alongside your model, you might have a few different models at the same time, or even a few different projections or subsets of your data that you care about. And like you are building these things together. 
That's why I like the work that Sama does. This data work is so critical, so that we can appreciate the need to go and build a dataset that mirrors, that has the same distribution as, the distribution that we want to see in practice. Unfortunately, the in-practice part is seldom what's measurable in the lab, right? It's always a little bit different. That's why I'm so glad we've made so much progress, even in AI or even in non-AI things like video games, because it's hard to understand how users will use your system. We've learned that with FiftyOne forever, and it's just a constant lesson that we learn. And I think this is just a situation where you'll do a great job with the prior data and the information you have to go build your initial dataset and your initial model, but unless you iterate in the field, in practice, many times, until you truly appreciate that at the moment of deployment you mirror that distribution, you can't predict how your model will perform. Also, recognizing that next week it's going to change a little bit, it's really hard to know how well your model is going to do. And in my experience, at Voxel51 we don't build our own models, really; we're an infrastructure company. But we talk to tons of companies and users that build models, so we hear these challenges all the time. And I think there's a lot of pressure to go release models before they're ready and fully tested.
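As a rough illustration of the point about a dataset mirroring the distribution seen in practice, here is a minimal Python sketch that was not part of the conversation. It assumes you have class labels for a training set and for a sample collected in the field, and it compares the two label mixes with total variation distance. The function names and the label lists are hypothetical, and a real check would likely look at embeddings or image statistics as well as labels.

```python
from collections import Counter


def label_distribution(labels):
    """Return a dict mapping each label to its relative frequency.

    Assumes `labels` is a non-empty list.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(train_labels, field_labels):
    """Total variation distance between two label distributions.

    0.0 means the label mixes match exactly; 1.0 means they share no labels.
    """
    p = label_distribution(train_labels)
    q = label_distribution(field_labels)
    all_labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(name, 0.0) - q.get(name, 0.0)) for name in all_labels)


# Hypothetical label samples, made up for illustration only.
train = ["car"] * 70 + ["truck"] * 30
field = ["car"] * 60 + ["truck"] * 20 + ["bicycle"] * 20
drift = total_variation_distance(train, field)
print(f"label drift between training and field data: {drift:.2f}")
```

A value near zero only says the label mix matches; it says nothing about whether the images themselves look the same, so this is a first check rather than a guarantee.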
Duncan Curtis 27:29
Absolutely echo that. I'd add to it and say that I think it's also about the maturity of our industry, where putting models into production is a process. As Jason was saying, you don't stop, and you're never done. I mean, in the traditional software industry, there are people I work with who are supporting a version of their code, a library that's 20 years old, that's just sitting in some system, still chugging away. And that's really different for AI, where not only is what you had when you did your initial data collection potentially incorrect, potentially not true, but the world also changes for a lot of use cases. Sure, in some really controlled environments, like maybe a manufacturing warehouse or manufacturing line, you might be able to keep the noise of change relatively low. But that's not true if you start doing new products, or different lines and things that look different, or someone changes the lighting. So there's this idea that your model is never really done. We need to move to this concept of: get it to production, great, it's doing pretty well, we've got an idea against the data we collected, but this is never really going to end. And I don't think the industry is there yet from a maturity standpoint to understand that that's just part of AI development, and that there are toolchains we can set up in order to actually manage that.
Jason Corso 28:47
Definitely agree with that, Duncan. In fact, anecdotally, one thing that comes to mind, and this is maybe even getting back to the hype notion, is that to actually field production systems, you need 99.x percent performance, right? Say your system misses on three out of 100 cases. 97% performance is great to publish a paper, but it's not actually good enough to field the system. That could mean, I don't know, in surgical systems, a sliced vein or something like that. Or in the automotive industry, even a near miss with a bicyclist is dangerous, because that might make the bicyclist do something crazy and go crash and hurt a kid or something like that. So I think the level of performance that's needed to field real systems is really high. And if you look back even just 20 years ago, what were some of the first computer vision systems that were fielded? I can think of two. One is the automatic face detection in handheld cameras, where they used where that face was to autofocus. A really great use, because if it failed, the user could just touch the screen where they wanted to focus. It was an easy out. The second one I can think of is the first-down marker in broadcast television for college and professional football. I think it works most of the time. But as far as I know, there's someone sitting in a trailer who turns it off when it's doing something weird and is just going to ruin the broadcast. That's why there's a couple of seconds' delay in live broadcasts. Making things work really well requires a lot of attention to performance, or you've got to set up the system or structure in a way that there is a human in the loop during the actual deployment in some effective way. Those are the systems I've seen that work the best.
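To put numbers on the gap between 97% and 99.x percent, here is a small Python sketch, added as an illustration and not taken from the conversation. It assumes every case is independent and that "performance" means per-case accuracy, which is a simplification; the function names are hypothetical.

```python
def expected_misses(accuracy, num_cases):
    """Expected number of missed cases at a given per-case accuracy."""
    return (1.0 - accuracy) * num_cases


def prob_at_least_one_miss(accuracy, num_cases):
    """Chance of one or more misses, assuming cases are independent."""
    return 1.0 - accuracy ** num_cases


# Compare a paper-grade score with a fielded-system-grade score.
for accuracy in (0.97, 0.999):
    misses = expected_misses(accuracy, 100)
    risk = prob_at_least_one_miss(accuracy, 100)
    print(f"accuracy {accuracy}: {misses:.1f} expected misses per 100 cases, "
          f"{risk:.0%} chance of at least one miss")
```

Under these assumptions, a 97% system is very likely to miss at least once in any run of 100 cases, which is why a fallback such as a human in the loop matters so much in the examples above.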
Duncan Curtis 30:35
We had an interesting use case, one from my past, which is maybe an easy-to-understand example as well: self-driving cars. It was about maybe five or six years ago that we started seeing these Lime scooters and electric scooters, especially since I'm in San Francisco, and they just started showing up everywhere. Before that, in all the data collection, yeah, skateboards, great. Pedestrians, great. Which one of them is this? And by the way, with an electric scooter, you could identify that, look, it's a vertical human on a flat surface. Yeah, there's a little bar in the middle, but you could pretty accurately say that it would get captured by that idea. But when you're flagging it as a skateboarder, that comes with a whole load of assumptions around what its top velocity is and how it's going to behave. And an electric scooter can do, what, 35, 40 miles an hour compared to a skateboarder, so it can be really different. Whereas, as Jason was mentioning, having a human in the loop to proactively find these edge cases as the world changes means that you can go and collect new data, or find data that you've been seeing in production, annotate it differently, get your model to understand it, and give it a new behavior class in that case. So you're like, "Oh, scooters actually act like this." And then you can redeploy it relatively quickly, before it's a major issue.
Rob Stevenson 31:51
Having a human in the loop to identify the behavior class of how radical something is, I think, is useful. You're right. Oh, that's a skateboard? Okay, much more rad. But to the point of models, Paul Valéry once said that no poem is ever finished, only abandoned. Perhaps it's the same with models. And this number, 65%, as you called out earlier, Jason, is encouraging. I think folks seeing that should be like, "Okay, it's not just me, right? This is a normal part of the production cycle." If you are like, "I don't know if this is going to work," yeah, good. It's you and everyone else. So don't panic if that's where you sit. Related to this stat was some information that we gathered on all of the challenges people anticipate facing. This was sort of a check-all-that-apply situation. And we have to report here that for ML engineers in computer vision, you're in for a world of hurt, because these are all the challenges people are seeing coming down the pipeline. We've kind of stack ranked them here, and so I was hoping our panel could maybe pick one of the top three and offer some solutions for it. The number one challenge people anticipate is turning requirements into solutions, at 61%. Below that is collecting and generating data, of course an evergreen problem. And then debugging and detecting errors in their data: 52% report being concerned about that. So Duncan, we can start with you. Just pick whichever one you like and offer a solution, and then we can go from there.
Duncan Curtis 33:10
Alright, debugging and detecting errors in the data. I'm going to give a shout-out to Jason and Voxel51 here. It's about finding the right tooling to help you understand your data, whether it's calculating embeddings and looking at the distribution space, or being able to visually parse through your data much more quickly. We actually have an instance internally, and it's been super useful for us. So in terms of what you might do about these challenges, it's looking at your tooling to understand your data. How do you identify outliers early? One of the things we see commonly is, let's say we get a range of images, and the camera shutter just didn't match this time, so you just get a black image. Not useful for us to label in that case. So how do we pull that out of a dataset and flag it for customers to decide what they want to do with it? So with debugging and detecting errors in your data, I would definitely say look at some of the tooling that can enable you to move more quickly and understand your data better.
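The black-image case Duncan describes is one of the simpler data errors to catch automatically. The Python sketch below is an illustration only, not Sama's or Voxel51's actual tooling. It assumes each image has already been decoded into a 2D list of grayscale values from 0 to 255, and the threshold of 10 is an arbitrary assumption you would tune on your own data. The function names and file names are hypothetical.

```python
def mean_intensity(pixels):
    """Average grayscale value of an image given as a 2D list of 0..255 ints."""
    total = 0
    count = 0
    for row in pixels:
        total += sum(row)
        count += len(row)
    if count == 0:
        raise ValueError("image has no pixels")
    return total / count


def flag_near_black(images, threshold=10.0):
    """Return the names of images whose mean intensity is below `threshold`.

    `images` maps an image name to its 2D pixel list. Flagged images are
    only reported, not deleted, so someone can decide what to do with them.
    """
    return [name for name, pixels in images.items()
            if mean_intensity(pixels) < threshold]


# Tiny made-up images for illustration: one normal, one almost black.
sample_images = {
    "frame_001.png": [[120, 130], [110, 140]],
    "frame_002.png": [[0, 1], [2, 0]],
}
flagged = flag_near_black(sample_images)
print("flag for review:", flagged)
```

Returning a list of names rather than dropping the files matches the workflow in the conversation, where the flagged images go back to the customer for a decision.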
Jason Corso 34:02
Great. I appreciate that, Duncan. We totally dig it, man, especially this notion of debugging your data. There's no category that defines the FiftyOne or FiftyOne Teams tool that we launched a few years ago. For a long time, I was a staunch advocate internally that FiftyOne is a dataset debugger, a data debugger. But the marketing folks felt like it would fall flat. So marketing is hard, harder than AI, I think. But if I could take the first one, turning requirements into solutions, I think this is one of the hardest parts of, really, most engineering work, right? There's this quip in research that says, you know, it takes 90 minutes to choose the right problem, then it takes another 90 minutes to solve that problem, or something like that. Choosing the right problem is hard. And I think it's equally the same for AI systems. Even just, what is the ontology or the taxonomy that you need to get annotated? First of all, what data do you need based on your problem requirements? That alone is a hard problem. Then, okay, you've got the raw media. How do you convert that raw media into something you can get annotated, build a model on, test the model on, and so on? One thing that came up, that bit one of the teams I was working with in the past in the butt for a while, was the notion of groups. Okay, do I annotate individual people, or small groups, or whole crowds of people? Where do I draw the line? What do I need for my own work? So I think that is a big deal, creating the right sort of semantic space. And I think there are two axes of solutions. One is: work with experts. That's important. There's no replacement for good experience when it comes to actually boxing in a problem, especially in AI.
And I've talked with Jerome quite a few times, along different vectors in this space: how do you define different ontologies, or different sets of groups, and so on? The wisdom of experience goes a long way in requirements and solutions for AI, for sure. But the second, the other angle on the problem, would be that I think gen AI has some insights here that we haven't yet figured out how to tap. My intuition says we can learn a lot from the summary type of analysis that these LLMs have done, especially now that we see vision-plus-LLM models coming out. Whether they're commercial, private, or public open source, I think we'll figure out the right way to prompt them or query them in context to get better insights about how to do requirement generation for AI in the coming months or quarters.
Duncan Curtis 36:38
Yeah, and I'd also mention that gen AI is a great way to get started. You know, I talked about some of the problems with synthetic data and how you want to check it often. In that early development stage where you're just like, "Hey, I've got an idea, and I think I have an idea of how I might solve it," it's about getting that feedback loop smaller and smaller and faster and faster. It's a great way to get started. You can generate a few thousand instances of synthetic data that move kind of in the direction you want, before you really even understand the problem to the depth you need. But it can get you there and show you: hey, are you on the right path, or do you maybe need to take a different direction? So that might be another way that it could be useful to folks.
Rob Stevenson 37:17
Yeah, really helpful. Thanks, guys. Moving on, something from the report I wanted to pull out is how folks are measuring success and measuring the effectiveness and accuracy of their models, et cetera. I was hoping we could pull out some of these and understand from you two whether this is what you look at as well, or whether maybe there's something people are missing. Obviously, one of the most reported, at 30%, was user satisfaction. If you don't have satisfied users, you're not shipping a lot of products, right? So that as a true north makes a lot of sense. The most popular, though, was standard quantitative model metrics: precision, accuracy, et cetera. So maybe living in that 43 and a half percent, I can ask you, Jason, to start with some of your standard quantitative model metrics. Because when you get into reporting and metrics, there are the ones that are vanity metrics, the ones that maybe aren't really telling the whole story, the ones that look good on a slide deck but maybe aren't actually helpful to the model. So can we start with just what you are looking at when you're trying to measure success?
Jason Corso 38:14
Yeah, that's a great question, Rob. So I think the notion of what's the right benchmark, and how do I measure my model performance against that benchmark, has helped move the computer vision community, or industry, a long way in the last couple of decades. When I was starting as a grad student, the custom was a few dozen images and some qualitative pointers with good visual analysis. And even going back 20 years before that, actually, I teach a foundations of computer vision course where weekly we read a seminal paper from the '80s or the early '90s or whatever. The Geman and Geman paper, which is one of the seminal papers on Markov random fields, has something like 15,000 citations. Clearly valuable work. I think there are four images of results in the whole paper, right? It's like a 25-page paper. So we've come a long way: sharing code, sharing data, sharing open metrics against that data. And it's been adopted and embraced by the research community, and then the professional community, like you see here. One, it's well accepted. People understand what precision is, what recall is, what AUC is. We know what those measures mean. We don't have to worry about model bias, right? There are new ones, like Fréchet Inception Distance and so on, where there's model bias in the model that's being used to compute the measure, unlike my classical measures. So how do I analyze that? So they're accepted, they're well understood, they're taught in classes and so on. That's what we teach in my computer vision class, and my peers do as well. It's what's accepted. Also, it's easy, because code is available for it. And engineers, I'm a trained engineer as well, we're a little bit lazy. We don't always want to invent new things, right? So you can just download it and get it. It's there. However, there is a big risk.
It's one thing to do it in research, where everyone's using dataset A and computing metrics one, two, and three. There you go, it's a fair comparison. It's a much bigger risk in practice to go and get your dataset, my dataset from January 15, and start to apply these measures on it and then set these arbitrary absolute scores, like, "I need 80% precision before I can ship this model." 80% is meaningless for your dataset, because you can game the dataset. As an engineer, you know this. This may be why many of Voxel51's customers have split teams, a data engineering or data team and an ML team, and we talk to both of them. So we play the wall: they throw things over the wall, and we can do the translation. But if you don't appreciate what's in your dataset, and you're just setting an arbitrary measure, 80%, 90%, whatever, it's kind of a meaningless number. And I think that's a big risk if you do it in practice. I don't know if you've experienced something similar, Duncan?
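Jason's point that an absolute score like 80% precision depends on what is in the dataset can be made concrete with a little arithmetic. The Python sketch below is an added illustration, not something from the conversation. It assumes a hypothetical model with a fixed true-positive rate and false-positive rate, and shows how its precision changes when only the share of positive examples in the evaluation set changes. The function name and all numbers are made up.

```python
def precision_at_prevalence(true_positive_rate, false_positive_rate, prevalence):
    """Precision of a fixed classifier on a dataset with the given positive fraction."""
    true_positives = true_positive_rate * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    if true_positives + false_positives == 0.0:
        raise ValueError("model predicts no positives; precision is undefined")
    return true_positives / (true_positives + false_positives)


# Hypothetical model: catches 90% of positives, false-alarms on 10% of negatives.
balanced = precision_at_prevalence(0.9, 0.1, prevalence=0.5)
rare = precision_at_prevalence(0.9, 0.1, prevalence=0.05)
print(f"precision on a 50% positive dataset: {balanced:.2f}")
print(f"precision on a 5% positive dataset:  {rare:.2f}")
```

With these made-up rates, the same model scores about 0.90 precision on the balanced set and about 0.32 on the set where positives are rare, so a fixed ship threshold only means something relative to a known dataset composition.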
Duncan Curtis 40:57
Yeah, I was going to say the same thing. I think it is really good for moving the industry forward, and it's a good check to be like, "Hey, am I moving in the right direction?" But what I was really happy to see is both the business metrics and the user satisfaction, because it's not just about how your model performs, it's how your model performs when it's interacting with the end user. And there can be things that have nothing to do with your model performance that could really affect your user satisfaction. Like if you have a significant delay. You were mentioning earlier, what if your model is offline? What if you've got call latency that actually really affects the user experience? Then the best way to improve the user experience wouldn't be to improve your model performance; it would be to reduce the latency between the end application and how you're serving your model. You might even decide you want to take your model, work on shrinking it down, and push it to run on a chip locally, so that the latency is much lower. That might be way more important for the user experience and driving business value than, "Hey, did you get from 94 to 97% precision or recall?", for example. So I think it's useful, and I echo what Jason said: those can be good metrics and a good place to start to know if you're moving in the right direction. But it's really the in situ context of how your model performs that I think is the most important.
Jason Corso 42:18
And just to follow up on that briefly, it has been a delight to see, even minimally but more than zero, papers in the ML research community or the CV research community that now actually have this. Even just the existence of Fréchet Inception Distance is this notion of, "Will the user care about it?", right? It's not a user study, but we've even begun to see some human-in-the-loop type analyses. Even my group has written some IRBs recently, in the last few years, and it was like, "What the heck is an IRB?" It's this notion that, well, the human matters, the user matters, right? So how do we build systems that will cooperate effectively in that case? Which is why, like that DARPA project I mentioned, most of my research time now is being pushed in that direction. Because I do think the real problems are around how these AI systems connect with human users, even if they're not personalized assistants, even if they're general ones. I think that connection is really interesting.
Rob Stevenson 43:19
Interesting, and crucial to making a really good product, right? And I was concerned looking at this chart, noting that only 8.7% of respondents reported that they're using business metrics to measure success. Because it doesn't agree with the user satisfaction chunk of that chart, does it? If you're not looking at churn, for example, guess what, you aren't looking at user satisfaction. But these are domains that are sometimes separated. I'm sure measuring customer churn is not something that is covered in your computer vision course at the University of Michigan, or maybe it is, I don't know. But this is a separate domain. We have folks who are extraordinarily technical, but I fear that maybe they don't always take the time, or I guess this report suggests they don't take the time, to look at some of these business metrics. So is that an oversight, in your view? Duncan, maybe do you want to weigh in on that one first?
Duncan Curtis 44:07
No, I'd actually argue that it's a reflection of the maturity that I mentioned earlier. We're seeing that AI has moved more and more from research to practical application, and that we're spending more and more time on these problems as we're getting more and more models to production. So not only do we need to figure out the tooling for keeping models in production and performing well, but we need to start thinking about them more holistically. You know, I work with Jerome, who Jason was mentioning before, director of machine learning at Sama, and we actually have dedicated human interaction researchers for when we're deploying our AI, because it's a really important aspect to consider from the beginning. And that's just when we're using AI internally, let alone working with our customers who are developing their own AIs in so many different fields. So to me, it feels like a natural progression of maturity. That's where I'm at.
Jason Corso 44:57
Yeah, I would tend to agree. I think the ability to tie business value to an AI system requires a significant investment in infrastructure, and we're only beginning to see these visual AI or computer vision pieces make their way toward that end. If I had to, I would probably interview UPS or FedEx, or even the USPS. They've been doing package sorting with computer vision, though it wasn't called that then, with pattern recognition systems, for a long time now. How did they tie business value, or, you know, actual cost value, into it? I think the business value cases I hear about most these days, to the maturity point, are very often on the early side of it, right? Like, okay, we're going to invest K dollars in data in the next six months. How can we know that data has helped? Then basically the existing benchmark scores are used. It's an unfortunate reality. So I think the business value is still tied into the early ML, almost like the ML echo chamber. What I'd expect to see in the coming years is better infrastructure to put the data and the models at the heart of the work, and tie business work right into it, so that you can really understand the closed-loop value.
Duncan Curtis 46:14
Yeah, I'd love to see that infrastructure that actually hooks into all of this. Business metrics are something that has been around for a long time, and that we've understood very well, especially in the traditional software space: churn and other elements. This data actually exists, and we could be hooking it into our systems and actually looking at it once you've got a model live, when you want to maybe start A/B testing model variants, when you want to start looking at what business impact it has. That's definitely a really exciting space for us in the future.
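For the A/B testing idea Duncan raises, one simple way to compare two model variants on a business metric like churn is a two-proportion z-test. The Python sketch below is an added illustration under stated assumptions, not a description of any system mentioned in the conversation. It assumes users were randomly assigned to each variant, that churn is a yes/no outcome per user, and that the samples are large enough for the normal approximation. The function name and the counts are hypothetical.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference in event rates between two groups.

    Returns (z, p_value). A negative z means group B has the lower rate.
    """
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0.0:
        # No variation at all (every user churned, or none did).
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up counts: users who churned while served by model A versus model B.
z_score, p_value = two_proportion_z_test(events_a=130, n_a=1000,
                                         events_b=100, n_b=1000)
print(f"z = {z_score:.2f}, p = {p_value:.3f}")
```

A test like this only tells you whether the churn difference is likely to be noise; deciding whether the difference is worth acting on is still a business call.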
Rob Stevenson 46:42
You're not going to believe this, Jason: I'm actually recording with a director of data science and machine learning from UPS next week or the following week. So you've just done my job for me. I know exactly what I'm going to ask her. Tune in for that one. But we are creeping up on optimal webinar and podcast length here, and more importantly, the time that we have reserved for this. I don't want to let you all go just yet, though. Moving out of the report, I wanted to ask both of you to share a little bit of career advice for the folks out there listening who want to continue forging a career in this space, being strategic and making huge, meaningful contributions at their companies. What sort of advice would you give them? Duncan, do you want to chime in first?
Duncan Curtis 47:22
I would say the ability to speak multiple languages has always been an area of absolutely huge value. And I don't mean English to French. What I mean is, having that technical skill set is absolutely amazing, and it's really, really valuable to understand technology and be able to develop it yourself. But there are other hats that we've been talking about, whether it's the business metrics and how that part of the business thinks about it, or whether it's user research, design, aesthetics, and interactions. Getting yourself educated in those different languages, and in the ways you can understand the business impact of what AI is doing, will help you rocketship both the impact of your models and, as well, your particular career.
Jason Corso 48:04
Awesome. Maybe I can have two mini ones. One is that we're talking about very technical ideas. We talked about latent space and embeddings and models. These are hard ideas, right? I've learned over the years that that's great, but just remember: what we do in this field, and in all fields really, is by humans, for humans, and with humans. And I think if you miss that idea, you will not achieve the potential, either your own potential or that of the group you're working with, or the tool. So, remember humans. The other bit would be, similarly, in this complicated space, these are really high-dimensional ideas, like really reasoning about embeddings that are two to the tenth or whatever. That's not something humans are that good at. So I would recommend developing a muscle: first of all, doubt, and then predict what's going to happen before you do the work. And I think that muscle, that ability to self-reflect on what you're doing through predictions, your own generation, if you will, will pay dividends in the future.
Rob Stevenson 49:16
That is great advice. "Remember Humans": what a great name for a podcast that would have been. What a missed opportunity. That's fantastic advice from both of you. Thank you so much for being here today and sharing your expertise with us. I think we are just wrapping up here. So at this point, I will just say thank you, everyone, for joining us. I've been Rob Stevenson, Duncan Curtis has been Duncan Curtis, Jason Corso has been Jason Corso, and all of you out there in webinar land have been amazing, wonderful, talented data scientists, machine learning engineers, AI practitioners, what have you, however you came. We are so glad that you were here. Thanks for being with us today, and I hope I see you out there in webinar land again. Have a great one. Bye.