How AI Happens

Qualcomm Head of AI & ML Product Management Dr. Vinesh Sukumar

Episode Summary

Over the course of Vinesh Sukumar’s colorful career, he has worked at NASA, Apple, Intel, and a variety of other companies before finding his way to Qualcomm, where he is currently the Head of AI/ML Product Management. In today’s conversation, Vinesh shares his experience of developing the camera for the very first iPhone and one of the biggest lessons he learned from working with Steve Jobs. We then discuss what his current role entails and the biggest challenge that comes with it, Qualcomm’s approach to scalability from a hardware, systems, and software standpoint, and his thoughts on why edge computing is so important.

Tweetables:

“Camera became one of the most important features for a consumer to buy a phone. Then visual analytics, AI, deep learning, ML really started seeping into images, and then into videos, and now the most important consumer influencing factor to buy a phone is the camera.” — Vinesh Sukumar [0:07:01]

“Reaction time is much better when you have intelligence on the device, rather than giving it to the cloud to make the decision for you.” — Vinesh Sukumar [0:20:48]

Links Mentioned in Today’s Episode:

Vinesh Sukumar on LinkedIn

Qualcomm

Episode Transcription

EPISODE 40

VS: “So I was privileged enough to be part of the team developing the camera sensor for Apple. And that gave me an opportunity to work with some of the best at Apple. That includes Steve Jobs.”

[00:00:14] RS: Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers, as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. 

[00:00:43] RS: Here with me today on How AI Happens is the Head of Artificial Intelligence and Machine Learning Product Management over at Qualcomm, Dr. Vinesh Sukumar. Dr. Vinesh, welcome to the podcast. How are you today?

[00:00:54] VS: Not too bad. Thank you, Rob, for having me. It's a pleasure talking to you.

[00:00:57] RS: I’m really pleased to have you because Qualcomm touches so many different areas. So many different use cases of AI and ML all bubbled up within this organization. I'm really excited to hear about some of the commonalities and what it is that comes up in your role every day. We'll get into that. But first, would you mind sharing a little bit about your background and how you wound up in your current role at Qualcomm?

[00:01:16] VS: Yeah, absolutely. I started my professional career, I should say, at NASA’s Jet Propulsion Labs, basically doing a lot of image sensor designs for space missions. That's when I entered into the space of visual imaging, but it was mostly research-oriented. Then, while I was working for [inaudible 00:01:36] at that point of time, the company went through a transition and got acquired by Micron. We then entered into the consumer space, and we started doing a lot of these miniature CMOS-based image sensor designs, transitioning from CCDs. 

The first commercial design I recollect was the Moto Razr phones, which were among the first phones to come out with a VGA camera sensor. The flip phones were quite popular in those days, and it became a huge instant hit. Then they started putting a lot more emphasis on camera designs, and we did two or three generations. Apple got interested. They wanted to introduce the very first iPhone, with a lot of emphasis on the camera. 

I was privileged enough to be part of the team developing the camera sensor for Apple, and that gave me an opportunity to work with some of the best at Apple. That includes Steve Jobs. That gave me, I should say, a window of opportunity to open up on what is visual quality, what is computer vision. That's when I started to put a lot more emphasis on vision and visual analytics, on what can be done using an image as a foundational modular block. Then, with time, it got expanded.

Later, in my younger days, not married, no kids, I thought of exploring how the Asian market works. So I moved to China. I was working with Lenovo, trying to really understand how the consumer market works in China. I had an opportunity to work with the mobile business group, spread across multiple form factors, including PCs, IoT devices, mobile phones, etc. The emphasis on visual analytics kind of grew on me. I spent a lot more time there and then came back and worked for Intel in various capacities. 

That's where the larger ecosystem was transitioning from computer vision to artificial intelligence. That included machine learning. I have structured data. What do I do with the structured data? Can I do a lot more use cases? That's when I got more exposed to a lot of these fundamental system-level challenges, with use cases spread across multiple verticals and domains. That's where, I would say, I got really calibrated to a large extent. I spent a lot of time at the Israel Design Center, IDC. I worked on data centers. I worked in automotive design units. 

We also had a lot of acquisitions that came in: Nervana, Mobileye, Movidius, that kind of stuff. So being part of that team, you get exposure to a lot of stuff, and that's how I got fully integrated into what AI really meant for the larger ecosystem and then got more interested. The good thing is I had my PhD, and my doctoral study was around computer vision and deep learning. But the applications when I got my doctoral degree were very limited. So having these use cases, I was able to marry them with the concepts I had developed at a very early phase. That's made me kind of a student at heart almost every single day, trying to define new things.

[00:04:28] RS: I love that. So technology has finally caught up to the work you were doing for your dissertation. Is that fair to say?

[00:04:34] VS: That is absolutely correct. One of the biggest challenges I had with my thesis was I was asking myself, “Am I doing the right thing?” My professor always used to guide me: “Even if you do the wrong thing, it’s always a precedent that you set for the next guy not to do exactly the same thing.” It was a new field at that point of time, but at least I had made some fundamental investments that I can even use today.

[00:04:54] RS: I think that question, am I doing the right thing, that is a good place to be in when it comes to working in cutting-edge technology, right? There has to be a little bit of uncertainty, because if it was absolutely certain, then probably someone else has done it too. Then it's external validation of someone else's work that's spurring you on, as opposed to the pure route of innovation.

[00:05:15] VS: Absolutely. I couldn't agree more. That's, I think, what makes technology interesting and your work on a daily basis more exciting, because you're not exactly sure what problem you're trying to solve. You come in and you realize it's completely different. Like Swiss cheese, there are plenty of holes. You don't know exactly which hole you want to fill on a given day. As you start working on this, you challenge yourself and you, obviously, learn a lot more things.

[00:05:37] RS: Yeah, yeah. This is a fantastic approach to this field, I think. Boy, you have had such a varied and interesting career. I feel like a kid in a candy store as an interviewer here, in terms of which direction we could take this. I would love to learn about early cameras in phones, just because the camera is such an essential piece of the phone now. But it wasn't then. 

Before the iPhone came out, there was the idea that the camera would be an anchor, really, and that as capturing content and producing content became top of mind for the consumer, it would become more important. What was the challenge at that time? Because it wasn't enough to have the camera in the phone. It had to be really good, right? It had to be smaller than ever before. Is that kind of the thrust of the challenge? Or how would you explain the way you approached that problem at Motorola and then at Apple?

[00:06:21] VS: Yeah. I think, as you mentioned, the very first important element is how do you really transition from a CCD to a CMOS-based image sensor with an expectation that you continue to provide, I guess, a DSC-class kind of image quality? Because the most important intention of having a camera in your flip phone, at least in those days, was to capture a single image, right? It’s not about video. It’s mostly about the image. Can I capture something decent enough that people would continue to use it? The displays were quite small in nature, so you don't really have the biggest of displays to look at it, but you want to at least capture good quality content. 

That was the main anchor point, and the way we tried to approach it is: do I happen to have the right infrastructure in terms of my analog design? Are my digital components good enough to clean up the noise? Do I happen to have basic image algorithms that would clean up the noise without actually distorting the image, without actually taking away the spatial information present in an image? So that was most of the focus, but those were very static in nature. There was no AI involved. It was always one blind filter that you put an image through, and you get an output, right? 

Now, with time, obviously, we started to make these filters more curvy, S-curves, Y-curves, and such, so that only certain portions of an image are lost or preserved, as you try to keep the dynamic range as high as possible. So that was the focus. But with time, as consumers started putting more interest in cameras and the camera became one of the most important features for a consumer to buy a phone, visual analytics, AI, deep learning, ML really started seeping into images. From images, that kind of went on to videos. Now, you can see, I guess the most important consumer-influencing factor to buy a phone is the camera, right? So that's how it kind of worked out.
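
For readers who want to see the idea concretely, here is a minimal sketch of the kind of static, global tone curve described above: a sigmoid-shaped S-curve that boosts midtone contrast while compressing shadows and highlights. It is an illustrative example, not the actual pipeline discussed in the episode; the function name and the strength parameter are assumptions.

```python
import numpy as np

def s_curve_tone_map(image: np.ndarray, strength: float = 8.0) -> np.ndarray:
    """Apply a global S-shaped tone curve to an image normalized to [0, 1].

    A sigmoid centered at mid-gray boosts midtone contrast while rolling
    off shadows and highlights, instead of hard-clipping them.
    """
    x = image.astype(np.float32)
    # Sigmoid centered at 0.5; `strength` controls how steep the S is.
    curved = 1.0 / (1.0 + np.exp(-strength * (x - 0.5)))
    # Rescale so the curve still maps 0 -> 0 and 1 -> 1 exactly.
    lo = 1.0 / (1.0 + np.exp(strength * 0.5))
    hi = 1.0 / (1.0 + np.exp(-strength * 0.5))
    return (curved - lo) / (hi - lo)

# Example: push a uniform gradient through the curve.
gradient = np.linspace(0.0, 1.0, 256, dtype=np.float32)
mapped = s_curve_tone_map(gradient)
```

Applied per pixel with no knowledge of scene content, this is exactly the "one blind filter" regime; the AI-era enhancements discussed next adapt such processing to what is actually in the frame.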

[00:08:05] RS: Yeah. As you mentioned, it was kind of an arms race for people in terms of the camera quality on their phones. What were some of the early AI applications that were put into smartphones?

[00:08:15] VS: Quite simple, if you look at it from today's standards, but it was mostly looking at quality enhancements. Can I really make sure I don't lose detail, using AI? Can I happen to have a high dynamic range? What can I do using AI to really supplement and support high dynamic range? Can I really make sure that what we call the traditional 3A algorithms, the auto focus, the auto exposure, and the auto white balance, can get stabilized when you happen to have very large moving content, using AI? Or if I happen to take a video, can I make sure I get the right amount of highlights in a video context, so that you know exactly how to classify it accordingly? 

Those were pretty much, I think, the early trends in how AI was actually influencing an image and a video, but mostly revolving around quality. It could be low light. It could be highlights, that kind of stuff, but mostly around quality.

[00:09:06] RS: What is considered noise with computer vision? You were mentioning a lot of the early effort was filtering out noise. What are some examples of things that you'd consider noise?

[00:09:15] VS: The most important element is you really want to have high dynamic range and capture details. You want to really make sure, when you're trying to infuse a very small camera into a phone, that you're able to capture that in extremely low-light conditions. When you're doing that, historically, what has happened is the dark noise in what we call the darker pixels starts manifesting itself as what they call blinking pixels. So you see a lot of this white noise kind of an image. You really want to avoid that as much as possible. You want black to be black, right? You don't want black to be infused with a lot of salt-and-pepper noise. Because if you do, the eye can actually catch it, and it does not look like a pleasing capture. 

Anything which is dynamic in nature, anything that is fluctuating in one frame and does not show up in another frame, I would call that noise, right? You really want to make sure that does not show itself or come out in an image or a video. We would try to use AI to really make sure that gets eliminated in every form and fashion.
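
As a concrete point of reference, the classical, pre-AI remedy for salt-and-pepper noise is a median filter, which replaces each pixel with the median of its neighborhood so that isolated blinking pixels are discarded rather than averaged into their surroundings. A minimal NumPy sketch, with illustrative names and parameters:

```python
import numpy as np

def median_filter(image: np.ndarray, k: int = 3) -> np.ndarray:
    """Naive median filter: replace each pixel with the median of its
    k x k neighborhood. Isolated black/white outliers rarely survive
    a median, unlike with mean (box) blurring."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    height, width = image.shape
    for i in range(height):
        for j in range(width):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

# Example: corrupt a flat gray image with salt-and-pepper noise, then clean it.
rng = np.random.default_rng(0)
img = np.full((64, 64), 0.5, dtype=np.float32)
mask = rng.random(img.shape)
img[mask < 0.02] = 0.0   # "pepper" pixels
img[mask > 0.98] = 1.0   # "salt" pixels
clean = median_filter(img)
```

Learned denoisers aim to do the same job while preserving the fine texture that a blunt median would smear away.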

[00:10:10] RS: Salt and pepper noise is good on a beard but bad on an iPhone. I think that is the general maxim there.

[00:10:17] VS: That is fair. 

[00:10:18] RS: Thinking back to your time at Apple, can I ask you how technical Steve Jobs got? Obviously, he was very astute with understanding consumer wants and markets. But you're a very technical man. A lot of other very technical people on that team. How in the weeds with this stuff would he get with you all?

[00:10:36] VS: In my personal experience, in some of the sessions I was privy to, he was quite technical in nature, and the expectation whenever you're trying to present to him is more whitespace and less collateral on a presentation. So, in other words, you put down a few sentences, and he wants you to be descriptive about what you actually mean. The expectation is that you understand the problem statement, you're able to quantify why a solution works or doesn't work, followed by certain recommendations, and why your recommendations actually work. 

You’re able to engage in that level of conversation, which actually was helpful, because then he could pinpoint, “Okay, I understand your reasoning, that this makes sense, that this does not make sense.” There are certain elements, especially if you're trying to look at visual quality, that are very subjective in nature. So you also have these big screens. You put these images up and you start looking at them. I’m not quite sure how things are done today, but that is how we were trying to look at it and then trying to understand, “Hey, I don't think this is good.” Or you can go use a certain amount of filters to [inaudible 00:11:34] up. But those were some of my early experiences.

[00:11:38] RS: Boy, that seems stressful making a keynote presentation for Steve Jobs. Was that intimidating? How did you approach those conversations?

[00:11:44] VS: It was, but I did have good guidance from my team at that point of time. They coached me on exactly how you really want to approach it, because one of the most important things I've learned is to learn your audience. Once you learn your audience, you try to tailor your collateral and presentation to really meet the audience's expectations. But if you go in blind, then it doesn't really work. So it's important that you study your audience beforehand. That's what I even use today. Truly understand who I am actually presenting to; it has quite helped me from that professional standpoint. 

[00:12:13] RS: That's important, even when you're not speaking to a generational techno elite. Let's fast forward in time to Qualcomm a little bit because you now have added to your focus a lot of other areas, in addition to visual imaging and computer vision. Qualcomm sort of has use cases in just about every AI and ML industry you can imagine. Would you mind sharing a bit about just how you kind of characterize your role and what is your key focus right now?

[00:12:36] VS: Currently, my role in Qualcomm is to lead the AI product team. As part of it, we do a couple of things. A, we lay down the overall AI strategy across multiple verticals. How do we really define that vision? All this work is obviously done with a lot of help from engineering and research teams because it's kind of a collaborative relationship with the execution team as well. 

The second thing is, once you lay down the vision, how do you translate that vision into what is exactly needed from an architecture standpoint, from a hardware standpoint? Do we have to invest in compute? If I have to invest in compute, what would be the performance anticipated from that compute? What would be the energy efficiency expected from that compute? Does it support multiple modalities? It could be voice, it could be speech, it could be text, or it could be multi-modality, as an example. 

Then we jump on to system software and software. What exactly is needed on the middleware side, on the low-level API and library stack? On the system side, what kind of OSes are we trying to support? We're obviously participating in multiple verticals, so we have preview of and access to many common OSes like Android, Windows, QNX, and each of these OSes has its own challenges. 

Last but not least, we want to really touch our developer community. How do you reach out to them? How can you make it easy to program? Can I accelerate my solution deployment, whether it be for students, independent developers, independent software vendors, or OS vendors, those kinds of things? So that's how we try to work it all out, and that would be my primary focus of attention at Qualcomm.

[00:14:12] RS: That, I think, is a great outline of product management and really the shipping of products. What are some of the technical challenges that are common across various AI and ML use cases?

[00:14:24] VS: That’s a great question. I think, from my perspective, one of the biggest challenges I see pretty much across all the verticals that I'm privy to is AI solution deployment: I happen to have models which have been trained on a server platform or a cloud platform. How do I take that complex model and then optimize it to fit a certain form, fit, and function for a certain device? That optimization, to a large extent, takes a lot of time. 

These models these days are quite complex in nature. When you want to really fit the hardware intrinsics of a certain platform, you really need to understand what the model exactly does and how you're able to integrate it. So a lot of time is actually spent on optimizing the model to make sure the application KPIs are not compromised in any form or fashion. If they are, what can be done to really make sure you don't impact the user experience? That is where most of the time is spent. I think that's one of the biggest challenges. I'm not suggesting there are no other challenges. There are. But as a large common factor, this is the one that really keeps coming up.

[00:15:27] RS: So is that related to just the multi-device aspect of developing a model that you don't know exactly how it's going to be used or exactly how it's going to be treated once it gets in the hands of a consumer? What is the concern there?

[00:15:38] VS: I think there are a couple of layers of concern. If you look at the entire cycle of MLOps, obviously, MLOps is divided into four buckets: data preparation and model preparation, which are the initial phases of MLOps, and then optimization and monitoring, which are the later half of your MLOps. Historically, what happens is these models are developed by data scientists. These data scientists, to a large extent, are not fully connected to the device that the model is going to be deployed on. At the end of the day, they look at a certain application, they train these huge models, and they say, “I believe these models serve the application KPIs.” 

But when you try to develop these big models, they have a very large footprint. In many cases, it may not fit a certain edge device. So we have to quantize the model, or we have to compress the model, to a point that it can actually run on device, so we really go through those phases. Then, Qualcomm supports many flavors of edge devices, from IoT to compute, mobile, and XR devices. Some of these form factors are constrained in power. Some of these form factors are constrained on memory footprint. Some of these form factors have multiple applications running at the same time. So it's quite important to really understand, depending upon importance, which model you want to invoke at a certain point of time. 

Given that some of these features are very different, it becomes important for us to really understand them and then optimize a model that really fits that vertical's application needs. I'm pretty sure this is not a unique challenge for Qualcomm. Pretty much most of the silicon vendors, platform enablers, or solution providers have something similar. We're just trying to make sure we can accommodate it and then accelerate this deployment as fast as possible.
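
To make the quantization step concrete, here is a minimal sketch using PyTorch's post-training dynamic quantization. This is a generic illustration of the technique, not Qualcomm's actual toolchain; the toy model and the size-measurement helper are assumptions for the example.

```python
import os
import tempfile

import torch
import torch.nn as nn

def size_mb(m: nn.Module) -> float:
    """Serialized size of a model's weights, in megabytes."""
    with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as f:
        path = f.name
    torch.save(m.state_dict(), path)
    size = os.path.getsize(path) / 1e6
    os.unlink(path)
    return size

# A toy float32 model standing in for something trained in the cloud.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Post-training dynamic quantization: Linear weights are stored as int8
# and activations are quantized on the fly, shrinking the footprint with
# no retraining. The application KPIs still have to be re-validated on
# the target device afterwards.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(f"fp32: {size_mb(model):.2f} MB, int8: {size_mb(quantized):.2f} MB")
```

Production flows for a specific chip typically go further, compiling the quantized graph against the vendor's hardware intrinsics, but the footprint-versus-accuracy trade-off he describes starts here.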

[00:17:19] RS: In a fully cloud-enabled world, how important is edge computing?

[00:17:24] VS: I think edge computing is extremely important, in many ways. I think, historically, edge computing really took its shape primarily because of privacy, latency, and performance. What is now happening is, on top of it, you want to add elements that involve user personalization, wherein you really want to make sure that the model is personalized to the user's behavior pattern. That can happen only if you really understand the user and you're able to store content related to the user within that device. It’s all possible, and it's happening with edge computing. 

That's what I would emphasize. In addition to the historical drivers of privacy, security, and latency, the element of user personalization is really showing up. That is pushing even more towards edge computing, rather than relying on the cloud. I'm not suggesting by any means that cloud computing is not important. But I think edge computing is taking on a shape of its own if you really want to connect to the user. 

[00:18:22] RS: There's some interesting work being done to alleviate privacy concerns with cloud computing through the anonymization of data. It's never personally identifiable, regarding what data is actually harvested and processed, how long it’s stored, etc. In a world where an organization could properly alleviate privacy concerns with cloud computing, and as latency decreases, that feels just inevitable. Internet speeds get faster and more widespread. Processing power gets more powerful and smaller. What do you think is the last holdout if we were to address those two things in time, the privacy concerns and the latency? What is still the lasting need for edge computing?

[00:18:57] VS: There's always an open question on how secure is your privacy, right? There's always an open debate. There's not always a clear answer. To really provide the benefit of doubt, we really want to make sure anything that's private to the user does not leave the device. As much as possible, you want to contain it within the device. With regards to communication, as you mentioned, as Internet speeds really go up, the cloud connectivity portion is really good. 

But there are always going to be situations, and I've had this personally. When I traveled to a location in Japan, most of the signs were all in Japanese. I really wanted to understand what they actually meant, because I wanted to go to a restaurant that serves a vegetarian option, since I don't eat meat. That was very difficult to actually communicate. I was trying to use my phone to read exactly what the menu card said. But for some reason, I didn't have connectivity to the cloud. 

Then what I realized is that I had a translate app on my phone, and it was able to support my communication with the local host in an easy manner, just by making sure I said the right words in English, and it was able to translate that to Japanese, and also read the menu, which was in Japanese, back into English. All the analytics portion of that was done on my mobile device. It never needed to reach the cloud. It would have been great if I had cloud connectivity, but in that specific location or region, I just didn't have it. When you happen to have these conditions, on-device intelligence is paramount and helps a lot.

[00:20:27] RS: Yeah. It's an interesting debate too, because we expect this trend of processing power becoming more commoditized, smaller, and more accessible to continue. So you can imagine a time not too far from now, and it probably already exists, where your device is powerful enough to have all of that locally. Is your argument that that's preferable to needing to connect to the cloud, even on, say, a planet with 100% connectivity?

[00:20:54] VS: It's a great question. I would probably also look at it differently, mobile being one such factor, and autonomous driving and autonomous robotics being another. Especially when you have cars which are driving at 100 miles per hour, you want the automobile to be making decisions on its own, rather than depending on connectivity to the cloud to make a decision. Especially when you have pedestrian detection, or you want to avoid having an accident, you want that to be spontaneous. You want it to be instant. 

Having that intelligence built within the vehicle itself, in my view, supports that. We have done a lot of case studies, and I'm pretty sure the rest of the ecosystem has done the same. What is seen is the reaction time: it's much better when you happen to have intelligence on the device, rather than actually giving it to the cloud to make the decision for you. That could be a life-and-death scenario in that case. 
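
The arithmetic behind that reaction-time argument is easy to sketch. Assuming, purely for illustration, a 200 ms cloud round trip versus 20 ms of on-device inference (both figures are assumptions, not measurements from the episode), a car at 100 miles per hour travels roughly ten times as far before the cloud's answer arrives:

```python
# Back-of-envelope reaction-distance comparison. All latency figures
# are illustrative assumptions, not measurements.
speed_mph = 100.0
speed_mps = speed_mph * 1609.344 / 3600.0  # ~44.7 meters per second

cloud_latency_s = 0.200      # assumed network round trip + cloud inference
on_device_latency_s = 0.020  # assumed local on-device inference

print(f"cloud:     {speed_mps * cloud_latency_s:.1f} m traveled before a decision")
print(f"on-device: {speed_mps * on_device_latency_s:.1f} m traveled before a decision")
# Prints roughly 8.9 m versus 0.9 m, which is the margin a
# pedestrian-detection system has to live with.
```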

There are always going to be instances where you can rely on the cloud. But to a large extent, in my personal experience, whether it be common consumer and enterprise devices, these growing consumer products, or even complicated systems like ADAS, I believe on-device intelligence is really going to make a big difference.

[00:21:57] RS: Yeah, that makes sense. Ordering in a restaurant, a little bit of latency is annoying but acceptable. Any latency above zero in an autonomous vehicle, for example, is unacceptable, right? 

[00:22:07] VS: That is correct. Yeah. 

[00:22:09] RS: Yeah, it makes sense. So Qualcomm also, some of the products you’re shipping would instantly be deployed at a huge scale to loads of consumers, needing magnificent amounts of processing power. Then also, you expect them to grow over time. What's your approach to building in scalability with this tech?

[00:22:24] VS: I think it's quite important for us from a scalability standpoint. But scalability can be approached in different ways. You can talk about it from a hardware standpoint, from a system standpoint, from a software standpoint. Obviously, our expectation is to drive scalability across all three facets I just mentioned, from the hardware front all the way to software. Because then you can actually learn and distribute knowledge from one vertical to another vertical, and then you can have uniformity from cloud to edge deployment. Because it's the same software stack, the same hardware, and the same API, you can actually do a lot of distributed computing from that perspective. 

My expectation and my push has always been to try to stay consistent, because you're going to have a lot of these applications as a function of heterogeneous compute, or you want to have uniqueness, and you can really supplement that by pushing into scalability. We do that even today, when you look at some of the investments we make on the hardware front. Obviously, mobile being a strong anchor point, we try to use it as an anchor and then serve other verticals, making custom feature modifications to service each specific vertical. 

Same thing on the software side. We try to really make sure we have a lot of investments on the software front that service a certain vertical, with mobile being one of the key vectors for us. Then we go towards adjacent markets like ADAS, or cloud edge, or augmented reality, or VR headsets. There is going to be a certain focus of attention on specific use cases, and then we try to make sure we put a lot more investment in to really drive that. 

But if you look at the compiler investments, the runtime investments, the operator investments, or the library investments, they're pretty much spread across all verticals. So from that standpoint, I think scalability is a strong push for us.

[00:24:01] RS: That makes sense. How much do you think about being constrained by a specific kind of tech stack, or being married to one tech stack, right? You want something that can be a little agile, that you can customize a little bit. Is that a concern for you with scalability?

[00:24:14] VS: I don't think it's much of a concern. The way we try to supplement it is we do two things. One is a performance tech stack, where we really want to aim for the highest performance we can get from the hardware, whether it be inferences, latency, or whatever the KPI for the application might be. Then there's a certain component which is more towards scalability, wherein you might have application vendors who want to have the same solution stack, with as little modification as possible, run on an Arm platform, or an x86 platform, or any other platform of choice. We also try to tailor to that audience to make sure we have that scalability stack. 

As anyone would anticipate, with a scalability stack you always lose performance. It’s very difficult to get performance and scalability at the same time, but it definitely accelerates your time to market. Then you don't really have optimization challenges, right? You're addressing portability as quickly as possible. So we try to do that across different flavors of OS, whether it be Android, Windows, or QNX, those flavors.

[00:25:14] RS: Makes sense. Dr. Sukumar, we are creeping up on optimal podcast length here. But I don't want to let you go just yet. Because you, in your role, touched on many different use cases, I'm curious what is most exciting to you when you think about some of the applications of this technology? Inside Qualcomm or outside of Qualcomm, really, over the medium term, the next five years, what gets you really excited about this space?

[00:25:37] VS: It's kind of been an interesting evolution. I started my career, as I mentioned before, focusing mostly on visual analytics. With time, I’ve gone on to look at audio, text, linguistics, commerce, that kind of stuff. But now, I'm entering an area that I think is more about cognitive AI. It's an exciting frontier for me, because cognitive AI, to a large extent, is about providing an avenue to address the lack of common-sense knowledge in AI systems, wherein you look at multimodality fusion. I could take elements of different data modalities and fuse them to really understand the knowledge representation of any given entity, right? 

That's what is exciting to me. People are doing a lot of research on this stuff to really make sure AI systems are truly AI. I think that is great. Now, I don't anticipate this will lay the foundation for AI to take over the world, at least not yet. But it at least gets you to the next frontier of what is cool in AI.

[00:26:33] RS: Fantastic. Dr. Vinesh Sukumar, this has been a delight speaking with you. Thank you so much for being on the podcast and sharing your expertise with me. I've loved learning from you today.

[00:26:42] VS: Thank you so much, Rob, for having me and providing me this opportunity to present my views to your audience.

[00:26:53] RS: How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, medtech, robotics, and agriculture. For more information, head to sama.com.