How AI Happens

Building Trustworthy Behaviomedics with Blueskeye CEO Michel Valstar

Episode Summary

Academic turned entrepreneur Michel Valstar joins How AI Happens to explain how his behaviomedics company, Blueskeye AI, prioritizes building trust with its users. Much of the approach relies on data opt-ins and on-device processing, which necessarily results in less data collection. Michel explains how his team continues to glean meaningful insight from far smaller datasets than the average AI practitioner is used to.

Episode Notes


 

Michel Valstar on LinkedIn

Blueskeye AI

Episode Transcription

EPISODE 23

MV: “Not only do we do face analysis, which people really understand is highly sensitive. We've also chosen to do that in areas such as mental health, ADHD, autism. And we've chosen for people to do this in their own homes, so the data can't get any more private than this.”

[INTRO]

[00:00:24] RS: Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers, as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. 

[INTERVIEW]

[00:00:54] RS: Joining me today on How AI Happens is the founding CEO over at BlueSkeye, Michel Valstar. Michel, welcome to the podcast. How are you today?

[00:01:02] MV: Thank you very much. It's an honor to be here. Many great ones have come before me.

[00:01:08] RS: That's right, and hopefully many great ones after. I'm interested in your background a little bit, Michel. Are you an academic turned entrepreneur, an entrepreneur turned academic, or both at the same time? How would you characterize your own journey, leading you to found BlueSkeye while continuing to teach?

[00:01:24] MV: Sure. No, I'm very much an academic turned entrepreneur. I moved into this area because there were quite a few companies interested in our technology, our research outputs really. That probably came about because, when I started working at the University of Nottingham, I very clearly focused on technologies and research outputs that actually work. So I only worked on projects where I thought the result had a chance to actually run on modern hardware, rather than perhaps running some time in the future on hardware that didn't exist yet, right? So slightly less intricate, complex new techniques, but stuff that works.

Of course, as a result, industry went, “Oh, can we have that face tracker?” or, “Can we have that facial action detector?”, which detects the facial muscle actions. That's how two or three companies really started engaging with us and started licensing software. Through that, I was quite closely involved with the technology transfer. I actually had calls with the software engineers of those companies, and with the CEOs and CTOs of those companies. Through that, I learned just how hard it is to build products, and how hard it is to go from something that sort of works in a lab to something that has to work in a real industrial environment, right?

Perhaps more strangely, at least for myself, I found that I really enjoyed that process. I really enjoyed helping the customer get the most out of our technology, and refining it further and further. That process of iteratively improving what we had was something I genuinely enjoyed. So then came the point where there was too much demand for it, basically, and the university wasn't really set up to deal with these individual requests. So I started the spin-out company and then took about a year determining what to do with it, because now, all of a sudden, you need to find a purpose and a direction.

[00:03:50] RS: Well, I definitely want to get into the purpose and direction. First, though, can you share at a high level what exactly it is BlueSkeye does and the application of your technology?

[00:03:57] MV: BlueSkeye AI builds AI that you can trust to improve your wellbeing through face and voice analysis. A lot of people are struggling with the concept of face analysis and voice analysis, and the face recognition backlash in the community is real. I think the problem is that it unnecessarily stops the adoption of life-changing, positive technologies. So in response to that, we built AI technology that is transparent, that people can trust, and that runs, for example, on the edge rather than on the cloud, so you can really see that no data of yours leaves your device. That allows people to trust the face analysis and the voice analysis that we do.

Hopefully, that will mean that more and more people will adopt these technologies, because we are a B2B2C company. We sell our software, our models, to other companies to integrate, for example, into their social robots or their virtual assistants, or to build health and wellbeing solutions on mobile phone apps. Those companies can point to us, saying, “Look, we don't just use anybody. We use BlueSkeye, and they are known to be trustworthy when it comes to the way they treat data, especially face data, facial behavior data.”

[00:05:30] RS: Your approach here, I think, reflects an important trend. Previously, the approach was all about technical risk. Can we do it, right? Can we build this technology? Can we make this application work? Then we'll deal with the market response later. We'll build trust later, right? Let's first make sure we can do it. You've flipped that a little bit. You have set out to prioritize trust, rather than suggesting, “Oh, we will wow people with this technology so much that they will just not care about the trust.” That they'll be like, “Oh, well, I don't know what they're doing with my data, but it's so convenient. I'll make that sacrifice,” which strikes me as the approach a lot of big companies have made. You instead are going trust first, and then the importance and utility of the application second. Is it fair to characterize your approach that way?

[00:06:23] MV: Yes, that is a fair characterization. But, of course, we have chosen to go in at the deep end of the cold pool. Not only do we do face analysis, which people really understand is highly sensitive, we've also chosen to do that in areas such as mental health, ADHD, autism. And we've chosen for people to do this in their own homes, so the data can't get any more private than this. Partly because the debate about data privacy is now fairly mature, I don't think you can get away with first building a technology and then hoping that it's so convenient and so useful that people will start using it anyway.

[00:07:14] RS: Why would you want to, right? Why would you want to trade on that compromise, right?

[00:07:20] MV: Well, okay, there is a good reason, right? So there are definitely –

[00:07:24] RS: Is there a good ethical reason though?

[00:07:26] MV: There's not a good ethical reason, but there's a very good business reason to actually build something first. The main reason there is data. The other reasons are product-market fit and the like, building technology, and refining your features. But data is the real thing, right? So we sacrifice, in a sense, abundance of data for trust. That's a business decision that we've made at a high level, and we think it is absolutely worth it.

[00:08:04] RS: That's a good example. You’ve mentioned now a couple. Data is processed not on the cloud, right? So data is processed locally. You’ve sacrificed on some data collection. What are some of the other ways that you go about developing trust?

[00:08:18] MV: We're transparent about where our data comes from. So we're very clear about that. We basically always allow people to remove their own data or to ask for what data we have. Like I said, we process everything on the device. Before you submit anything back, you get to review what data you have on a case-by-case basis, rather than an opt-in that you toggle once, and then forever you will be uploading your data. Those are the key things. 

Another aspect is that we do a sort of federated learning and federated data aggregation, so that when one of our customers wants some information, for example, how engaging a robot is, or how bad exactly the mental health in this city or this hospital is, they can see that type of data, but it's aggregated on the mobile devices before being sent back to a dashboard or similar aggregation system on the servers.
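To make that aggregation pattern concrete, here is a minimal sketch in Python. The score data, function names, and structure are all made up for illustration; this is not BlueSkeye's implementation. The point is simply that raw per-session scores stay on each device, and only summary statistics travel to the server:

```python
# Hypothetical sketch of on-device aggregation. Raw per-session scores
# never leave the device; the server only ever sees sums and counts.

from dataclasses import dataclass

@dataclass
class LocalSummary:
    total: float  # sum of scores, computed on-device
    count: int    # number of sessions summarized

def summarize_on_device(raw_scores: list[float]) -> LocalSummary:
    # The raw scores stay with the caller (the device); only the
    # summary object is ever transmitted.
    return LocalSummary(total=sum(raw_scores), count=len(raw_scores))

def aggregate_on_server(summaries: list[LocalSummary]) -> float:
    # The server combines aggregates into one population-level figure.
    total = sum(s.total for s in summaries)
    count = sum(s.count for s in summaries)
    return total / count if count else float("nan")

# Example: three devices each report one summary; the dashboard shows
# a single mean without ever receiving an individual session score.
device_data = [[0.4, 0.6], [0.7], [0.2, 0.3, 0.5]]
summaries = [summarize_on_device(scores) for scores in device_data]
print(f"Population mean mood: {aggregate_on_server(summaries):.2f}")
```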

[00:09:25] RS: You mentioned a moment ago that you make a sacrifice in terms of data, and that's a business decision you've made in the interest of developing more trust. There's a conception that with less data comes less accurate insight. The real compromise you're making, then, is that if you collect less data, your technology won't be as good. Is that the case? Or have you found ways of generating really valuable insight, even with less data and even with on-device processing?

[00:09:51] MV: Yeah, we have. In particular, we take one approach that really helps, and that is that we base our insights into actual human behavior, such as what somebody's level of depression is, or how engaged they are, or how fatigued they are, not on the raw video or the raw audio. Instead, we first recognize what we call behavior primitives, a term that we've coined. They're basically things like facial muscle actions, head pose, nodding or shaking, or well-recognized nonverbal voice cues such as a cough, a pause, or an um.

The benefit here is that those behavior primitives are application agnostic. You can basically collect them from anywhere and then use them as the basis to train your much lower dimensional behavior understanding, let's say your depression recognition, right? So now you can train your facial muscle action detectors on a database of, say, 10,000 people, right? But maybe you only have depression data for 1,000 people.

The nice thing is that the dimensionality of the two problems is completely different. The original facial muscle action detection problem requires the facial image, which is super high dimensional. But once you're done with that, you're left with, in our case, about a 100-dimensional descriptor, and those 100 dimensions can then be used to make the predictions in terms of depression. That means that you can do much more with much less data. It's sort of the inverse of the curse of dimensionality.

An additional benefit of using these intermediate behavior primitives is that they are just descriptions of facial muscles, like somebody smiled or somebody frowned or something like that. You don't have the image of the face anymore, so it's now a completely anonymized descriptor as well. You could choose, for example, to pass on just that little bit of information to a cloud service. It will be completely detached from the original image data, so you absolutely can't recognize somebody based on that information anymore.
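In code, the two-stage idea might look like the sketch below. The extractor, shapes, and data here are stand-ins (BlueSkeye's actual models and primitive set are not public); the point is only that the second-stage classifier sees roughly 100 primitive features rather than tens of thousands of pixels:

```python
# Illustrative two-stage pipeline with made-up data. Stage 1 would be a
# detector pretrained on a large, task-agnostic dataset; stage 2 is a
# small classifier trained on the much scarcer task-specific labels.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def extract_behavior_primitives(face_frame: np.ndarray) -> np.ndarray:
    """Stand-in for a pretrained primitive detector that turns a cropped
    face image into ~100 scores (muscle actions, head pose, etc.).
    This placeholder ignores its input and returns random activations."""
    return rng.random(100)

# Stage 1: primitives for a small labeled depression dataset.
frames = [rng.random((200, 200)) for _ in range(1000)]          # 1,000 subjects
X = np.stack([extract_behavior_primitives(f) for f in frames])  # (1000, 100)
y = rng.integers(0, 2, size=1000)                               # labels: yes/no

# Stage 2: a 100-dimensional classifier needs far fewer labeled examples
# than one trained directly on the 40,000 raw pixels.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Training accuracy:", clf.score(X, y))
```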

[00:12:23] RS: What's interesting is that that reflects the way humans acquire data: the actual image has disappeared, and all you have is the data. You're describing a memory. So that feels inherently more trustworthy. What do you mean when you say lower dimension?

[00:12:43] MV: In supervised learning, supervised machine learning, your examples are pairs of features on the one hand, which is, let's say, the pixels in an image and the label, the thing that you're trying to predict on the other hand. So the label could be whether somebody's smiling or not. 

Now, if you take an input image of, say, 200 by 200 pixels once you've cropped the face out, you get about 40,000 pixels, or 40,000 of what we call features or dimensions. Based on those 40,000 inputs, you need to predict one output, whether somebody is smiling or not. So that's a very high dimensional problem, and there is something called the curse of dimensionality, which means that if you have a very large number of dimensions, then in order to make stable predictions, you also need a very large number of examples. A rule of thumb is that you want roughly 10 times as many examples as you have dimensions.

When we go from those pixel inputs to the facial muscle action detections, or any of the other behavior primitives, we now have one prediction for each of the facial muscle actions, and there are 32 of them. On top of that, we add some other things, like head pose and sometimes the left and right parts of the face separately. So you end up with about 100 predictions. You've now taken a 40,000-dimensional image to a 100-dimensional description of the same thing. Now, you can do your depression recognition or your fatigue recognition on that very low dimension, and you need far less data, because the number of examples you need to still make a robust prediction has dropped so much.
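Applying that 10x rule of thumb to both representations makes the gap stark. This is only a back-of-the-envelope check of the heuristic described above, not a formal bound:

```python
# The 10x rule of thumb, applied to both representations.
pixels = 200 * 200          # raw face crop: 40,000 dimensions
primitives = 100            # behavior-primitive descriptor: 100 dimensions
rule_of_thumb = 10          # ~10 examples per dimension for stable models

print(f"Raw pixels need ~{pixels * rule_of_thumb:,} labeled examples")
print(f"Primitives need ~{primitives * rule_of_thumb:,} labeled examples")
# Raw pixels need ~400,000 labeled examples
# Primitives need ~1,000 labeled examples
```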

[00:14:47] RS: So we've kind of teased at the application and the use case a little bit. You have, for example, depression recognition being deployed for mental health. Can you rattle off some of the use cases? How might I use the technology you're deploying?

[00:15:00] MV: We are most actively working at the moment on perinatal mental health, so depression and anxiety in particular, just before and after the mother gives birth. We have built an app, called Avocado, that a mother can start using from quite early on in the pregnancy. As you start using it, you can track lots of things about your pregnancy, which is useful in its own right; you get a lot of information about your pregnancy. But more importantly, you can use it to look after your mental health and wellbeing, checking in with a little virtual assistant.

This virtual assistant guides you through one or more interactive tasks. We have a task where you sing a lullaby to your unborn baby, the bump, or you record a pregnancy diary. Of course, you have to speak and gesture as you do that. You express yourself. We record all that, and we let the user know very clearly that we are recording, of course. Then you get a score of your mood. That is currently a health and wellbeing solution, so it's not a medical device. We are also going through clinical trials in which we are giving this to pregnant women while a doctor looks after them as well. So that's one application.

We do pain estimation, and we do it pretty well actually. We do apparent emotion, where we basically look at valence, arousal, and dominance, which are three types of emotion descriptors. Valence is how positive or negative you are. Arousal is how energetic you appear to be. Dominance is an interesting one for social robots and virtual assistants, because dominance tells the system how much the user feels in control. That's very useful if you work with a robot, because you want the user to feel in control.

[00:17:14] RS: How does it measure dominance?

[00:17:16] MV: Like all our other techniques, it looks at these behavior primitives over time. We then train a machine learning system, which makes use of a recurrent neural network over time, on a particular description of the behavior primitives. They are described in a way that we have tucked away as a nice little trade secret, but basically, it describes the way people express themselves at any moment in time. Then we apply a recurrent neural network on top of that, which makes predictions over time on dominance in real time. So that becomes a real-time output of how high or how low the dominance is, and it's very much the same approach we use for almost all our systems.
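The general shape of such a model might look like the sketch below: a recurrent network over per-frame primitive vectors, emitting one dominance estimate per time step. Since the actual descriptor and architecture are trade secrets, every detail here (the 100-dimensional input, the GRU, the hidden size) is an assumption for illustration:

```python
# Illustrative recurrent model over behavior primitives; not BlueSkeye's
# architecture. Input: one ~100-dim primitive vector per video frame.

import torch
import torch.nn as nn

class DominanceEstimator(nn.Module):
    def __init__(self, n_primitives: int = 100, hidden: int = 64):
        super().__init__()
        # A GRU consumes the primitive sequence frame by frame.
        self.rnn = nn.GRU(n_primitives, hidden, batch_first=True)
        # A linear head maps each hidden state to one dominance score.
        self.head = nn.Linear(hidden, 1)

    def forward(self, primitives: torch.Tensor) -> torch.Tensor:
        # primitives: (batch, time, n_primitives)
        states, _ = self.rnn(primitives)
        # Returns (batch, time): a real-time estimate at every frame.
        return self.head(states).squeeze(-1)

# Example: a 30-frame clip of primitive vectors yields 30 estimates.
model = DominanceEstimator()
clip = torch.rand(1, 30, 100)
print(model(clip).shape)  # torch.Size([1, 30])
```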

[00:18:04] RS: When you think about the ideal version of this technology, and your company in particular, what is the pie-in-the-sky hope? When you prognosticate a little bit, what is your vision for the company, and how do you foresee this being deployed in the most meaningful, impactful way?

[00:18:20] MV: Our ultimate goal is for people to trust this technology to help them, so they are the happiest and healthiest they can be, particularly in those conditions and situations where we know that this helps. We know it helps for a lot of mental health conditions. We know it would really, really help as an early warning for Parkinson's and other conditions. So we hope that this just becomes ingrained in everyday life, where people regularly use a personal assistant.

Again, completely private, right? Nobody else sees their results. We would like people to have that trust in a system that regularly checks in on them. I'm not sure whether it will become one system that everybody uses regularly, or whether there will be two or three, one for every major disease or condition. But, yeah, we imagine people knowing that they are regularly testing themselves, basically. They have the power and the freedom to look after themselves.

[00:19:38] RS: Got it. Well, Michel, this has been fascinating, learning from you today all about your platform and your approach toward trust. At this point, I would just say thank you so much for being a part of the podcast. I really loved learning from you today.

[00:19:49] MV: No problem. Thank you very much for having me.

[END OF INTERVIEW]

[00:19:54] RS: How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, medtech, robotics, and agriculture. For more information, head to sama.com.

[END]