How AI Happens

Genetec Director of Video Analytics Florian Matusek

Episode Summary

Florian Matusek, Genetec’s Director of Video Analytics and the host of Video Analytics 101 on YouTube, explains the importance of maintaining security and privacy, new waves of spatial analytics, and why real-time improvements are more difficult than back-end adjustments.

Episode Notes

Genetec has been a software provider for the physical security industry for over 25 years, earning its spot as the world’s number one software provider in video management. We are pleased to be joined today by Florian Matusek, Genetec’s Director of Video Analytics and the host of Video Analytics 101 on YouTube. Florian explains how his company is driving innovation in the market and what his specific role is before diving into the importance of maintaining both security and privacy, this new wave of spatial analytics, and why real-time improvements are more difficult than back-end adjustments. Our guest then lists all the exciting things he is witnessing in the world of video analytics and what he hopes to see in re-identification and gait analysis in the future. We discuss synthetic data and whether it will ever be commoditized and close with an exploration of the probable future of grocery stores without any employees.

Key Points From This Episode:

A warm welcome to the Director of Video Analytics at Genetec, Florian Matusek.
How the Video Analytics 101 YouTube channel was formed.
The purpose of his YouTube channel and its ideal viewer.
What his company does and what his role entails.
How Genetec has transformed as a company from its inception until now.
The insights Florian hopes to provide to his customers through video analytics.
Genetec’s new technology that upholds both security and privacy.
Exploring the new wave of spatial analytics.
The difference between real-time improvements and gradual, back-end adjustments.
New use cases, techniques, and trends that Florian finds exciting.
The perks and problems of re-identification.
Whether the current technology of gait analysis is reliable and how it relates to re-identification.
How technology is evolving to include time-based data collection.
The difficulties he experiences in collecting video data to train his models.
Whether there’s an opportunity for synthetic data to augment his data strategy.
Florian’s thoughts on synthetic data becoming commoditized.
Some interesting ways that Genetec’s clients are using its technology.
The video analytics behind the automated drinks system at the Denver Broncos stadium.
How close we are to a future of grocery stores with no employees or cash registers.

Tweetables:

“Nowadays, it's about automation. It's about operational efficiency. It's about integrating video and access control, and license plate recognition, IoT sensors, all into one platform, and providing the user a single pane of glass.” — Florian Matusek [0:05:11]

“We will always build products that benefit our users, which is the security operators, the ones purchasing it. But at the same time, we see it as our responsibility to also do everything possible to protect the privacy of the citizens that our customers are recording.” — Florian Matusek [0:09:03]

“What gets me excited are solutions that are really targeted for a specific purpose and made perfect for this purpose.” — Florian Matusek [0:11:24]

“You need both synthetic data and real data in order to make the real applications work really well.” — Florian Matusek [0:21:42]

“It's really funny how customers come up with creative ways to solve their specific problems.” — Florian Matusek [0:26:36]

Links Mentioned in Today’s Episode:

Florian Matusek on LinkedIn

Video Analytics 101 on YouTube

Episode Transcription

[INTRODUCTION]

[00:00:03] RS: Welcome to How AI Happens, a podcast where experts explain their work at the cutting-edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson. We're about to learn, How AI Happens.

[INTERVIEW]

[00:00:31] RS: Here with me today on How AI Happens, is the Director of Video Analytics over at Genetec, as well as the host and star of the Video Analytics 101 YouTube channel, Florian Matusek. Florian, welcome to the podcast. How the heck are you today?

[00:00:45] FM: I'm doing great. Thank you for having me.

[00:00:47] RS: I'm thrilled to have you, because I don't think there's really enough technical content out there specifically on video analytics, and really getting deep in the weeds. There's a lot of really flashy things about what you can do with some machine learning, with some algorithms on top of highly produced video, but I really want to get into some of the nitty gritty here. You're just the man for that.

I want to learn about the company Genetec that you're at and your role. First, though, I have loved, in researching this for this episode, I've loved your YouTube channel, Video Analytics 101. I love how technical it is. Would you mind sharing a little bit about the genesis of that channel? What it is you cover over there?

[00:01:22] FM: Yeah, sure. It all started at the beginning of COVID, where more and more people went online, and were consuming more digital content. When we looked at what's out there, what's in YouTube, we found out that everything is promotional in our space, in the security space. All the manufacturers have flashy webinars and product sheets and so on, but there is no educational content that really focuses on the security industry.

There's a lot about AI and machine learning in general, but really nothing educating people on video analytics in security, how stuff works. How does deep learning work? How does machine learning work, in security, in our video analytics? We found there's nothing there. That's why I started the channel. It's been now roughly one and a half years and has a great following, and is just growing and growing. It's really exciting.

[00:02:08] RS: Who would you say is the channel for? Who is like your ideal viewer who would get the most out of that content?

FM: The ideal viewer is technically interested, but not necessarily a developer or somebody who is actually sitting down and coding, although it's interesting for them as well. We try to make the content as easily consumable as possible, but still get into the nitty-gritty of the technical details. Somebody who is technically interested, interested in how it works in our industry. This might be what we call system integrators, or end-user customers who are really interested in how stuff works under the hood, who are not necessarily developers but want to know how it works. This is really who it's for.

[00:02:50] RS: Sounds familiar. Thank you for sharing that with me. That was a selfish question as a fellow content creator, just to learn a little bit about your positioning and all that, but the channel is fantastic. I definitely recommend it. We'll make sure to put the link in the show notes. I'm hoping that if this goes well enough, this video will wind up on your channel, as well.

[00:03:07] FM: Absolutely.

[00:03:08] RS: Fingers crossed that I'm up to snuff to make the channel. Anyway, I want to hear more about your company, Genetec. Would you mind sharing a little bit about what the company does? Then we can get into your background and role as well?

[00:03:21] FM: Yeah, sure. Genetec is a software provider for the physical security industry. When we talk about physical security, we mean video management. You see all the video surveillance cameras out there. We mean access control. We mean license plate recognition. The company is now 25 years old. It was really founded just when there was the first change from analog cameras, surveillance cameras, typically CCTV cameras, to IP cameras, when it became networked. Genetec was one of the first software providers there, and grew, and grew, and grew.

Today, we can really proudly say that we are the largest software provider in video management globally. What's even more important for us is that we try to disrupt the market even after 25 years and drive innovation and bring innovation to the market. In the end, our reason for existing is really to make the lives of the operator that is sitting in front of the video screens easier, to make it easier for them to provide the security, to manage security, and to do the thing in an ethical, moral, and privacy-protecting way. This is what we're really all about.

[00:04:26] RS: I'm always delighted when companies like Genetec, who are doing really exciting stuff in the space, have been around for a while. In your case 25 years. What was it like at the beginning? What were the use cases for Genetec? Who were you then and what are you doing now?

[00:04:41] FM: In the beginning, nowadays, we would call it simple, but the challenge back then was really to get an IP signal, record it, then view it on the video wall. That sounds simple, but it's not so simple to do this in a secure way: to store the video so it cannot be manipulated, to make sure that it's transferred properly from the camera to a server, to retrieve it in a fast fashion, and to replay multiple videos at the same time.

Of course, nowadays, we grew way beyond that. Nowadays, it's about automation. It's about operational efficiency. It's about integrating video and access control and license plate recognition, many other sensors, IoT sensors, all into one platform and providing the user a single pane of glass. So we went from a very simple beginning of just recording and viewing video to a very sophisticated platform that really manages all kinds of security data that's out there.

[00:05:34] RS: 25 years ago, when a camera was the size of a rocket launcher, and it was a non-trivial thing to store video data, right, providing a wall of video feed like a wall of content for some kind of security individual, which sounds like the use case, that was a non-trivial thing like I said. So now we've come a long way. What are some of the video analytics you're layering on top of it? What are some of those, in addition to presenting the material in a very clean way? What are some of the insights you are looking to provide folks?

[00:06:03] FM: It's interesting, my personal background is video analytics. We'll get into this a little bit later. There have been several waves of video analytics in the last, I would say 15 years, where there's always a lot of excitement. Then there is the trough of disillusionment, like at Gartner. The same thing happens with video analytics, so we have to be very careful about what is out there, what is being promised, and what is actually usable and can provide an actual benefit to the user.

One of the things that we are doing is we are focusing on protecting people's privacy with video analytics by analyzing the video, and then pixelating people in the video automatically, but everything else is clear. If something happens, then you unlock this video based on permissions. This is a very unusual video analytics application, but it's so useful, and it works in practice and on every single camera that you put out there. For the first time you have a compromise or really a synergy between security and privacy. It's not a zero-sum game anymore. This is a very unique application that actually works and is being deployed in the field.
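To make the masking idea concrete, here is a minimal Python sketch. It is not part of the conversation and is not Genetec's implementation. It assumes a grayscale frame stored as a list of rows, person boxes that come from some detector (not shown), and a hypothetical permission name, `unlock_video`.

```python
from typing import List, Set, Tuple

Frame = List[List[int]]          # grayscale frame: rows of pixel intensities
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def pixelate(frame: Frame, boxes: List[Box], block: int = 4) -> Frame:
    """Return a copy of the frame with each box replaced by coarse averaged blocks."""
    out = [row[:] for row in frame]
    for top, left, bottom, right in boxes:
        for by in range(top, bottom, block):
            for bx in range(left, right, block):
                ys = range(by, min(by + block, bottom))
                xs = range(bx, min(bx + block, right))
                mean = sum(frame[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
                for y in ys:
                    for x in xs:
                        out[y][x] = mean
    return out


def frame_for_viewer(frame: Frame, boxes: List[Box], permissions: Set[str]) -> Frame:
    """Show the masked frame unless the viewer holds the (hypothetical) unlock permission."""
    if "unlock_video" in permissions:
        return frame
    return pixelate(frame, boxes)
```

Everything outside the person boxes stays untouched, which matches the description of people being pixelated while the rest of the scene remains clear.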

Going back to maybe more classical video analytics applications, what is also very useful is anything that goes around occupancy of buildings, of buses, of trains, of train stations. So knowing how many people are in the space is becoming increasingly important and interesting. Some manufacturers call it spatial analytics. To know where people are moving, you can think about a train platform, for example, where if you know that everybody's gathering at one part of the platform, you can automatically trigger an announcement and say, “Please go to the back of the platform,” to really have an operational benefit to the whole thing.
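The platform example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: people's positions along the platform are already available in metres, the platform is split into equal zones, and the thresholds (`max_share`, `min_people`) are made-up values, not figures from the interview.

```python
from collections import Counter
from typing import List, Optional, Tuple


def zone_of(position_m: float, platform_length_m: float, zones: int) -> int:
    """Map a position along the platform (metres from the front) to a zone index."""
    index = int(position_m / platform_length_m * zones)
    return min(max(index, 0), zones - 1)


def crowded_zone(positions_m: List[float], platform_length_m: float = 120.0,
                 zones: int = 3, max_share: float = 0.6,
                 min_people: int = 10) -> Optional[Tuple[int, int]]:
    """Return (busiest zone, emptiest zone) if one zone holds more than max_share of people."""
    if len(positions_m) < min_people:
        return None  # too few people for crowding to matter
    counts = Counter(zone_of(p, platform_length_m, zones) for p in positions_m)
    busiest, count = counts.most_common(1)[0]
    if count / len(positions_m) <= max_share:
        return None
    emptiest = min(range(zones), key=lambda z: counts[z])
    return busiest, emptiest
```

A caller could turn a non-empty result into an announcement asking people to move from the busiest zone toward the emptiest one.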

This is really what it's all about in video analytics. It's not about having the next flashy thing and the next model that can detect even more accurately. It's really about use cases and applications that provide real-world benefits. Of course, it needs to work, but whether it's 98 or 99% accurate is less important than actually providing an actual use to the user.

[00:08:07] RS: That's an interesting balance to strike, because the example of the train station, I can see how that would immediately impact me, as you said, a user. A user of the product, which is the train station that is using this technology enabled through Genetec, right?

[00:08:21] FM: Yeah.

[00:08:22] RS: The other side of that, though, is what are the analytics you just give to the product developer, the people who are in charge of making the train station more efficient? You might give them insights to increase their marketing or to improve their security or what have you. What is the line between something that can be fashioned in real time to help me versus something that's provided on the back end to the customer to make slight adjustments and tweaks over time?

[00:08:50] FM: Well, in the end, it's up to the customer, because the customer is the one spending the money on the system, but there is an additional aspect to it. That's what I mentioned before around protecting privacy that goes beyond this, because yes, we will always build products that benefit our users, which is the security operators, the ones purchasing it, but at the same time, we see it as our responsibility to also do everything possible to protect the privacy of the citizens that our customers are recording. This goes beyond it.

There might be some aspects where you cannot attach a commercial value to it, but we still want to do it. We still want to make sure we use encryption to make sure it's not – cannot be hacked. We want to make sure we are protecting people's privacy. We want to make sure that there are different permission levels to access the data. This is more that goes beyond the interest of the actual user. It goes right into the interest of us as a society and the classical citizen.

Coming back to your example, usually the benefits that a user or a customer sees really overlap with many benefits that the actual citizen sees. The train station is a great example. Of course, there are some that go beyond that. One example is that one of the video analytics you can apply is something that informs you if the cameras are dirty or covered or turned away. You as a citizen are not directly impacted by this, but it has a huge impact on operations, because the customer can make sure that cameras are actually recording.

That way, situations do not happen where something happened, you want to look at the recording, and you find out that exactly this camera was turned away or covered. This is something that's really about maintenance and operations that benefits our customers, where citizens might not see the benefit right away, but in the end, of course, they benefit as well.

[00:10:37] RS: Got it. Because Genetec has different kinds of customers, I'm sure you see lots of use cases and just generally, as installed in the industry as you are, you're probably keeping tabs on what's happening in the space. What are some new use cases or techniques, trends in video analytics that you find particularly exciting?

[00:10:54] FM: Interestingly enough, the things I find exciting might not be the exciting things, because in video analytics or machine learning in general, you can always come up with the next cool thing, you can come up with a super resolution or generative networks or text to image or these kinds of things.

These are all great, but if they don't help you, if you cannot deploy them at scale, if they don't run at scale, if they don't run all the time. It doesn't help you if it works in 50% of the time. They have to run 100% of the time. So what gets me excited are solutions that are really targeted for a specific purpose and made perfect for this purpose.

Occupancy is a great example where you can really hone in on this and make it work for the customer that works 100%, but maybe it's a solution that works only for train stations. It doesn't work for stadiums, but it works perfectly for train stations. This is what really excites me and really gets the benefit of video analytics into the hands of the users.

Now, if you want to hear a little bit more about technology, of course, there are exciting things happening, as well. I would say one of the big things that are happening right now in video analytics, and security in general, is person re-identification, because we all know facial recognition. It's super controversial, depending on which country you go to, but definitely in the US, definitely in Europe. It's typically being used to identify the same person across cameras. This is not being deployed so much anymore, as it used to be, because of all the legal hurdles.

A big trend right now is re-identification of people without using the face. You want to use the clothes that they're wearing, maybe the way they walk. That's a re-identification that only works right now, because as soon as you change clothes, you are not being re-identified anymore, thus protecting your privacy, but you can do all these great use cases by knowing that the same person appears in different cameras. You can think about a search, if you find a person of interest. You can say, “Where has this person been before?” It gives you all the results where this person has been. You can tie this to your access control systems and say, “Okay, this person badged in here with the access control. Now this person is over there.”

You can use this to learn the layouts of cameras. So there are a bunch of cool applications using the re-id that were not possible before and where you do not have to use facial recognition, which extremely intrudes into people's privacy. This is pretty cool. If we go beyond this, but this is still a few years out, what's up and coming is gait analysis, to analyze how people walk, because this is really also a biometric indicator that would be used like facial recognition, without using the face as well, but just the way people walk.

[00:13:34] RS: How reliable is gait analysis at this moment?

[00:13:36] FM: Right now, it's not very reliable, honestly. You need a very high resolution to do it. You can think about typical surveillance cameras, a typical resolution is 1080p. You don't have 4k, you don't have anything beyond. You have to imagine that a typical view of the camera just looks at the street, people are super far away, they're small. You might have a resolution of a person of maybe 30 pixels height, which is really not a lot. It's definitely not a lot for gait recognition. I would say it's really in its infancy. It's not accurate enough today to be usable.

[00:14:11] RS: Yeah. 30 pixels is nothing, that's like Super Mario, basically.

[00:14:14] FM: Yeah.

[00:14:14] RS: Who I will point out had a very recognizable gait. In any case, that is really interesting. It's just an anonymized way to – sorry, a non-facial way to identify someone. Doesn't that run into the same anonymization problem, though? Even if you obfuscate someone's face, but they're still identifiable via their gait, don't you run into some of the same challenges with privacy and security?

[00:14:36] FM: Yes, absolutely. This is also the differentiation between gait analysis and the re-identification, I mentioned before. Re-identification really works only by the way people dress. So as soon as you change clothes, you don't have this issue. While gait analysis is really a biometric identification, which absolutely will be treated the same way as facial recognition. Yes, absolutely, agree.

[00:14:57] RS: The idea is that in anonymization, you are still gleaning the insight, because you know this is the same group of data points. This is the same person, as opposed to, “Oh, we're doing this so we can identify you, Rob, walking through the stadium.” We're identifying you as a recurring sprite, essentially, so that we can glean insight from you as a – something that re-appears, re-identifies.

[00:15:21] FM: Yeah. Yeah. If we go back to the train station, one interesting example of re-identification would be, you see a person entering the train station. Then you see the same person on all the different cameras. This way you can measure how long it takes for a person to go from A to B, which actually is very real and very valuable information for railway operators. They do want to know this; they want to know how long paths take in a railway station.

You could do this by identifying the same person and then measuring how long it takes them to get through the train station. The next day, when they're coming back, you have no way of knowing that it's the same person. We don't care that it's Rob. We don't care if this person is coming back, but we can still use it to track them through the railway station, which is a more anonymous way than, for example, using cell phone data, using the MAC address that your Wi-Fi might send out, because we have no way of knowing it's the same person as soon as they change clothes.
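The A-to-B measurement can be illustrated with a short Python sketch. It is not from the interview and makes several assumptions: a re-identification system (not shown) has already assigned short-lived, appearance-based track ids, each sighting is a `(track id, camera name, time in seconds)` tuple, and the camera names are hypothetical.

```python
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple

Sighting = Tuple[str, str, float]  # (anonymous track id, camera name, time in seconds)


def travel_times(sightings: Iterable[Sighting], start_cam: str, end_cam: str) -> List[float]:
    """Time from first sighting at start_cam to first sighting at end_cam, per track."""
    first_seen: Dict[Tuple[str, str], float] = {}
    for track, cam, t in sightings:
        key = (track, cam)
        if key not in first_seen or t < first_seen[key]:
            first_seen[key] = t
    times = []
    for (track, cam), t_start in first_seen.items():
        if cam != start_cam:
            continue
        t_end = first_seen.get((track, end_cam))
        if t_end is not None and t_end > t_start:
            times.append(t_end - t_start)
    return times


def median_travel_time(sightings: Iterable[Sighting], start_cam: str,
                       end_cam: str) -> Optional[float]:
    """Median A-to-B time, or None if no track was seen at both cameras."""
    times = travel_times(sightings, start_cam, end_cam)
    return median(times) if times else None
```

Because the track ids are tied to appearance rather than identity, nothing in this calculation needs to know who the person is or whether they return the next day.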

[00:16:17] RS: Yeah. That makes sense that you would want to be able to track usage over a period of time. Say I go to the train station every single day or five days a week to go to work. You want to know what time does he arrive? Does he take the escalator? Does he walk up the escalator or stand still on it? How long does it take him to get from the ticket turnstile to the platform? All these things that you would want. Lots of data over a long period of time.

[00:16:40] FM: Yeah, just one more example that we all know from airports. If we stand in the security line, it tells us five minutes, and this is exactly how this is being done. It knows that this specific person is standing in line for five minutes. We can estimate the next person will also stand for five minutes. It's a very real example that we all experience and know that has a lot of benefit to us.

[00:17:01] RS: Yeah. I experienced that two days ago. I just took it for granted. I was like, “That's probably just a guess, based on how many people are in the building.” Of course, there's no one at the doors of the airport with the little chrome clicker thing.

[00:17:14] FM: Yeah.

[00:17:15] RS: I’m figuring that out. Of course, it's all automated and served up to someone. Yeah, definitely affects my life. The notion of me as a data sprite, over time leads me into this question of the greater question about the challenges with data. I mean, this comes up a lot. This podcast could probably be called how data happens, or maybe how data doesn't happen, but we wish it happened.

I imagine for you, it's even more uniquely difficult, because video is sensitive to get your hands on. It's also happening, like you said, in varying resolutions at one angle, maybe. What are some of the challenges associated with getting data to train your models?

[00:17:53] FM: Our industry is very interesting, because, as you mentioned, it's very specific surveillance camera footage, it's very specific. Most of the datasets that are out there are not made for this. We all know ImageNet, it's basically random photos. I'm sure when Facebook is training for Facebook photos, they don't use surveillance footage, but they use people's photos, same for Google Photos or Apple Photos. The main applications for video and image recognition are not surveillance videos, which also means that all the datasets available, independently of the issues of licensing, which is a separate issue, are not made for surveillance footage.

The only surveillance footage that you really get is incidentally, Chinese, because apparently they don't care so much about the privacy when they create datasets, but in the end, you want something broader, you cannot only train on Chinese datasets, obviously. This is definitely a challenge to find the right datasets that fit your use case.

Maybe our industry is also interesting, because if you want to create, for example, the thing with the railway stations that we mentioned. You don't only want it to work for railway stations in the US. It has to work in the US. It has to work in Europe. It has to work in India. It has to work in Dubai, and it has to work in Tokyo. So you need this variability a lot. Once it's out there, it has to work everywhere. This is extremely difficult. Obviously, all providers try to do their own thing and create datasets, but it's a challenge for everyone.

[00:19:17] RS: In the case of getting loads of data from China, I can see how that would be appealing, but it's not necessarily variable, as you say. Could it be applied across every use case? There are all these cultural differences associated with it. I'm imagining, to go back to the train station example, in Japan, they have these employees with white gloves on, and they gently push people to get more people onto the train. That's accepted. It's a cultural thing that they just understand and accept.

If you try to do that in New York City, that person will get absolutely knocked out, right? There's no way you could do that, but that would come through in the end. In the same way with the data from China, there are these cultural differences in how people utilize public spaces that you would have to account for. It may not be one to one applicable to other places. Is there an opportunity here for synthetic data to augment your data strategy?

[00:20:10] FM: Well, definitely, there's a hope that synthetic data could help. Currently, there's hype around it. It's sometimes seen as a panacea for the data problem, and as a way to also solve the licensing issue, but I would say we are very much in the beginning of synthetic data, especially for our security industry.

If you look at synthetic data providers, what you see everywhere is that they're all focused on creating digital twins: digital twin cities, digital twin factories, or the automotive space. There's not a lot that really looks at general surveillance, because the challenge, as I mentioned, is really the variability. So what you would rather need is something super sophisticated that has as many different environments as Grand Theft Auto, for example.

This doesn't exist yet in a way that's really properly done. All the companies in our space that I've been talking to are really creating their own environments for synthetic data in order to augment their own datasets. So far, results have been mixed. Definitely, it provides a value there.

Everybody also agrees you cannot do it with synthetic data alone. It can only be part of the mix. It can only augment it, but you still need real data. Of course, there are different approaches. You can take synthetic data and then use generative networks to make it look more real, which also helps, but it still doesn't replace real data. In summary, definitely, there is a hope that it will help. Realistically, likely it will still be both. You need synthetic data and real data in order to make the real applications work really well.

[00:21:46] RS: Yeah. It seems like the powerful synthetic data environments are happening, like you said, at larger companies. NVIDIA, I know, has one. Unity is making a big bet on it. I sat down with their SVP of AI to discuss that. Even he conceded that it will never just be synthetic data, that it will always have to be based on and compared to real-world data to ensure that it has verisimilitude and that it is reliable.

Being as it is now, where the synthetic data environments are the purview of large companies, do you expect that that will become commoditized, that it will become more accessible to smaller companies and maybe even individual users who can't afford to pay Unity a couple million bucks a year to get access to their synthetic data environment?

[00:22:28] FM: Absolutely, it will get commoditized. We're not there yet. Not even close, but definitely we're getting there. We saw something similar with general datasets, although with general datasets, the problem that I see is that few people are looking at the actual licensing. Even if you look at ImageNet, what's being licensed and available are the labels, not the underlying image data, which has totally different licensing for each individual image. So what's happening in this industry, with all the companies, including small companies, is that because everybody has access to these public datasets, people are using them and accepting that this is a legal grey area.

We are trying to avoid this as much as possible. We are really trying to create our own datasets, but what I see in the larger industry is that this is pretty much being ignored. I think this is definitely a threat. I hope that this will also go away with synthetic datasets, as soon as synthetic data gets more commoditized, because then you don't have this complex issue of licensing anymore. It's very clear who created the data.

With existing datasets, you have different people: the person who took the photograph, the person that's in the photograph, the person that labeled the photograph, and then the platform that provides it. There are so many different players providing this data, so it becomes very fuzzy or opaque. While with synthetic data, it should be clear. I hope as soon as this is commoditized it will make synthetic data legally available, really to everyone, including smaller companies, but we're still a few years away from that.

[00:23:58] RS: Yeah, it makes sense. I want to learn a little bit more about some of the customers using Genetec. Obviously, you don't have to name names. I'm sure they might not appreciate that, but what are some ways that you've found folks using the technology that maybe you didn't expect or gleaning new kinds of insights? I feel like once you put some powerful tech in the hands of users, it gets used in all these interesting ways. What's happening out there?

[00:24:20] FM: Well, it's interesting, yeah, because obviously we provide products for a certain use case, but as you're saying, increasingly, we see users try to use them in a very creative way. One of these ways is combining different events from video analytics in order to get to a certain outcome. Going back again to trains, just because we have been there.

One example is one of our customers is using our analytics to detect when people go on the railway tracks, because obviously that's a safety hazard. Interestingly enough, that's a challenge, because when they applied our analytics, they found out they get alerts every time a train comes in and people enter the train, because they're going to the same area; it's just that there’s a train standing there.

What they came up with is they used our other analytics to detect that a train is coming in and created a rule: if a train is coming in, then switch off the video analytics that detect people going onto the tracks, and if the train goes out, activate it again. This works very well. It's a very surprising outcome that we hadn't thought about before. It makes a lot of sense in their specific use case. In the same way, we see many others.
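The customer's rule can be sketched as a small event filter in Python. This is an illustration only, not the customer's configuration or a Genetec API: the event names (`train_arrived`, `train_departed`, `person_on_tracks`) and the `(time, name)` event format are hypothetical.

```python
from typing import Iterable, List, Tuple

Event = Tuple[float, str]  # (time in seconds, event name)


def filter_track_alerts(events: Iterable[Event]) -> List[Event]:
    """Keep person-on-tracks alerts only while no train is standing at the platform."""
    train_present = False
    alerts: List[Event] = []
    for t, name in sorted(events):  # process events in time order
        if name == "train_arrived":
            train_present = True    # suppress track alerts while people board
        elif name == "train_departed":
            train_present = False   # re-arm track alerts
        elif name == "person_on_tracks" and not train_present:
            alerts.append((t, name))
    return alerts
```

The point of the design is that one analytic (train detection) gates another (track intrusion), so boarding passengers no longer raise false alarms.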

We had one customer whose problem was oil spills. They had, I believe it was a pond. They wanted to know if there is oil or other toxic materials spilling. It had a very unique look, but obviously, it didn't look the same all the time. They applied our analytics in order to automatically detect when there is some floating film on top of the water, and to trigger the alert.

Again, something completely surprising. We had no idea that this was possible, or that there was a need, maybe.

The third example is another customer where you have delivery trucks going into a facility, and they're being weighed when they go in. They have to go on a big scale in order to know how much they weigh, but what increasingly happened is that these trucks didn't stop at the right spot. They went a little bit too far, or not far enough, so that the full truck wasn't on the scale, and they got the wrong weights.

They applied analytics in order to check that the truck is completely on the platform and didn't go too far. Again, it was a standard analytic module of ours. It was not meant to be used in this way, but it still works to use it in this way. It's really funny how customers come up with creative ways to solve their specific problems.
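One way to picture the "completely on the platform" check is as a containment test between two rectangles in the camera image. This is a minimal sketch under assumptions: the `Box` type, the pixel coordinates, and the function name are hypothetical, and the transcript does not say how the actual module makes this decision.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in image coordinates (hypothetical units)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def fully_on_platform(truck: Box, platform: Box) -> bool:
    """Return True only if the truck box lies entirely inside the platform box."""
    return (
        truck.x_min >= platform.x_min
        and truck.y_min >= platform.y_min
        and truck.x_max <= platform.x_max
        and truck.y_max <= platform.y_max
    )


platform = Box(0.0, 0.0, 10.0, 4.0)
centered = Box(1.0, 0.5, 9.0, 3.5)    # stopped in the right spot
too_far = Box(3.0, 0.5, 11.0, 3.5)    # front overhangs the far edge
too_short = Box(-2.0, 0.5, 6.0, 3.5)  # rear still off the scale
```

A weight reading would then be accepted only when `fully_on_platform` returns true for the detected truck.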

[00:26:41] RS: Yeah, absolutely. Before I let you go, there's a specific problem I want to ask you about just to satisfy my own curiosity, which is that recently, I went to a Denver Broncos game. I don't know if you follow American football, but the most interesting thing that happened at that game to me was that inside the stadium, there are these sections where you can buy drinks, and you just take it out of the cooler, and then you walk out and then you pay as you walk out. There's no attendant. There's no scanning of, like, a barcode or anything. You pay and you walk out, basically, right?

Can you explain to me what's happening there? Where are computer vision and video analytics playing a part to make sure that I pay for the things I'm taking, exactly the things I'm taking, and only the things I'm taking?

[00:27:24] FM: Okay, well, I haven't seen the one in Denver. In general, that's the same thing that Amazon Go is doing in their stores, right? You go in, you take something, there are no cash registers anymore. If you look at how Amazon is building this up, it's super sophisticated. You would be surprised how many cameras you need in order to pull this off, because just putting a standard camera there and detecting what's on the shelf, what a person is doing, and taking it out, it's not enough, because you have so many occlusions. It's very hard to see. You have to account for situations where a person takes it and puts it in the cart and puts it back again. Yes, it's possible, but currently it's only possible with a big effort in terms of camera hardware, and server hardware behind it.

Obviously, what you're doing is you're training on specific products. You detect that there is a product. You count how many products there are. You detect that there is a person. You analyze the movement of this person, reaching out to the shelf, taking it out, counting that there's one less there. Sometimes what they also have, if you have a cart, is a separate camera on the cart that, again, detects that the item is now in the cart and knows what you're actually buying. Yes, there's all this computer vision stuff around it, maybe in combination with other stuff like RFID, but today, while it's possible, it's only possible with a very big technological effort.
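The counting step mentioned here ("there's one less there") can be illustrated with a toy bookkeeping sketch. It assumes the hard parts, detecting products and associating a shopper with a shelf, are already solved elsewhere; the function name `update_basket` and the product labels are hypothetical.

```python
from collections import Counter


def update_basket(basket: Counter, product: str,
                  count_before: int, count_after: int) -> None:
    """Attribute a change in shelf count to the shopper at that shelf.

    A positive difference means items were taken; a negative one means
    items were put back.
    """
    basket[product] += count_before - count_after
    if basket[product] <= 0:
        # Everything was put back: the shopper no longer holds this product.
        del basket[product]


basket: Counter = Counter()
update_basket(basket, "cola", 10, 8)   # takes two
update_basket(basket, "cola", 8, 9)    # puts one back
update_basket(basket, "chips", 5, 4)   # takes one
```

Even in this simplified form, the put-back case has to be handled explicitly, which hints at why the real systems need so much camera coverage.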

[00:28:43] RS: Yeah. In the football stadium example, I'm trying to remember, maybe we had to put it down on a platform and then pay. So it's like scanning on the platform. It's not like –

[00:28:52] FM: That’s easier.

[00:28:52] RS: Pay as you go. It just seems endlessly complicated. It's like, okay, I take it off of the shelf, and I put it back on a different shelf, right? Or say there's a camera in my cart, but I have the same item stacked up a couple times. I have it with other things all around it. Can it only see the thing on the top? I don't know.

Do you suspect that use case will prove valuable that we will remove cash registers and grocery store employees?

[00:29:15] FM: I think there's a big value in retail in automating this, because you want to improve operations. You want to avoid queues. It's frustrating for everyone. It's inefficient. There are many ways to improve this. Yes, we already know self-checkout, where you have to scan yourself. This is a way to improve this. I definitely think we're going in this direction. I think we're far away from this becoming commoditized. I think stores like Amazon Go, they also just account for a certain loss.

I mean, I guess it's the same as when you just return your item to Amazon and they just throw it away. It's just priced into the margin. I think they do it the same way in their stores, so if they just make a mistake, it's fine for them as soon as – as long as the customer is happy. I think for them, it works, but it will be some time before we really see this in Walmart or any large deployment.

[00:30:08] RS: Makes sense. Florian, this has been a fantastic conversation. I've been really fascinated by the work you're doing and how you've explained all this to me. As we creep up on optimal podcast length here, I'll just say, thank you so much for being with me on the show. I've loved learning from you today.

[00:30:19] FM: Well, thank you. That's been super fun.

[OUTRO]

[00:30:23] RS: How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, MedTech, robotics and agriculture. For more information, head to sama.com.

[END]