How AI Happens

Neural Rendering with fxguide Co-Founder Dr. Mike Seymour

Episode Summary

The movie industry is not immune to the innovations of AI and machine learning, with many different technologies being developed for use in production and film. Of particular interest is the role of AI in inferring ‘Digital Humans’ for post-production processes. In this conversation, we hear all about this remarkable technology from special guest Dr. Mike Seymour, co-founder and contributing editor at fxguide, an Emmy-nominated veteran of the creative industry, writer, and podcaster.

Episode Notes

Dr. Seymour aims to take cutting-edge technology and apply it to the special effects industry, such as with the new AI platform PLATO. He is also a lecturer at the University of Sydney and works as a consultant within the special effects industry. He is an internationally respected researcher and expert in Digital Humans and virtual production, and his experience in both visual effects and pure maths makes him ideally suited to AI-based visual effects. In our conversation we find out more about Dr. Seymour’s professional career journey and what he enjoys most about working as both a researcher and a practitioner. We then get into the details of AI in special effects as we learn about Digital Humans, the new PLATO platform, why AI dubbing is better, and the biggest challenges facing the application of AI in special effects.

Key Points From This Episode:

Tweetables:

“In the film, half the actors are the original actors come back to just re-voice themselves, half aren’t. In the film hopefully, when you watch it, it’s indistinguishable that it wasn’t actually filmed in English.” — @mikeseymour [0:10:15]

“In our process, it doesn’t apply because if you were saying in four words what I’d said in three, it would just match. We don’t have to match the timing, we don’t have to match the lip movement or jaw movement, it all gets fixed.” — @mikeseymour [0:15:15]

“My attitude is, it’s all very well for us to get this working in the lab, but it has to work in the real world.” — @mikeseymour [0:19:56]

Links Mentioned in Today’s Episode:

Dr. Mike Seymour on LinkedIn

Dr. Mike Seymour on Twitter

Dr. Mike Seymour on Google Scholar 

University of Sydney

fxguide

Dr. Paul Debevec

Pixar

Darryl Marks on LinkedIn

Adapt Entertainment

PLATO Demonstration Link

The Champion

Pinscreen

Respeecher

Rob Stevenson on LinkedIn

Rob Stevenson on Twitter

Sama

Episode Transcription

 

[0:00:04.5] RS: Welcome to How AI Happens, a podcast where experts explain their work at the cutting-edge of artificial intelligence. You’ll hear from AI researchers, data scientists and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they’re facing along the way. I’m your host, Rob Stevenson, and we’re about to learn How AI Happens.

[0:00:31.8] RS: Here with me today on How AI Happens is a lecturer at the University of Sydney, cofounder and contributing editor at fxguide, an Emmy-nominated veteran of the creative industry, writer, techno futurist, podcaster and I have to say, general delight, Dr. Mike Seymour. Mike, welcome to the podcast, how are you today?

[0:00:51.4] MS: Good, Rob, that’s such a nice introduction, I’ll have to use it on my LinkedIn profile.

[0:00:55.0] RS: Oh yeah, that one’s free but the second one, I have to bill at an hourly rate but I’ll let you get away with that one. Yeah I mean, there’s so many different ways we could go in this conversation just based on that bio alone but I guess to begin with, would you mind sharing just a little bit about your background and what you’re working on right now, just in case I didn’t do it justice in the bio?

[0:01:15.5] MS: Sure, no, thanks. So yeah, I’m from the film industry as you said, did many years in visual effects. I actually came up as a compositor but along the way, I’d always had a strong academic connection, I’d done various degrees and I culminated in doing my PhD on Digital Humans.

At that point, I thought it was a really good idea to take a lot of the tech that we were seeing in the film industry and see how we could research it further into work that was relevant both to what we like to refer to as M&E, yeah, media and entertainment, but also outside that. So yeah, I have a lab where I do research at Sydney University and also do consulting, and so they call me an engaged researcher, which means we do a lot of work with that industry. I guess I’m most well-known for co-founding fxguide, which has been bridging academia and practitioners now for, gosh, more years than I can remember, I think going back to 1999.

[0:02:08.7] RS: That bridge between researchers and practitioners is a well-trodden one. It’s common, I meet people who kind of have one foot in each camp. Why do you think that is particular to this space?

[0:02:19.6] MS: I mean, it’s a very rewarding space to be in because when I used to be a compositor in an expensive suite, charged out by the hour, we’d have clients come in and they’d be really enthusiastic to be doing work with us but they’d be like, “Hey, we want to do something really original and new,” and, you know, “Maybe you could explain, I’ve seen this thing, how does that work?”

We’d all sit there quietly like nodding but privately saying to ourselves, “How the heck would we know? We’re never allowed out of the room.” I mean, we charged out at a billing rate that means, they don’t even want us to leave to have lunch. So, it was an imperative back then to find a way to help those senior industry folk keep current and understand what’s going on because of course, you expect somebody in that position to understand behind the tech, what’s actually happening.

But that takes time to do so that’s how that started and then, these days, I’ve got to say, it’s incredibly rewarding because I love doing research, I also like teaching and so it’s a great position to be in but for the teaching and also for the research, it’s great to be still doing work in industry because then you know that your research is relevant and hopefully, helping people and students love having that perspective of the real world, not just a kind of as I say, an academic point of view but actually being able to speak from experience.

[0:03:35.8] RS: Yeah, really, really important. When I think back to my own education, the individuals teaching me hadn’t been in the industry they were teaching about for some time. It didn’t stick out to me as a 19-year-old undergrad but it does now, right? When I graduated, my experience was very different from what they were telling me, because of course it was, they hadn’t actually done the job in a long time. So I imagine that gives you an air of credibility. What are you lecturing on at the moment?

[0:03:59.8] MS: So, I do a lot of work around what we’d call digital disruption so like, that’s how new processes happen. Obviously, it’s in a tech space and then also, project type work because obviously being in the VFX industry for so long, it was just you know, years, if not decades of just project after project after project and you know, media entertainment is a huge industry. 

So at the undergraduate level, that’s what I’m basically teaching, and then my research is very specifically on, as I say, Digital Humans. So either CG or neural rendering, call it machine learning or AI, we call it neural rendering: inferring Digital Humans as opposed to traditional CG approaches.

[0:04:40.0] RS: Can you explain what you mean by inferring Digital Humans?

[0:04:43.7] MS: Sure, I mean, as I’m sure your listeners will know, because this is a really well-engaged podcast, instead of doing a modeling, texturing, lighting and rendering process to produce an output, we infer the final output via statistical inference, based off a GAN or other machine learning techniques, which is a very different process than has traditionally been the case in visual effects.
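To make that contrast concrete for listeners who code, below is a minimal, purely illustrative Python sketch of the “inferred” side of the split: a small neural generator maps latent inputs straight to image frames, with no explicit geometry, materials or lights. The network, its size and its untrained weights are placeholder assumptions for illustration only, not any production system discussed in this episode.

import torch

class FaceGenerator(torch.nn.Module):
    """Stand-in for a GAN generator trained on face imagery."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, 1024),
            torch.nn.ReLU(),
            # Tiny 64x64 RGB output, purely for illustration.
            torch.nn.Linear(1024, 3 * 64 * 64),
        )

    def forward(self, z):
        return self.net(z).view(-1, 3, 64, 64)

generator = FaceGenerator()
# A trained model would be conditioned on a driving performance;
# here we just push random latents through an untrained network.
with torch.no_grad():
    frames = generator(torch.randn(8, 512))  # 8 "inferred" frames
print(frames.shape)  # torch.Size([8, 3, 64, 64])

The point of the sketch is only that the output is predicted by a learned model rather than simulated by a light-transport renderer; in a real neural rendering pipeline the generator would be trained on footage of the performer.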

So it’s interesting, like, if you actually go back about 25 years almost exactly, there were the first image-based modeling and image-based lighting approaches coming out of Berkeley and then USC, with work in particular by Dr. Paul Debevec. That was what fed into The Matrix and all the stuff that was done in the Matrix films, this idea of sampling the real world, but since then, there have been these two kinds of concurrent paths.

One where you try and solve the lighting equation and come up with a really good kind of Monte Carlo sampling, ray tracing kind of thing, which is all a mathematical simulation of the real world and try to model light. 

That’s the dominant model, if you like, for Pixar, who do great work, terrific. And then there’s this other world which started out, as I say, with image-based lighting and sampling of the real world, and in maybe the past three or four years in particular, the film industry has suddenly embraced machine learning as a way of continuing that tradition, although from a whole new perspective, which is, “Hey, rather than me trying to model this thing, let me get the computer to take training data and come up with an inferred solution.”

[0:06:14.5] RS: So the inferred solution in this case is a platform, maybe that’s a bad word, a technology, referred to as PLATO, and I’m going to put a link in the episode description to a demonstration in a Polish film called The Champion, where PLATO was used to automatically re-render dialogue between English and German, and it really blew my mind when I saw it for the first time, it’s fantastic.

I would encourage people to check that out just so they get a little bit of a context of what we’re speaking about here but I’m curious if we can get into the development process for PLATO and if you could speak a little bit more on where that came from and how it was used in The Champion?

[0:06:53.1] MS: Yeah, absolutely. So, Darryl Marks is a visionary producer who set up Adapt, and that’s how I came to be involved in the project. He approached me as a kind of digital human expert and said, in Darryl’s words, that he’s always wanted to make foreign films more accessible to an English-speaking audience, and the problem is that while we can do subtitles and while we can do dubs, neither of those two things really addresses the issue that you can be sort of taken out of the film if you’re trying to just appreciate the acting performances.

Now of course, we don’t want to replace subtitles, because the hearing-impaired community obviously, desperately want subtitles and quite frankly, if you’re learning another language, it’s a great thing to do. But in parallel to that option of having subtitles, wouldn’t it be nice if I could actually see the film delivered in my native language? So Darryl set up a company, and a whole team that I’m part of, to go down that path.

Now, that path is basically saying, “Instead of dubbing the film where we just change the audio but the vision remains the same and we kind of hope that the lip sync doesn’t look too bad and it doesn’t look too hokey or embarrassing, what if we actually change the audio to English and/or the mouths so they’re also speaking English?”

So as you correctly pointed out, the film, The Champion, is a Polish film about a member of the Polish community who was sent to Auschwitz very early in the Second World War. This character, Teddy, who is based on real-life events, was a boxer and so he boxed in Auschwitz. So in that film, which is incredibly moving, with really great dramatic performances, what we wanted to do is convert the whole film.

So every character, be that a German guard or an inmate in Auschwitz, was speaking in English so that the film would find a wider audience. Now of course, we had to record the audio of them saying their lines in English, but for every face, in every shot that was speaking, we also had to do an inferred or machine learning solution so that, let’s see, if you don’t mind, Rob, I’ll use you as an example. Imagine you’re in the film, you’re this Academy Award-winning actor, and I should say, the actors in this film were really good. I mean, just outstanding.

So Rob, you’re giving this terrific performance but unfortunately, you don’t speak English and I do, so we would film me, maybe just in a normal sound booth, without dots on my face, without head-mounted cameras, just delivering the lines like I would in a normal sound booth with pretty standard lighting, and a machine would study my face, would study your performance, and then it would change your performance so you’re mouthing the words the way that I said them, and of course then, my voice would be effectively coming out of your mouth.

So, absolutely, that’s how it worked, but in the case of The Champion, we also wanted to be respectful to the actors because, for example, the lead actor who plays Teddy is just a spectacular actor. He’s kind of like the, I don’t know, the De Niro of the Polish acting community. He’s a really outstanding, very, very respected actor.

[0:09:48.2] RS: Piotr Głowacki, I believe, is his name.

[0:09:50.4] MS: Exactly, and half that cast, including him in that lead role, came back to re-record their own lines in English. So the process doesn’t require the same actor to deliver their own lines but of course, Rob, if you were able to speak English as well as German, it would be common sense to get you to come back and do that because, hey, you made those acting choices and we’d welcome your input again.

So yeah, in the film, half the actors are the original actors come back to just re-voice themselves, half aren’t. In the film, hopefully, when you watch it, it’s indistinguishable that it wasn’t actually filmed in English. So on a real production budget, schedule and approach, in other words, not as a sort of test, not as a demo, not as a proof of concept, we converted the entirety of The Champion from Polish to English.

[0:10:42.8] RS: So in this case, it was the original actors performing in a different language and then using that for the auto-dubbed version, is that correct?

[0:10:52.9] MS: Yeah, so there were, I think, 20 speaking parts. 10 of the actors that gave those re-performances were from the original cast and 10 weren’t, and they ranged from kids to women and men. We even had Teddy with heavy prosthetic makeup, because he’s a boxer, and so we have scenes where his face is incredibly bruised and beaten up and swollen, and of course, we never did anything like that to the actor in the sound booth.

He just came back that day normally, we didn’t apply anything special to him, but it very convincingly produced the beaten-up Teddy performance, now in English, as it had done previously with all the other cast.

[0:11:32.2] RS: Would the goal, medium to long-term, be that the actor wouldn’t need to come and perform in another language? That it could be the way you put it, where I’m the actor but you were able to provide the English version?

[0:11:44.2] MS: Yeah, I mean, there are a couple of considerations here. At this stage, it matters of course that we would believe that the voice that is coming out of your mouth is plausibly your kind of age, your kind of — if I was a heavy-set individual, my voice would normally sound different than if I was a cyclist or an athlete who wasn’t heavy-set. So it’s not a machine learning problem, it’s just a sort of plausible voice problem, you know?

So if I were for example, Japanese with a Japanese accent and I was trying to voice in English, I would still have my accent and that would be, maybe odd. So yeah, you want to cast somebody that is plausible for the voice. You can cast anyone you like but of course, if you can use the original actors, why wouldn’t you? Because they understand their roles very well.

[0:12:31.7] RS: Of course, yeah, and as you said, they make acting choices, right? So they would want those choices to be consistent across various markets, right? They would want their decisions to be the same in whichever language they were presenting the film, which, interestingly, I don’t believe is necessarily a snag for this process, because of ADR, which I forget what it stands for, but it’s when actors, after they have filmed, go and re-record dialogue so that it sounds better if the on-set mics didn’t pick it up.

This is an existing process that actors go through and that is common in the film industry. So my point is, if someone were trying to critique this, they might say, “Oh well, now you’ve got to have this actor come in and perform in all these languages,” but that isn’t necessarily different from what they’re doing already.

[0:13:18.2] MS: Rob, I’ve got to say that you’re right. ADR, or audio dialogue replacement, is incredibly standard. On almost any major feature film you’ll see, you’ll get the actors coming in and re-voicing their own lines in the same language that they did originally, just because, yeah, as you say, there might be odd background noise or they want different miking.

Like right now, you and I are speaking to each other with our microphones quite close to our mouths, something you just can’t do on set because obviously, it would be in shot. So the miking on set is always going to be a different type of microphone recording than you can achieve in a sound booth.

So for a whole lot of reasons, people do ADR but yes, if that actor is able to speak in another language, you could get them to re-voice themselves, if they can’t, yeah, you could get somebody else to do it, which is no different than the dubbing market where we already, you know, dub. 

But there is one huge advantage in the process, apart from the obvious fact that they look better and they’re in lip sync, which is, if you think about it, if I was delivering some dialogue and you wanted me to be re-voiced but not have my face change, in other words, just a normal old-fashioned dub, you’d kind of want the words that are coming out to be the same duration and kind of the same length as the original English words, because that’s where my lips are moving.

Now, it’s not going to match perfectly but clearly, you don’t want to have the audio of dialogue continuing when my mouth is shut. So in fact, you actually have to compromise the translation of the script to try and coerce it into words in that foreign language that kind of fit with the lip movements, or what are called the visemes, of the actor.

So the script translation moves away from being a creative process where, in the new language, we’re trying to come up with the right sentiment, the right intent of the original authors. Instead, we’re trying to come up with something that just times to the lip movements of the actor when they first said it. Now, that’s a compromise that’s pretty hard to take.

In our process, it doesn’t apply because if you were saying in four words what I’d said in three, it would just match. We don’t have to match the timing, we don’t have to match the lip movement or jaw movement, it all gets fixed. 

So we can do a more authentic translation of the script, more in keeping with what the scriptwriters had intended, and hopefully also allow the actor to be a little more flexible in where they take a breath, their timing, and just the emphasis on a word, because as I’m sure you know, Rob, you can say anything and punch a particular word in a sentence to give it a particular meaning, a slant, and that kind of meaning beneath the words is what we call the subtext of the scene, and that’s gold for an actor, getting the subtext right.

They can be saying one thing and meaning another, and a lot of that is conveyed by their face, and a lot of it is also conveyed by the tone of their voice, what words they’re emphasizing, minor hesitations, all that really subtle acting stuff, and we don’t want to step on any of that. We don’t want to be heavy-handed in stopping the scriptwriters or the actors from being creative.

[0:16:25.5] RS: Is this a natural language processing challenge or is this more of a creative filmmaking challenge? Both I suppose?

[0:16:33.4] MS: There is no need for us to do natural language processing in a strict kind of machine learning sense because the rewriting of the script is done by a person. You get the scriptwriters to do a translation and there are specialists obviously working in that area. So we are just giving more flexibility to those people that are doing the Polish-to-English translation to be accurate, and in our case, the director could speak both English and Polish.

So he would read the English script and say, “It would be great if we could add an extra word here because we’re really not getting the intent of the original Polish.” That’s just people, but there is a huge amount of machine learning that goes into making sure that we read, in my example, my face when I’m giving the sound booth read, and your face from the original performance. And there is one other huge criterion I’d like to flag, which is this: picture you and I doing this as a test in a lab somewhere.

I could record heaps of training material of you, heaps of training material of me, and do a standard kind of approach and fuss with it and work with it for as long as it took to pull it off, not a problem. It’s not easy, but it’s not problematic. In the case of what we’re doing with the PLATO process through Adapt, we’re trying to give a production solution to the industry, and what does that mean?

It means that you can deliver a finished film where we had nothing, no considerations, no extra footage, no outtakes, no B roll, no second version, nothing but the finished film and we would be able to convert that film. So we wouldn’t have extra training material, we’d have very short shots. We’d have shots where you could be running, driving, in shadow, in light, turning your head, aging, getting beaten up, any of those things. 

The process is robust enough to do that without requiring specialist training data, so there is no consideration required on set or when the film is being made to make the process come off, and then similarly, when we’re recording me in the sound booth, again as I said earlier, we don’t have to get specialist apparatus in there with complicated synchronization or whatever. We don’t have to have the actors sitting in some kind of rigid head pose with dots all over their face and cameras hanging off their head.

We try and keep it as natural as possible. We aim to pull off the process in a really sensible amount of time for a film production and obviously on a budget that allows us to work at scale. So as Darryl said, the whole idea is just to make foreign films much more accessible to a different market, and that wouldn’t be the case if it was intrusive, incredibly expensive or very time-consuming.

[0:19:14.4] RS: Right, exactly and this is not just a challenge for your use case but every use case of artificial intelligence where you have to ask yourself, how much can we reasonably expect to disrupt an existing process with the promise of innovation before someone decides that they don’t want to bother with that, right? 

[0:19:33.0] MS: Yeah, I mean it’s that thing, isn’t it, like you can get really excited by the tech and it works really, really well and it works kind of theoretically, and look, I am very fond of saying this because it actually happened. My father, literally my father, once said to me, “Mike, you can read the entire works of Sigmund Freud but son, sooner or later you’re going to have to take a girl on a date.” And my attitude is, you know, it’s all very well for us to get this working in the lab but it has to work in the real world.

The real world is not just a matter of saying, “Oh yeah, if you did this, this, this and this it will all happen.” They’re like, “Well, that’s great mate but that’s just not going to – that’s not possible” and we want to be, as I said, very respectful to the creatives involved. Would you want to do a film if you knew your face was going to be altered by technology that wasn’t respectful of the performance you gave? 

No, as an actor you’d be, “Hey, I’m not doing that project” or “I am not allowing you to modify my face,” because that is your reputation. If we modified your face and, Rob, your original performance was spectacular but our modified version was kind of believably realistic except you just didn’t look like you were acting very well, we’d establish your reputation as a bad actor in a foreign market and you’d never want that.

So it is terribly important that we don’t just come at this from a, “Hey, theoretically, somebody thinks this looks real” we have to be, “Hey, I didn’t even notice that this was happening and boy, Rob gave a great performance in that film.” 

[0:21:00.2] RS: Yeah, that makes sense, and by this point, speaking of generated performances that the actor objects to, we have all seen the deepfake Tom Cruise videos. How would you differentiate what PLATO is doing from the mainstream understanding of what a deepfake video is?

[0:21:19.2] MS: Well, firstly I’d say, just to be clear, I don’t think Chris, who did those videos, had any objections from Tom Cruise, as of the last I heard, and I have spoken to Chris. It was well received, and no, Tom Cruise didn’t object to the process, and I think most people would agree that Chris’s work on doing the fake Tom Cruise is entertaining, funny and good satire, but it is a fundamentally different process from what we are talking about, for two reasons.

One of course is they replaced the actor’s face completely with a synthetic Tom Cruise face, so what you’re getting there is the full Tom Cruise mask, if you like. We’re not doing that, we’re blending in, so that, to use the analogy, it would be my face on your body if we were doing a face-swap deepfake, but we’re not. We’re keeping your face on your face. It’s just that you’re mimicking my lip movements.

Then the other big thing is that Chris did a terrific job, but they were taking something like ten hours to do each shot. We’re going to need to plow through hundreds and hundreds of shots in a film and we just can’t afford that kind of bespoke curation that Chris has done such a great job on. So deepfakes tend to be satirical and they tend to be crafted, and even when they are not crafted, they are full face swaps.

The other thing to remember is that most of those videos, if you actually analyze them, and this isn’t strictly, totally true but it’s pretty close, have the eyes straight to camera. So it’s Barack Obama facing the camera, it’s Arnold Schwarzenegger facing the camera, it’s a Tonight Show host like Jimmy Kimmel or whoever facing the camera in a very evenly lit studio environment.

Like when a politician is delivering a speech, when a Tonight Show host has got a guest on, they are very evenly lit and consistently lit and they’re facing the camera. Our process isn’t like that at all because we have a guy speaking in a foreign language while boxing and while people are hitting him in the face and running around and we’ve got him at night missing spotlights from guards. We’ve got him in the daytime on the work gang. 

We’ve got kids at the dining room table and we’ve got children in the darkened corners of Auschwitz, nearly starving to death, and so that full range of lighting, angles and action is really quite different from your standard clever demo. Don’t get me wrong, those deepfakes are great, but they are just a very different technical hurdle to jump.

[0:23:49.0] RS: When you explain all of the challenges there, what happens if someone is in different lighting? What happens if they are moving? What happens if they are being punched in the face, right? There are a million more; there’s the continuity of someone perhaps growing a beard or having changes to their face, in the case of a boxer certainly. It just gets more and more complicated and challenging the more you explain it.

Are these additional ML approaches or how are you sort of factoring in just the increasing complexity of this challenge? 

[0:24:15.8] MS: Well, the good news here is that the team includes Pinscreen in Los Angeles. Now, Pinscreen is a group of incredibly brilliant researchers and they’re the core of our technical backbone, and they have been developing original machine learning algorithms, GANs, a whole range of solutions to address these problems. So you’re right, it isn’t just one black box. Our magic bullet, if you like, is the genius of the research team at Pinscreen.

But their solution is a concatenation of a series of specialist tools that is as automated as possible but still requires some, I guess we’d call it manual intervention, in the sense that – let me give you an example, and it isn’t from our film – when you are doing machine learning, the approach is often to say, you know, more training data is better, and the guys at Pinscreen on our team clearly didn’t have that advantage.

They couldn’t get tons of training data, they had to work off really, really short scenes, sometimes under a second long. So they came up with really clever ways of dealing with this, but if you just think about machine learning generally, yeah, most people would say the more training data the better, but you need to curate your training data for the training space that you want to establish, because the training material defines a training space within which you get the best training solution.

There’s another project that was done that we weren’t involved with called ‘Welcome to Chechnya’ and in that project, the team that were doing that did a different process to us but it was a splendid face replacement like you were talking about before, like a deepfake face replacement process but there they discovered that rather than just use all their training material, which quite often for them had the actor turning to the side to talk to the director. 

They were just rolling the cameras and he’d look to the side, talk to the director for five minutes, turn back, do a take, then turn back to the director: “Hey, was that good? How was it?” et cetera. That caused the training data to be incredibly biased to one side of his face, because whenever he wasn’t delivering a line, he was looking to the side, and they quickly established, “Well, yeah, it’s heaps of training data but it’s disproportionately showing one side of this actor’s face.”

So that project very quickly worked out that it would be really good to be able to curate the training data through new machine learning processes, and Ryan was the visual effects supervisor and the lead machine learning expert on that. Ryan did a spectacularly good job and he’s a dedicated documentarian who is keen to use this technology for, in this case, protecting the identities of LGBTQ community members who were in the film and would otherwise be persecuted if their identities were revealed.

So it’s a different problem, but in his work, he quickly came to the realization, as in fact we did with a completely separate application, that you need to be very specific about what training data you are using and when, and so there’s a lot of smarts, a lot of good experience that is still added to the mix. So as automated as our process is, or any process is, you still really value having a really good machine learning expert driving things.
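As an aside for technically minded listeners, the training-data curation point above can be made concrete with a small, hypothetical Python sketch: frames are bucketed by estimated head yaw and each bucket is capped, so footage of the actor looking off to one side cannot dominate training. The yaw estimates, bin width and cap below are illustrative assumptions, not the method used on either production.

import random
from collections import defaultdict

def balance_by_yaw(frames_with_yaw, bin_width=15.0, per_bin_cap=200, seed=0):
    """Group frames into head-yaw bins and cap each bin, so one pose
    (e.g. the actor constantly turning to talk to the director) cannot
    dominate the training set."""
    bins = defaultdict(list)
    for frame_id, yaw in frames_with_yaw:
        bins[int(yaw // bin_width)].append(frame_id)

    rng = random.Random(seed)
    curated = []
    for frame_ids in bins.values():
        rng.shuffle(frame_ids)
        curated.extend(frame_ids[:per_bin_cap])
    return curated

# Toy usage: 5,000 frames whose yaw (in degrees) is heavily skewed to one side,
# as if the actor spent much of the footage looking off-camera.
fake_data = [(i, random.gauss(35.0, 20.0)) for i in range(5000)]
subset = balance_by_yaw(fake_data)
print(len(subset), "frames kept out of", len(fake_data))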

[0:27:23.8] RS: Of course, yeah. May I ask you to maybe prognosticate a little bit here: what are some other areas of the entertainment industry where you expect AI to be disruptive?

[0:27:35.0] MS: Well, that’s a great question, and hey, I don’t have a crystal ball, but there’s a bunch of areas generally on the table and already being looked at. So one thing is very fine, temporally accurate segmentation, because we can do segmentation on a still, okay, you can get an element lifted out like you can with various plugins in Photoshop or whatever, but in the film industry what you want is a temporal version of that.

I’d be separating you from the background instead of using a green screen, but it’s not just getting a segmentation of you in a machine learning sense, it’s then being able to effectively spline that curve such that an artist can then tweak it. So if I’ve got 100 frames and it lifts you out, I can’t have every version of the matte or the extraction be different and unrelated to the one that comes after it, because the director says, “Oh wait, I’d like that edge pulled in.”

You need to have that feathered in and feathered out effectively over 120 frames, and you can’t do that if you’re having to tweak it on a per-frame basis, and so we call that rotoscoping and that’s the holy grail. If we could get machine learning to provide a spline-based, editable, artist-friendly, temporally consistent segmentation, there are millions of dollars on the table and so people are looking to do that.

It is a bit of a dull process to do manually, so if you could automate it, it would be spectacular, and then that leads into lots of other things like rig removal and stunts. There is a lot of work on using what you were calling deepfakes before, or neural rendering, for face replacement, so that your stunt double, Rob, has your face on him when he leaps through the building and smashes through the glass and you yourself don’t have to do that.
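For readers curious what “spline-based, temporally consistent segmentation” might look like in code, here is a small, hypothetical Python sketch: each frame’s segmentation boundary is resampled to a fixed set of control points and smoothed over a temporal window, so an artist’s edit does not have to be redone frame by frame. The toy circular boundaries and the simple moving-average smoothing are illustrative assumptions only, not a production rotoscoping tool.

import numpy as np

N_POINTS = 64    # control points per frame
N_FRAMES = 120   # length of the shot

def toy_boundary(frame_idx):
    """Stand-in for a per-frame ML segmentation boundary: a jittery circle."""
    t = np.linspace(0.0, 2.0 * np.pi, N_POINTS, endpoint=False)
    radius = 1.0 + 0.05 * np.random.randn(N_POINTS) + 0.1 * np.sin(frame_idx / 10.0)
    return np.stack([radius * np.cos(t), radius * np.sin(t)], axis=1)  # (N_POINTS, 2)

# Per-frame boundaries, shape (N_FRAMES, N_POINTS, 2).
boundaries = np.stack([toy_boundary(f) for f in range(N_FRAMES)])

def temporally_smooth(curves, window=9):
    """Moving average over time for each control point, so the matte edge
    feathers in and out across neighbouring frames instead of popping."""
    pad = window // 2
    padded = np.concatenate([curves[:1].repeat(pad, axis=0),
                             curves,
                             curves[-1:].repeat(pad, axis=0)], axis=0)
    kernel = np.ones(window) / window
    # Convolve along the time axis for every (point, xy) channel.
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="valid"),
                               0, padded)

smooth = temporally_smooth(boundaries)
print(boundaries.shape, "->", smooth.shape)  # (120, 64, 2) -> (120, 64, 2)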

Yeah, so there’s a bunch of things like that. The one for us that is really interesting, and we’re starting to look at this with a few companies like, say, Respeecher in Ukraine, is changing the voice at the same time as we change the face. So to go back to our original example, it would be me voicing you, you’re the actor, but in addition to your lips moving exactly like mine did, you no longer get my voice coming out of your mouth.

You get a cloned version of my voice that sounds like you delivering the words that I said, so we’d change audio and vision. 

[0:29:53.7] RS: Nothing but opportunities for this tech to be disruptive it sounds like. 

[0:29:58.1] MS: Oh absolutely. 

[0:29:59.4] RS: Listeners out there if they are keen on a new application, they have their marching orders for the business that they should form. Mike, we are creeping up on optimal podcast length here, so at this point I would just say thank you so much for being a part of the show and sharing your experience and expertise with me. This has been fascinating, I am really, really pleased you joined me today. 

[0:30:17.7] MS: Rob, it’s been a pleasure, thank you. 

[0:30:21.8] RS: How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video and sensor data annotation and validation for machine learning algorithms and industries, such as transportation, retail, ecommerce, media, medtech, robotics and agriculture. For more information, head to sama.com.