The question of AI ethics and privacy is becoming more relevant by the millisecond. Joining us today to discuss the complexity, evolution, and future of privacy in the AI space is Hessie Jones, Venture Partner at MATR Ventures.
MATR Ventures Partner Hessie Jones is dedicated to solving issues around AI ethics as well as diversity & representation in the space. In our conversation with her, she breaks down how she came to believe something was wrong with the way companies harvest & use data, and the steps she has taken toward solving the privacy problem. We discuss the danger of intentionally convoluted terms and conditions and the problem with synthetic data. Tune in to hear about the future of biometrics and data privacy and the emerging technologies using data to increase accountability.
Tweetables:
“Venture capital is not immune to the diversity problems that we see today.” — Hessie Jones [0:05:04]
“We should separate who you are as an individual from who you are as a business customer.” — Hessie Jones [0:08:49]
“The problem I see with synthetic data is the rise of deep fakes.” — Hessie Jones [0:21:24]
“The future is really about data that’s not shared, or if it’s shared, it’s shared in a way that increases accountability.” — Hessie Jones [0:26:43]
EPISODE 42
"HJ: Just because the platforms have been doing this for years, doesn't mean they should have. It's because legislation couldn't catch up to everything that was happening below the line. We need tech to actually stand up and say, 'We're not going to do that anymore.'"
[00:00:19] RS: Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers, as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn How AI Happens.
[00:00:47] RS: Here with me today on How AI Happens is the Venture Partner over at MATR Ventures, Hessie Jones. Hessie, welcome to the podcast. How are you today?
[00:00:54] HJ: I'm very good. Thank you, Rob, for having me here.
[00:00:56] RS: You have such an interesting background. Before we get too deep in the weeds here, would you mind sharing a little bit about your background, and how you wound up in your current role at MATR?
[00:01:04] HJ: Yeah. My background, I started off as a marketer. I graduated in marketing at business school and dove quickly into data marketing, database marketing. Early on, everything that I learned from a marketing standpoint had to come from an analytics and measurement capability. Everything we did was performance-based. That was really where I started. I worked at ad agencies, including Ogilvy, Rapp Collins Worldwide, and Aegis Media. In those roles, I evolved. After, I guess, Y2K in 1999, I delved more into the digital space and realized that there was so much more we could do there. I went into banking shortly after, partly because the agency hours were crazy. But what I realized is that, once you got into digital, it didn't matter whether you worked at a bank or an ad agency, you'd be working the same kind of hours, because it was such a disruptive medium.
I worked in banking for a while, trying to understand how we could actually drive offline to online. That meant moving direct mail to actual direct digital, which is what, I guess, people now call banner ads, or online advertising, or Google AdWords, or whatever. That was the beginning of me getting into digital. I soon realized that in banking, it took too long to do almost anything. Even though they were trying to be disruptive, there was still a chain of command, there was still a lot of red tape, and, I guess, a lot of proving out to do before you could actually execute or put something into production.
I left there to go to Yahoo, and Yahoo was the place that really changed the game for me. It introduced me to a different type of work style, which everybody talks about: fail fast, being able to change things on the fly. Failing was great, because we realized we could rectify things just as fast as we created failures in the system. But it also introduced me to this idea of community, and I never really understood it before I went to Yahoo. That's where I launched Yahoo Answers here in Canada, and really understood the dynamics of how people who didn't know each other, from all across the world, could actually communicate, get to know each other, and engage with each other. For me, it introduced a new type of business model that today we call social media. Back then, it was social media from an organic standpoint, and it quickly evolved into social media from very much a profitable business model standpoint. So I did that for a while.
I've basically been in startups for the last 10 years, from big data analytics, to things like profiling, to customer journey analytics. Then, in the last four to five years, I've really concentrated on AI ethics and privacy. A lot of it is because I realized that there are a lot of problems that we created, even just as advertisers, that compounded the problems we see today. I wanted to be part of the solution, not the problem. Venture capital happened for me in the last couple of years as well, when COVID hit, because we realized the issues that were happening with respect to diversity in the workplace. Venture capital is not immune to the diversity problems that we see today. I wanted to make sure that the individuals and groups that have not had access to investment actually had that opportunity as well. Coming from the startup space, I wanted to specifically support those groups, women, BIPOC, indigenous, as well as the neurodiverse populations and new immigrants to the country, to be able to have equal access to programs, but also to investment.
[00:05:31] RS: It's a fantastic mission. I definitely want to get more into the way you do business at MATR. Firstly, though, could you share a little bit more about what inspired you to focus on AI ethics? You mentioned that you were seeing some problems, even in the way the ads were deployed. Was there a Rubicon moment there? Or what was it for you that made you say, "Okay, this is fraught in some areas; I need to be part of the team being thoughtful about this"?
[00:05:58] HJ: I told you I was in advertising for many years, but I was also part of the ad platform. At Yahoo, advertising was our bread and butter. My boss, I think back in 2005, was the guru of behavioral marketing. Just by virtue of being able to see where people were going on our website at Yahoo, we could surmise what their interests were. What we also had the ability to do, which I think crossed the line, was to machine-read emails. If your Yahoo email between you and your girlfriend used the words "travel" or "trip", we could pop an ad that showed, "Oh, hey! How about this vacation spot for you?" at the time you actually sent that email.
When I started to see things like that, I started to question whether that was actually crossing the line. I think it wasn't until probably years later, when I moved away from advertising and started doing a lot more AI stuff, that even the startups I was working with were doing customer journey analytics and profiling, where we could essentially go and look at people's profiles on social media and understand who they were, what they needed, what they were like, what their moods were like. We started being able to look at things like sentiment and to discern so many more things about them. I thought, "Wow! What a great idea if you can marry transaction with profile." Transaction was what the business had; they would know how often you came to the bank, your services, what you spent, and all that stuff. What if we wanted them to know more about you as an individual, what you liked, the bad things you were saying on social media, the things you didn't like, and marry those two things so that we could target you better?
As much as this was a business opportunity that I thought was amazing, it wasn't until years later that I realized we shouldn't be doing that. We should actually separate who you are as an individual from who you are as a business customer. Because from my perspective as a customer, businesses should not have access to that type of information if you don't want them to.
[00:08:34] RS: Yeah. That's where I wanted to go next. Was the problem for you privacy? Was it that this was happening without people's input, that they had kind of accepted terms and conditions without really considering what it meant? Because what you're describing, turning transaction into a profile, is like this dream of business, to be able to more accurately predict what customers needed and to meet them with a product at the moment they wanted it, right? But that just ignores the consumer's role in it. Was it merely privacy? What was the thrust of the problem that made you spooked by it?
[00:09:09] HJ: Well, there are a couple of things, because I also remember, I think it was, gosh, 2011, 2012, when Snowden came out. He talked about the government's use of information. Usually, let's say somebody is doing something wrong; you need a warrant to be able to go deeper and determine whether or not that is in fact true. Well, the government wasn't doing that. Basically, to find, let's say, the 1% or 2% that were doing potentially bad things, they needed to look at the 98% who probably weren't doing anything. They had access to information from everybody, and 98% of them could have been innocent.
To me, it wasn't necessarily a privacy thing, because I didn't really understand the data privacy side until much later on. What I thought was that this just doesn't seem right. I had discussions, especially with friends, back when I was actually on Facebook, which I deleted five or six years ago. When the stuff about Snowden came out, people started blanketing marketers and advertising agencies the same way as the government. They said, "We never asked to be targeted. We never asked to have this done to us. We never asked that you do this to our data." One executive, a brand executive, said, "You want us to give you the most relevant information so you could purchase and make things easier for you, so you don't have to actually Google for the best deals or whatever. You could have the best information come to you, so you don't have to do that. And yet, you're starting to cry foul when it comes to all this stuff."
A friend of mine, [Julie Pippard 00:11:00], had said, "No, I don't have anything to hide, and that's not the point. You don't have the right to use my information, because that's my choice." Then when it came down to it, it came down to the fact that my private information is my right. People saw it very much as personal property. This is stuff that lives in my home, or on my person, and I don't allow anybody in the door unless they have permission, right? It's the same thing as a warrant. You can't come into my home unless you have a warrant. People started seeing personal information the same way. Now, a lot of millennials, or people who were younger and actually grew up with the internet, thought they were lost causes, and they said, "You know what, I've already put myself out there. I use Instagram. I use Snapchat. I use all these social media places. I can't do anything about it. I can't take it back." But that's not necessarily true. I think people have to realize that they do have a choice. Just because the platforms have been doing this for years doesn't mean they should have. It's because legislation couldn't catch up to everything that was happening below the line. I think that is a big part of it. We do have lagging legislation. We need tech to actually stand up and say, "We're not going to do that anymore."
[00:12:36] RS: What was the beginning of your work then, once you sort of realized that this was a problem?
[00:12:40] HJ: Yeah. I used to work for a company called Cerebri. Cerebri, I think, is a company that still exists here in Toronto, and they were doing customer journey analytics. The idea was to be able to ingest a lot of information so that we could discern what a customer's path to purchase was. Everybody knows about the buying funnel, correct? We did it for the buyer journey for both cars and banking. The one thing we also started to see was that we began to question whether we should even be getting the information we were getting. Because if you can imagine the amount of customer data that a bank or a car company has, all their transactions, all the different types of datasets, it's massive. But one thing I realized is that there was a lot of information in there that didn't necessarily have positive consent attached to it.
Do customers know that, for example, this bank is using this data for specific analysis? Like you alluded to earlier, Rob, when people sign or tick that box on privacy just to get to the experience, they have no idea what they're signing their life away for. What these terms and conditions have done is basically create this blanket effect that allows customers to just sign their life away and not read the stuff, because it's so complicated legally. We were doing some analysis at BEACON Trust Network, where we're developing an application for data privacy impact assessments, and we realized that the reading level on many of these terms and conditions was more suitable for, let's say, a 30 or 40-year-old. It wasn't simple enough. If anybody has read any kind of legal terms and conditions, they try to bury as much as they can in complicated language. Google, Microsoft, all the big guys, have terms and conditions that are like 10 to 15 pages long; that's the other way to do it. Let's just pummel you with so much information that you can't read it. That's another way to reduce their accountability to citizens.
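To make the reading-level point concrete, here is a minimal Python sketch of scoring a clause with the standard Flesch-Kincaid grade-level formula. The syllable counter is a rough heuristic and the sample clause is hypothetical; a real assessment tool like the one described would use a vetted readability library.

```python
# Rough Flesch-Kincaid grade-level estimate for a terms-and-conditions excerpt.
# The syllable counter is a heuristic; real tooling would use a tested library.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels (minimum of one per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

if __name__ == "__main__":
    # Hypothetical excerpt written in the style of a typical terms-of-service clause.
    excerpt = ("By accessing the services, you irrevocably consent to the collection, "
               "processing, and disclosure of your personal information to affiliates "
               "and third-party processors.")
    print(f"Estimated grade level: {flesch_kincaid_grade(excerpt):.1f}")
```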
The one thing I realized is that, again, we were getting information that we shouldn't have. Also, during that time, I was doing an AI for good conference, and I met two people. One was from the Children's Hospital here in Canada. She was an AI researcher doing a lot of research to determine the causes of kids' diseases like cancer, et cetera. She was using AI to figure that out. On the other side, I met an engineer who worked at Uber. We had some interesting discussions about accountability. I think this is what got me started down this path, because there was no understanding or even accountability at his level. He would say, "I just write the code. I don't have to solve the trolley problem. If the car decides the probability of killing two people versus one person, it takes the path of least resistance. Correct?" I said, "That's very much a human problem, not a computer problem."
The idea of leaving ethics to a machine didn't make sense, even if that could happen, because people are wired very differently than machines. The problem that we have is that we inject our own biases into these models. A machine doesn't, or can't, necessarily make the right decision based on circumstances. It's really grandfathered from its maker. That's when I started thinking, there has to be more to this. Take a researcher who says, "I want to be able to help cure cancer in kids. But hey, if I only have datasets from children in, let's say, Texas, I want something more representative, so it will be able to cure this type of cancer all over the world." Well, geographies have different impacts on a specific disease, right? The food and the conditions are very different from country to country.
You have to take all those things into consideration, which means that your dataset has to be more representative of the different geographies and the different cultures that you're trying to solve this problem for. We started running into those issues: the dataset problem, the model problem, the black box. The more I started digging into it, I just realized, AI is just not ready. It's just not ready.
[00:17:54] RS: Yeah, particularly with the data being used and collected. This is a common question that comes up: when do you know that your model is ready to put into production, that it's had enough training data? In a world where it's going to be deployed for different populations, different kinds of people, there's probably never enough, right? It's always going to be replete with some bias. I spoke with Roman Yampolsky a couple of episodes ago, and he made the point that you can't even get human beings to agree on what is ethical, so how are you going to codify it into a machine? I thought that summed it up nicely.
This kind of bridges to another point I wanted to speak about: this intersection of personal privacy, data usage, and also the implicit biases at play here. Where does this leave us on the possibility of synthetic data? Is synthetic data a panacea for a lack of diverse data or bias-free data?
[00:18:48] HJ: If you had asked me this question probably 10 months ago, I would have said, "Yes, for sure." If there's the ability to actually use information that you can cut into thousands and thousands of pieces, and make it representative in a way that mitigates harms to individuals, I'd say for sure. But the problem I see with synthetic data, the more I think about it, is the rise of deepfakes. I've seen companies actually use methods like GANs to fool computers about what's fake and what's real for the protection of privacy.
Say you've uploaded, let's say, 10 pictures of yourself to the internet. Google has been really accurate at identifying who is Rob on Instagram, who is Rob on Facebook, et cetera, and every other channel where you've uploaded your picture. There are companies out there, one company in particular, that were able to say, well, how about we upload your picture to the internet, but we apply GANs, which are adversarial algorithms, to create just enough noise on the image that when a computer reads it, it will say, "That's not Rob"? But to a human eye, that's Rob for sure. From a privacy perspective, you can have 15 different pictures of yourself on the internet, and a computer would say, "No, that's not him." That's good from that perspective.
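To illustrate the kind of "adversarial noise" being described, here is a minimal FGSM-style sketch in PyTorch. The toy classifier, the random image, and the epsilon value are hypothetical stand-ins, not the method of any specific company mentioned in the episode.

```python
# Minimal sketch of adding an adversarial perturbation so a classifier misreads
# an image while a human still sees the same photo.
import torch
import torch.nn as nn

def cloak_image(model: nn.Module, image: torch.Tensor, true_label: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Nudge the pixels just enough that the model's prediction flips."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), true_label)
    loss.backward()
    # Step *up* the loss gradient: pushes the model away from the true label.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()

if __name__ == "__main__":
    # Toy stand-in classifier: two classes ("Rob" / "not Rob") over a 3x64x64 image.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))
    photo = torch.rand(1, 3, 64, 64)   # pretend this is the uploaded photo
    label = torch.tensor([0])          # class 0 = "Rob"
    cloaked = cloak_image(model, photo, label)
    print("max pixel change:", (cloaked - photo).abs().max().item())
```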
But then, once you start to amplify the use of this information, what if we took it to the next level? We'd say, "Well, that's not Rob," but then use it in a way where, the more you use the technology, we can now fake your voice or your image to the point that we don't know what's real or not real. I think that's the problem I see with synthetic data, especially today, with the rise of all this controversy on social media. People could say this is actually happening, but you saw the amount of fact-checking that went on during the previous presidency, because he lied so many times. Now, it could be amplified even more, because all you need to do is switch out somebody's voice with another voice, using synthetic data, to make that person seem to say the things that you want him to say without it being real. To an individual who can't discern what's real, it's going to be very difficult. That's the issue I have when you try to replace one technology with another. It creates a whole slew of other problems.
[00:21:51] RS: Yeah. Even thinking about how poor fake information has been recently, from places like Facebook, and the things people will believe, because they'll read a headline and the auto-generated three lines of text below the photo and be like, "Oh, this is the truth." I was like, if you read the thing and then did 20 seconds of Googling, you'd probably be like, "I don't know about this." But people don't even go that far. What happens when fakes are even better? This problem is surely just getting worse. It doesn't strike me that facial recognition technology is even a solution, either, because how is that going to work in a sufficiently advanced deepfake world, right? It feels like there's kind of an arms race against biometrics, or proving you are who you say you are. What can we do about this? Is Pandora already peeking her nose out of the box here? Is this going to happen no matter what we do? What's the solution?
[00:22:41] HJ: I don't think facial recognition is going to be a solution for even verifying that you are who you are. There's just too much. There's just too much out there. I don't know if you heard the most recent thing about Clearview AI; they're known for perpetrating really awful things. They scrape the data of millions of people all around the world, and then they sell it to law enforcement. Well, they've gone one step further. They scrape the data of everyone, but they said to Ukraine specifically, for all the, let's say, Russian soldiers that were killed during this war, we will make sure that you can identify the fallen through facial recognition, so that you can send that back to the families to make sure that they know. That's really crass, because now it sets a precedent.
How do you know whether or not it's real? The one thing about facial recognition as well is that in order to have some level of accuracy, you need biometrics. Biometrics on a face is usually, I think they say the ear to some extent, but it's really the iris, right? What are you going to do to a fallen soldier, open their eyes, take that picture, and then match it against your database and say, "Yeah, that's him"? I mean, there are so many awful things attached to that. To me, the future is really about data that's not shared. Or, if it's shared, it's shared in a way that increases accountability. There are things coming up that I think are amazing technologies. They're still very much edge cases, but through verifiable credentials, the self-sovereign identity communities are starting to gain a lot of traction. The way they talk about sharing data is really like the relationship between a Visa or MasterCard, a bank, and the individual; it's the same thing with a verifiable credential. Take a verifiable credential that comes, let's say, from your DMV, which is the issuer, and it's about me, the holder. Then there's the verifier, which is, let's say, the guy at the border who confirms you are who you say you are. All they get on their end is a checkmark that says, "Yeah, that's Hessie." They don't actually physically receive any data. They literally get a checkmark from the verifier.
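To sketch the issuer/holder/verifier shape being described, here is a simplified Python example. Real verifiable-credential systems use public-key signatures and selective-disclosure proofs so the relying party truly receives only a yes/no answer; the standard-library HMAC below is just a dependency-free stand-in, and all names and keys are hypothetical.

```python
# Simplified three-party sketch of the verifiable-credential flow:
# issuer (DMV) -> holder (wallet) -> verifier (border agent's checkmark).
import hashlib
import hmac
import json

ISSUER_KEY = b"dmv-signing-key"  # stand-in for the issuer's real signing key

def issue_credential(claim: dict) -> dict:
    """Issuer signs a claim about the holder; the holder stores it in a wallet."""
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}

def verify_credential(credential: dict) -> bool:
    """Verifier checks the issuer's signature and hands back only a boolean
    'checkmark' rather than copying any database record."""
    payload = json.dumps(credential["claim"], sort_keys=True).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, credential["signature"])

if __name__ == "__main__":
    # Holder keeps the credential and presents it on demand.
    wallet_credential = issue_credential({"name": "Hessie", "licensed_driver": True})
    print("checkmark:", verify_credential(wallet_credential))  # True
```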
I think the future is really about making sure that the data stays where it sits, with the creators of the data, the individuals, who can potentially hold it in their wallet and carry it around with them, so that is the only place of truth for the real information. That's where we have to get to, because we don't want to rely on, let's say, potentially fake data brokers that will sell information that may be untrue about individuals, or use very, very personal information. For example, Roe v. Wade right now is a big deal. There's the amount of information being shared with law enforcement, from databases, about women who have the potential to have an abortion in the coming months. My daughter told me about this application called Flow, which allows individuals to manage their menstrual cycle, and the risk that the data from it could determine a woman's reproductive health and be shared with organizations is now heightened. How much more vulnerable will women be because of the applications they use? It opens up avenues that people don't realize make them much more vulnerable to the system. It's not fair.
[00:26:57] RS: You can take that to the thought-police extreme, certainly. I don't think that's too much of an inductive leap. But also, what's going to happen in the meantime, even before that, is you search a term like abortion, and then you're going to be inundated with whoever has paid against that. Then there's a war going on for your mind in that way, just because you are now going to be saturated with media, and with literature, and with ads, and with imagery based on that, based on people who have incentives that are not your own. That, to me, is just as big of a problem: that sort of mass influence going on by the highest bidder.
[00:27:35] HJ: I think that will go away eventually, not right away. But there have been fights, at least in Europe, against what the IAB was doing, with advertisers being able to, I guess, micro-target based on who you are as an individual, what your political affiliation is, whether or not you have specific diseases. They can get to that point. It was ruled in Europe that they can't do that anymore, because there is, right now, obviously, no positive consent that they've gotten from the consumer. Also, it's harmful, because many of the campaigns being run that way are very much political.
[00:28:24] RS: Hessie, I could keep talking about this with you all day, but I do want to speak about MATR Ventures a little bit. I feel like we haven't gotten to the nitty-gritty here. I'd love to hear a little bit about the firm and, in particular, about AI technologies and under what circumstances you would choose to invest in an AI technology.
[00:28:38] HJ: MATR Ventures, we're an emerging fund. We operate out of Toronto, but we invest predominantly in both Canada and the US. We're a late-seed to Series A fund, and we invest in underestimated founders. Like I said earlier, we want to invest in people who have game-changing technologies but have previously never had access. It's women, it's Black, indigenous, and people of color, and it's the neurodiverse community as well. In the way we see technology, we're essentially technology agnostic. But for the most part, what I've tried to inject, on top of investing in these amazing founders and game-changing technologies and the ability to actually do something meaningful for the world, is making sure that we hold them to account, and that we invest in technologies that have the intention to do good. But as we know, the road to hell is paved with many, many good intentions.
How do we make them accountable in that journey? We have to employ technologies, as well as tools, that will scrutinize not only their processes but their technology, to determine whether or not their models are flawed, and whether or not they're collecting data that they shouldn't be collecting. That's going to be supplemented by a lot of the laws that are already happening today, which will start to make startups and technologies much more accountable for the models that they create, as well as the data that they collect and manage. We want to be able to do that in tandem with what's already occurring, but also support new technologies in cybersecurity and in privacy tech that are trying to do good as well.
[00:30:46] RS: What are some examples of organizations or resources that you can deploy to measure your models and make sure they're being built in a thoughtful way?
[00:30:54] HJ: Well, I would say Responsible AI, which is a nonprofit organization. They're already doing this. They've actually put together, I would say, an open-source community across various industries to develop standards. The standards go through to a certification course. That means that, eventually, it'll be like ISO 1000, where you actually put your company through this certification course. Then, at the end of it, if you get certified, that provides a checkmark for your accountability. There are DPIAs, like BEACON Trust Network. These are data privacy impact assessments that will actually go through your technology stack, understand what applications you use, how that information is being used, how much of it is being disclosed, and what risks there are to you as an organization.
Those are technologies right now, startups, that are starting to take hold. The more of these that actually get into the ecosystem as standards and standard technologies, the better off we are. From an AI and privacy perspective, there are technologies and methods out there that de-identify, let's say, personal information before it actually goes into the model, so the model doesn't have a chance of memorizing or leaking PII. Lots of those are already happening. Many working groups are available right now trying to create machine learning models that allow us to obfuscate, anonymize, or pseudonymize the PII in existing databases, so that when they take that data and use it for something else, that personal information is already extracted somehow.
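As a concrete illustration of the pseudonymization step being described, here is a minimal Python sketch that replaces direct identifiers with salted tokens before a record reaches a training pipeline. The field names and salt are hypothetical; production systems would rely on vetted privacy tooling, proper key management, and a formal DPIA rather than this sketch.

```python
# Minimal sketch: pseudonymize PII fields before a record enters a model pipeline.
import hashlib
import hmac

PII_FIELDS = {"name", "email", "phone"}
SALT = b"rotate-and-store-me-securely"  # placeholder secret, not a real key

def pseudonymize(record: dict) -> dict:
    """Replace direct identifiers with stable, salted tokens so downstream
    models never see raw PII but can still link rows for the same person."""
    cleaned = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            token = hmac.new(SALT, str(value).encode(), hashlib.sha256).hexdigest()[:16]
            cleaned[key] = f"tok_{token}"
        else:
            cleaned[key] = value
    return cleaned

if __name__ == "__main__":
    row = {"name": "Jane Doe", "email": "jane@example.com",
           "phone": "555-0100", "balance": 1204.50}
    print(pseudonymize(row))  # identifiers tokenized, non-PII fields untouched
```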
[00:32:53] RS: Those are fantastic resources. I will make sure to put some links in the show notes so our friends out there in podcast land can investigate on their own. Hessie, this has been a magnificent conversation. Thank you so much for being here and sharing your experience with me. I've loved learning from you today.
[00:33:07] HJ: Thanks, Rob. I appreciate it.
[00:33:16] RS: How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, ecommerce, media, medtech, robotics and agriculture. For more information, head to sama.com.