How AI Happens

AI Safety Engineering - Dr. Roman Yampolskiy

Episode Summary

Today’s guest has committed many years of his life to trying to understand Artificial Superintelligence and the security concerns associated with it. Dr. Roman Yampolskiy is a computer scientist (with a Ph.D. in behavioral biometrics) and an Associate Professor at the University of Louisville. He is also the author of the book Artificial Superintelligence: A Futuristic Approach. Today he joins us to discuss AI safety engineering.

Episode Notes

Today’s guest has committed many years of his life to trying to understand Artificial Superintelligence and the security concerns associated with it. Dr. Roman Yampolskiy is a computer scientist (with a Ph.D. in behavioral biometrics) and an Associate Professor at the University of Louisville. He is also the author of the book Artificial Superintelligence: A Futuristic Approach. Today he joins us to discuss AI safety engineering. You’ll hear about some of the safety problems he has discovered in his 10 years of research, his thoughts on accountability and ownership when AI fails, and whether he believes it’s possible to enact any real safety measures in light of the decentralization and commoditization of processing power. You’ll discover some of the near-term risks of not prioritizing safety engineering in AI, how to make sure you’re developing it in a safe capacity, and which organizations are deploying it in a way that Dr. Yampolskiy believes to be above board.

Tweetables:

“Long term, we want to make sure that we don’t create something which is more capable than us and completely out of control.” — @romanyam [0:04:27]

“This is the tradeoff we’re facing: Either [AI] is going to be very capable, independent, and creative, or we can control it.” — @romanyam [0:12:11]

“Maybe there are problems that we really need Superintelligence [to solve]. In that case, we have to give it more creative freedom but with that comes the danger of it making decisions that we will not like.” — @romanyam [0:12:31]

“The more capable the system is, the more it is deployed, the more damage it can cause.” — @romanyam [0:14:55]

“It seems like it’s the most important problem, it’s the meta-solution to all the other problems. If you can make friendly well-controlled superintelligence, everything else is trivial. It will solve it for you.” — @romanyam [0:15:26]

Links Mentioned in Today’s Episode:

Dr. Roman Yampolskiy

Artificial Superintelligence: A Futuristic Approach

Dr. Roman Yampolskiy on Twitter

Episode Transcription

[00:00:12] RS: Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens.

[00:00:40] RS: Joining me today on How AI Happens is an Associate Professor at the University of Louisville, Dr. Roman Yampolskiy.

Doctor, welcome to the podcast. How are you today?

[00:00:49] RY: Great. Thanks for inviting me.

[00:00:50] RS: I’m really pleased you're here. You are broadcasting, it looks like, from your office at the University of Louisville. Is that correct?

[00:00:57] RY: That's right. 

[00:00:57] RS: So if we get any errant knocks on the door, people trying to come in for office hours, tell them to subscribe to the podcast if they want to hear from you. 

[00:01:05] RY: Will do. 

[00:01:06] RS: I have so much I want to go into with you, because you have kind of touched a lot of different areas in the space. And so before we get into the weeds here, would you mind setting some context for the folks at home and sharing a little bit about your background and your studies, and kind of how you wound up in your current role at Louisville?

[00:01:21] RY: I'm a computer scientist. I did my PhD in behavioral biometrics, and kind of naturally started looking at behaviors of not just human agents, but also bots. The bots keep getting smarter, so we need to keep up. There's a natural progression to analyzing artificial intelligence: how it improves, how we can keep up, predicting its behavior, explaining it, and maybe even controlling it.

[00:01:45] RS: You have a PhD in Computer Science and Engineering. I believe your dissertation was in behavioral biometrics? Do I have that correct?

[00:01:51] RY: That's right. I was looking at game strategy as a novel behavioral biometric, specifically online poker.

[00:01:57] RS: Got it. What did you learn?

[00:01:59] RY: People have unique behaviors, and so do AIs. If they're not good enough to randomize, you can abuse that repetitive behavior and make some money.

[00:02:09] RS: That's maybe a topic for a separate podcast episode, or maybe one where we're not being recorded. Can we start with your recent book, Artificial Superintelligence: A Futuristic Approach, where you spend a good amount of real estate speaking about the importance of safety engineering in artificial intelligence? Would you mind defining AI safety engineering and perhaps even distinguishing it from terms like explainable AI?

[00:02:35] RY: Sure. So for about 50 years, AI was all about making it work. We didn't care about much else. We just wanted something we could sell: products, services. We started getting some success, and now we realize it's not enough to just make it work. It has to work in the real world. It has to not violate privacy expectations. It has to be bias-free. It has to be safe if it's controlling cyber infrastructure.

So as the capabilities of those systems increase, we realize the possibilities for damage are also significant. And there was very little research, at least until recently, to address it. So when people say, “We want to create very capable intelligent systems,” they imply systems which will not harm users, which will not embarrass the company producing them. But it's not something you get as a side effect. If you're not specifically working on getting that part of the system right, you don't just get it.

And a lot of times, you don't know how it might fail. You don't expect certain side effects. Companies get surprised. So AI safety can be kind of divided into two subcomponents. First, systems we already have, narrow AI: spell checkers, text prediction, anything like that. If those fail to deliver a successful product or service, it's not a huge deal, mostly inconvenience. People worry about, as I said, bias. They worry about loss of labor, technological unemployment.

And then the second part is future AI: systems we don't have yet, but which we anticipate we will create eventually, which are as smart as people, or maybe smarter. And they are general. They can make mistakes in any domain. They can impact all sorts of critical systems: stock market, military response. Really, anything that is controllable by software. So that's where the big concerns are about existential risks, suffering risks. So we can gradually learn how to deal with what we already have, like privacy concerns: how to make sure that data doesn't leak information about whoever is providing the data.

And bias is another very big one. Most companies took some sort of pledge about not deploying AI which is biased against certain races or genders. But all of it is kind of short-term, near-term problems. Long term, we want to make sure we don't create something which is more capable than us and completely out of control.

[00:05:10] RS: That last piece is interesting to me, because some of the concerns you laid out there, privacy controls, or the possibility for bias to creep in, have been risks of technologies previously, right? Not just with artificial intelligence. Do you think that the stakes are higher? That there's a greater need to put in guardrails, for lack of a better term, in AI specifically as compared to other tech?

[00:05:32] RY: Right. So most tech is tools. You will have tools, which could be very dangerous, especially in the hands of the wrong person. But AI, at least at human level, is an agent. It's an independent entity, which makes its own decisions, can decide and explore new pathways to achieve its goals and come up with new problems, which were not originally part of the design. So it's a very different situation. 

Tools, we know how to deal with. It's more of a deterministic set of problems we can consider: okay, those are the issues we need to worry about. With an agent-like system, it's open-ended. It can do really anything in any domain. And so it's much harder to test it, to debug it, to make sure it will function as expected in novel situations.

[00:06:17] RS: Right. The risks are really limited by your own imagination in the case of an agent, right?

[00:06:22] RY: Not yours. The system's. That's the problem. If it was mine, I would be able to predict some of it. The system is smarter than me, and it has a much bigger set of possible domains of influence.

[00:06:33] RS: Yeah, that's well pointed out. And what kind of safety would you engineer in this case? If the technology has the ability to think beyond you, what are the guardrails you can put in place?

[00:06:42] RY: Welcome to my research. It's been 10 years, and I have very few solutions and mostly new problems.

[00:06:49] RS: What are some of the new problems? Let's start there, I guess, and work backwards.

[00:06:52] RY: So every time, we kind of decide, okay, in order to control those systems, we'll need certain tools. We need to be able to explain how they work. We need to predict the decisions they make. We need to be able to verify software. And there are about a dozen of those I can talk about. For each one, we discover there are well-known, proven results which say you cannot do that. You cannot have, for example, a much smarter system which we can predict in advance in terms of all the decisions it will make. We can kind of understand, okay, the chess program will win, but we have no idea what moves it will make. So details are obscured from us.

If a system tried explaining what it's doing, or how, it will either have to simplify the actual answer so we can get it, or it will be so complex we will not comprehend the answer. There are similar limitations from many domains which we feel would be part of that solution: economics, political science, psychology. They all have their own kind of well-known results. Okay, voting: we know there are limitations to every voting system; somebody will not be happy with the result. Mathematics: we know there are self-referential proofs which cause problems with verification, and so on and so on. We have a survey paper looking at dozens and dozens of such results.

[00:08:12] RS: So it would then stand to reason that you would need an expert within each individual subfield to design rules that would be limiting or increase safety based on the application at hand, right? There's no way you could kind of put an umbrella over AI as a technology writ large and expect it to perform and institute those safeguards. Is that correct?

[00:08:32] RY: It's definitely useful to have a very diverse set of cognitive tools to work in this field. Pretty much everyone can contribute something. There are some ideas for maybe trying to restrict capabilities of a system to the level of human experts, so we are more competitive, we can better understand what it's doing. But even with that type of restriction, it's not obvious how to actually do it. The system is very likely to find a workaround for any such obvious limitations.

[00:09:00] RS: Definitely. And on the topic of limitations, just in terms of implementing accountability, you have this challenge of incentives, right? Like, particularly, companies want to make money and they can deploy tech to do so. Would there need to be some kind of governing body, outside of even state governments? I guess those sorts of non-governmental institutions and nonprofits do exist. But in addition to practical means, wouldn't you need some kind of oversight board?

[00:09:30] RY: So it really depends on how difficult the problem is. If it's a huge problem, and it requires computational resources and billions of dollars, something like the Manhattan Project, then government supervision makes sense. You can supervise Amazon. You can supervise large companies like that, Google. If it turns out that the problem is much easier, somebody comes up with a very clever algorithm, you can do it on a laptop in your garage, then governance is kind of useless. It's security theater. When we ban things and make them illegal, we made computer viruses illegal, it makes no difference; people still do it. So it really depends, and we don't know how hard the problem is right now. The best results we get are when we scale our systems. So scalability definitely brings results. So maybe there is a chance to employ some governance in terms of restricting available compute to large projects.

[00:10:20] RS: Yeah. That is certainly a challenge. With the increasing decentralization plus the commoditization of processing power, how would one enact any safety measures in a world where anyone can do this in their garage?

[00:10:32] RY: It's possible that we can't. So everyone kind of assumes that the problem is solvable, and it's a question of give me a little more money and I'll solve it. Give me another grant. But it's quite possible that the problem is unsolvable. That's actually part of my research: the control problem could be solvable, maybe. It could be solvable in some specific variation of it. So you have direct control. You have an ideal advisor, where you have the smartest system deciding for you. You have kind of hybrid variations on it. So maybe some of them are not solvable at all. Maybe it's undecidable. Nobody has published anything explicitly proving that, “Okay, this can be done. We can control smarter systems.” Or even the opposite. So what I published is a lot of kind of evidence from other fields and argumentation, but it's not a solid mathematical proof that for every system, any design, it would be true as well. So it's still an open area of research.

[00:11:28] RS: Right. What are some areas where you see problems that are solvable?

[00:11:34] RY: So anything in kind of a narrow domain, where we can define what the space of possible solutions looks like and we don't have to worry about creative answers. We can kind of make sure the system follows what we expect. Anytime we give it enough independence to be creative, to think out of the box, that's where we don't know how to specify good versus bad. We don't know how to explain what good answers look like, what ethical, moral answers look like. And it's not obvious that we will even agree with other humans about what good answers look like.

[00:12:08] RS: Right. And thinking outside the box, isn't that kind of an ultimate goal of developing artificial intelligence? In a simple definition, if artificial intelligence was to replicate human cognition or human behaviors, then surely creativity, developing recursively, adapting, evolving, growing, changing, this is all part and parcel of it. So does the nature of AI in that case prevent these problems from being solvable?

[00:12:32] RY: Right. So this is kind of the tradeoff we're facing: either it's going to be very capable, independent, and creative, or we can control it. For some problems, even important problems like protein folding, it seems that we can have systems which are not general, not smarter than the smartest humans, and still solve that type of important problem for biology, for medicine. But maybe there are problems where we really need superintelligence, general superintelligence. And in that case, we have to kind of give it more creative freedom, but with that comes the danger of it making decisions we will not like.

[00:13:07] RS: In that latter case, do you think there needs to be a decision made about whether we should, right? I keep coming back to this question on this show, which is, when it comes to the ethics of artificial intelligence, when it comes to explainable AI, this, like, abbreviated quote from Jeff Goldblum in Jurassic Park: “You asked if you could. You didn't ask if you should.” Are we going to reach a moment with artificial superintelligence where we kind of have to decide where this tech is going? And is it reasonable for us to expect there to be that kind of referendum?

[00:13:37] RY: It seems like we're unlikely to be able to decide not to do it. There are pressures which are economic, prestige, malevolent actors, the same people who write computer viruses, doomsday cults, crazies; somebody's going to press that button. So it's very unlikely that we can just have a democratic vote and go, “Nope, we're not doing it. It doesn't work.”

[00:13:58] RS: What do you believe are some of the risks? I mean, we can go as dystopian as you care to hear. But in the near term perhaps, what do you think are some of the risks of not prioritizing safety engineering?

[00:14:10] RY: So I have a paper in which I survey AI accidents, historic accidents. And this is all for tool AI, for narrow AI systems. And usually the formula is: if you make a product or service to do X, it will fail to do X, whatever X means. Driving a car, spell-checking your text messages, it doesn't matter. It fails at that task. So if we project it to general systems, it fails at everything. So if you have a system controlling all your cyber infrastructure, it can destroy your economic power through a crash in the stock market. It can start, win, or lose a war, a nuclear war possibly. And those are things which we can predict, those known unknowns. We don't know what it's going to do, but we can understand those things. There are also unknown unknowns, where a superintelligence is smarter; it comes up with something out of the box we didn't consider, which is also very bad, potentially.

[00:15:03] RS: So the risks are really just, as you point out, whatever the task is that could be failed, right? What are the stakes of the failure of any given task? Because we will be deploying this technology conceivably for any given task, right?

[00:15:15] RY: Absolutely. The more capable the system is, the more it is deployed, the more damage it can cause. Absolutely.

[00:15:23] RS: I want to ask about you personally, Doctor, because you have this very interesting experience, a lot of background in several different applications: behavioral biometrics, as we mentioned, but also game theory, and games, and neural networks, just to name a few. It strikes me, you could have lent your expertise, perhaps, to any application of artificial intelligence. So what made you decide to focus on this area in particular?

[00:15:45] RY: It seems like it's the most important problem. It's kind of a meta-solution to all the other problems. If you can make friendly, well-controlled superintelligence, everything else is trivial; it will solve it for you: other existential risks, climate change, biohazards, and also any products or services. So as long as you have that tool done right, everything else is easy.

[00:16:07] RS: It sounds simple when you put it that way. Everything is sort of downstream of making sure this technology works for us, I suppose, is the point there. So if I'm listening to this and I am a practitioner, I'm developing artificial intelligence, what are some questions I can ask myself as I'm developing tech to make sure I'm doing so in a safe capacity?

[00:16:26] RY: So if you are about to release a product or service, definitely consider in what ways it could be misused, in what ways it could fail. We see some examples, again, in the paper I mentioned, where companies released a product and it was obvious to anyone who takes five minutes to think about it what the consequences would be, but they didn't. And the company gets embarrassed. They have to pull the product, there's a whole backlash, the industry is unhappy. So it's a good idea to kind of think about those things if you're just working on some narrow product or service. If you're doing general intelligence, that's a whole different story.

[00:16:59] RS: And it's related to this notion of accountability, because if you are an individual on a team of individuals all working on a product, and you have a boss, and a boss's boss, and shareholders and whatnot, it's easy to say it's not my problem, or it's not my fault, right? Like, who is truly accountable for the fallout from this tech failing? You wrote a recent paper, The Unownability of AI: Why Legal Ownership of Artificial Intelligence is Hard.

And what stood out to me, accountability is the point here, right? Like, when this fails, who do we blame? It cannot be that we just fine a company. But you brought up this point towards the end. I'll just quote it back to you, if that's okay. You wrote that, “If AI is capable of recursive self-improvement, its source code, or at least model parameters and neural weights, would be subject to continuous change, making it impossible to claim that current AI is the same as the original AI produced some time ago.” In that example, who is responsible?

[00:17:54] RY: Right. So that's the big question. Are we talking about ownership in terms of who owns the startup or company? The engineers designing it? The users who kind of bought it as a service, AI as a service, just add your own goals and your own data? It's not obvious. If a system is human-level or above, it is the one making decisions. No one could predict that it would make those exact decisions, by definition. So we need to kind of reevaluate our concepts of responsibility. You cannot punish AI. You cannot put AI in prison. It's not meaningful in any way. So that's not an option. Then who is responsible?

Saying that the developers are responsible is also problematic, because it's like saying parents are responsible for the behavior of their 30-year-olds. I mean, they're independent agents. They make their own decisions. You couldn't predict what they're going to do. You can kind of give them a baseline of good ethics, but that's all you can hope for. And from that point on, they make independent decisions.

The point of that paper is also that even if we wanted to find the owner and blame them, it's not meaningful. Legal ownership of something you cannot describe, something you cannot capture an instance of in a static way because it's so dynamic and changing, is not meaningful. So claims of ownership, in any case, to get profits from it or to deny responsibility, would not be meaningful, at least in terms of how we define legal ownership of products, and companies, and services.

[00:19:24] RS: Where do we go with this? It feels like it's equal parts really important but also unstoppable, right? Like, when you say there are all these crazies out there, someone's going to push that button. We can't just decide not to do something. Pandora only comes out of the box. She doesn't go back in. So again, you did mention earlier that you had kind of stopped short of proposing solutions. So I guess, can we just keep rattling off some more challenges? What are some more problems you've uncovered?

[00:19:49] RY: Well, it's kind of fractal in nature. For every solution we propose, even a small patch for some sub-problem, we discover there are 10 additional problems with that solution. That new code kind of opens up additional attack surface. A lot of work, a lot of resources go to developing more capable AI. Some people are now doing work on safety. They pick a specific sub-domain, and they work on it. But I think they would agree, even if they succeeded 100%, that patch would not solve the big problem.

I think we need a few more people looking at kind of what is possible in this space, at least theoretically. Are we likely to solve those problems? Are there solutions, before we try to either develop it or develop patches for the thing we're developing? It's just the standard approach in computer science: is the problem solvable, before you put in all your resources? Maybe it's one of those undecidable, uncomputable problems.

[00:20:44] RS: Certainly. It's hard to understand the implications before something exists, though, right? That's sort of the problem here. So does this mean just, like, better simulation? Like, before something is let loose on a population, or on a consumer base really, is there a greater need to try and really understand the implications of our work, beyond just sort of indulging our own imagination about what might happen here? Is that a conceivable solution?

[00:21:09] RY: Well, there is some work in trying to understand what it is we want, at least. So to kind of envision a good utopia, not like the dystopias of science fiction. People never really wrote good, kind of heaven-like experiences. It's always boring and pointless, and you question why you are even doing that. So if we better understood what we're trying to build, maybe we would be more successful at getting there. I think right now, most people would struggle to kind of describe a society they would see as permanently good and would be willing to accept without the possibility of undoing it.

[00:21:44] RS: Yeah. I'm curious, Dr. Yampolskiy, when you think of deploying artificial intelligence and really being thoughtful about the implications of the tech, what are the use cases, or even maybe the organizations, that you think are doing it aboveboard, in a meaningful, thoughtful way?

[00:22:00] RY: So OpenAI and DeepMind are both doing excellent work developing more capable systems, and both have safety teams. But at the same time, it looks like they're trying to be as fast as possible at getting to more capable systems, so that's a bit of a conflicting position for me. I think places which are not explicitly developing more capable systems, but looking specifically at safety theory, places like the Machine Intelligence Research Institute, to me that's probably the best way to approach safety. They don't even publish certain results, or many results for that matter. They are more concerned about getting results than spreading them or getting others to cite them.

[00:22:42] RS: How familiar are you with the structure of those safety teams? 

[00:22:45] RY: In terms of who's on them? 

[00:22:47] RS: Or just in terms of how they do their work? 

[00:22:49] RY: I mean, I only have outside knowledge. I never worked for one of them.

[00:22:52] RS: What's your sense of how they're lending themselves to these projects and the level of oversight that they're providing?

[00:22:57] RY: I think they get early access, but I don't think they are controlling what will be finally done or not done. I don't think they have that veto power.

[00:23:07] RS: If you were on one of those safety teams, what kind of questions would you be asking through the development process?

[00:23:13] RY: I would always try to compare our progress in developing capabilities versus our progress in developing safety mechanisms. At least from outside, it looks like there is exponential progress in capability and barely linear progress in safety work, which of course, long term, is not sustainable.

[00:23:29] RS: Right, right. How would you measure progress in safety work?

[00:23:32] RY: Well, it is super difficult. I mean, we're still struggling in terms of measuring capability; IQ tests and such are very rudimentary tools. But I think if I asked someone to point to, like, big breakthroughs in AI development, they'd easily give me, like, the 10 latest breakthroughs: this week we had this and that. Whereas in safety, I think people would struggle with big breakthroughs. They might point to, like, “Oh, that's when we recognized safety was important. This is when we recognized it was really hard, maybe impossible.” But I don't think anyone would be like, “Those are the top 10 amazing breakthroughs in safety.”

[00:24:08] RS: Right. They're not as headline-grabby, unfortunately. 

[00:24:11] RY: Or they don't exist.

[00:24:12] RS: Or they don’t exist. That's exactly right. 

Well, Doctor, we are creeping up on optimal podcast length here. And I just want to thank you for being here. This has been fascinating listening to you. And please keep up your research, because it feels crucially important. And maybe we can speak again in the future. And you'll have lots more problems, I'm sure. But maybe we can get into some solutions as well.

[00:24:32] RY: That sounds wonderful. I hope to come back with a solution.

[00:24:39] RS: How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, medtech, robotics, and agriculture. For more information, head to sama.com.