How AI Happens

Evolutionary Programming with Dr. Bill Porto

Episode Summary

Joining us today is Dr. Bill Porto, Redpoint Senior Analytics Engineer and storied AI researcher, academic, and developer. Bill shares his current projects, including pattern recognition and optimization models, and reveals what it was like to work with the father of evolutionary programming, Dr. Larry Fogel. We touch on a new definition of computational intelligence and talk about where evolutionary programming is in use today, before exploring why evolution is not simply survival of the fittest: it increases variance by retaining less fit solutions. Finally, we define evolution as adaptation in a dynamic environment.


Tweetables:

“Computational intelligence is just taking cues from nature. And nature adaptively learns using iterative variation, evaluation, and selection. So why not put that into an application on a computer?” — Bill Porto [0:04:18]


“It’s not really survival of the fittest; that’s the common moniker for it. In reality, evolution favors the solutions that are most fit, but it tends to retain a number of less fit solutions, and one of the benefits of that is it increases the variance in the population of solutions.” — Bill Porto [0:07:20]

“If you spend a lot of time getting a perfect solution, by the time you have it, it very well may be stale.” — Bill Porto [0:15:17]
 

Links Mentioned in Today’s Episode:

Redpoint Global

Bill Porto on LinkedIn

Episode Transcription

0:00:00.0 Dr. Bill Porto: Computational intelligence, I should say, is just taking cues from nature, and nature adaptively learns using iterative variation, evaluation, and selection. So why not put that into an application area on a computer?

[music]

0:00:14.6 Rob Stevenson: Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we are about to learn how AI happens.

0:00:45.0 RS: Evolutionary programming is written all over a host of AI applications. Transport optimization, image classification, really any area where there are more variables to consider than a typical algorithm might comfortably weigh. If you've ever worked with evolutionary programming or done even the most high level review of the literature, you've come across the work of Dr. Larry Fogel. Larry earned his title as "The father of evolutionary programming" through his work pioneering this field as far back as the 1960s.

0:01:18.6 RS: Today on How AI Happens, we're going as close to the invention of evolutionary programming as we can get. Dr. Bill Porto worked directly with Larry for years, developing a long-term friendship as they worked together inventing and refining evolutionary computational approaches. Bill joined How AI Happens to share his experiences, give examples of modern applications utilizing evolutionary programming, and explain what you should keep in mind when working an evolutionary approach into your own development.

0:01:49.0 DP: I'm a mathematician by training, and schooled... I studied number theory, numerical analysis, optimization, stochastic processes, that kind of stuff. I would guess I was just good at it and found it kind of stimulating, which is nice. I got really interested in applied math, in other words, engineering applications of this. These are the kinds of places where you can use the math in the real world, instead of just playing around with it in your mind. After grad school, I started working at a small company in La Jolla. We did a lot of signal and image processing for government agencies. And at that place, the learning really never stopped. There was a lot of high-risk but high-gain research being done there, which was a lot of fun.

0:02:21.6 DP: In terms of the evolutionary computation, while I was there, I was exceptionally fortunate to meet a guy, Dr. Larry Fogel. He was an early pioneer in evolutionary computation, and he became a mentor and a very good friend of mine. He was a brilliant mind, extremely talented, very inventive. And so, I worked together with him and his son, David, another PhD, on a lot of problems, and these were places where the typical techniques were not really apropos. If you used a regular technique, you couldn't really solve the problem, so we had to think outside the box. And actually, that's where one of the ideas for evolving neural networks originated: basically optimizing the topology and node functions, and even the training of the weights, using evolutionary methods.

0:03:00.3 DP: So a few years later, Larry, David and I all left to form Natural Selection, and that's a place where we could focus on applications of evolutionary optimization. We worked on everything from optimization of bus schedules and clothing plant operations, to fully autonomous vehicle control and pharmaceutical design. It was a good time. Currently, I'm working on the development of pattern recognition and optimization algorithms for Redpoint Global. We're trying to improve customer experiences, basically: modeling and personalized recommendation systems, business process optimization and the like. It's a new, very intriguing application area for me.

0:03:35.8 RS: Yeah, we'll definitely get into some of the subjects you're working on now in a little bit, but just... Larry Fogel was a legend. And when we start reading about the stuff he worked on, the papers he published, he was developing AI technologies surely before it was called AI, and it definitely predated my conception of how old this industry was. Is that fair to say?

0:03:55.7 DP: Absolutely, yeah. He started out, I believe, in the late '50s and early '60s, when you had computers that were the size of a room. So imagine doing research on some of these things when you need days, if not weeks, of computational time just to come up with a simplistic answer. He was one of the very early pioneers, and he set out and trailblazed this area for much of the rest of us.

0:04:16.4 RS: What was he like? Was he kind of a mad genius? Was he a regular guy? How would you describe working with him?

0:04:21.5 DP: He was a polymath in the very best sense of the word. He could do just about anything. He's definitely an outside of the box thinker. He's a brilliant mind, and not only that, he was a great person. He played flute. He played saxophone. He played harpsichord. He was a regular guy you could have a conversation with on virtually any subject. I really miss him.

0:04:40.4 RS: Yeah, yeah. He sounds like a sweet man. Rest in peace, Dr. Larry Fogel. And do you think in those early days in the '50s and '60s, or a little bit later when you joined, do you think the work at that point was leading to what we have now? Do you think that there was a conception that this was the end goal or that this was where this was all going? What do you think was the long-term vision when you were just starting out with the early stages of evolutionary programming?

0:05:04.0 DP: I'm not sure there was actually an end goal per se, other than just, "Here's a new avenue that we could explore." It's a new way to solve problems. Again, evolution, or computational intelligence I should say, is just taking cues from nature, and nature adaptively learns using iterative variation, evaluation, and selection, so why not put that into an application area on a computer? And computers, now that we've got faster ones, just make everything faster. So we can do things that we couldn't do before, and problems that could not really be solved even in the early days of evolutionary computation now can be, because we have the memory, we have the CPUs, and we have the capabilities.

0:05:42.1 RS: Yes, definitely. I wanted to ask you that. You kinda teed me up nicely for this follow-up. Where is evolutionary programming still in use today, and what are the more bleeding edge applications of this?

0:05:52.1 DP: Well, let's see. From my perspective, again, it's used in quite a few different fields, both in commercial and government applications. Caltrans, for example, they're the local transportation department here in California, a government agency that works on freeways. They actually use it here locally to optimize the on-ramp signal timing. So when you're sitting in traffic and the little signal light says, "Okay, get on the freeway, don't get on the freeway," etcetera, there is an optimal way to do that based on the amount of traffic. And so, that's one of the ways that they're using it.

0:06:20.7 DP: There are a lot of other government agencies who are using evolutionary computation, and this ranges from, say, dynamically optimizing autonomous vehicle behaviors, to port security. That, for example, is determining... If you've got a lot of containers coming into a port and you have very few inspectors, how do you best assign which inspector takes a look at which container, based on probabilities and optimal use of those resources? So that's another one. The pharmaceutical industry, definitely a big use; that's where you can actually do drug design. That's an area that I've been involved with in the past as well, at Natural Selection. But other things, for example, robotic controllers, that's a big process-control application for evolutionary computation, and scheduling. Anything that really applies to NP-complete problems, things that were typically not suited to traditional optimization mechanisms. Even, for example, your cellphone: the antenna design that goes into your cell phone, that's another area where evolutionary computation is used. So it's got a lot of different applications. Again, for any problem that requires timely, adaptive optimization in a dynamic environment, it's perfectly well suited.

0:07:36.9 RS: When one calls to mind evolution, survival of the fittest, I think it's natural to think of a T-Rex or perhaps a bipedal ape with a large frontal cortex. When you consider the vast data sets, for example, that are being processed by an evolutionary programming approach, is it so much about finding the one perfect iteration, the most fit? Or is it about eliminating the things that are not relevant and kind of narrowing things down to a solution that way?

0:08:08.0 DP: It's actually the latter. Again, everyone tries to think about evolution in different ways, but it's not really survival of the fittest; that's the common moniker for it. In reality, evolution favors the solutions that are most fit, but it tends to retain a number of less fit solutions, let's put it that way. And one of the benefits of that is it increases the variance in the population of solutions, and that in turn helps in adaptation, especially in dynamic environments. If you think about it in terms of survival of the fittest, if you eliminate all but the fittest, then evolution typically stagnates, and it may never adapt or find a global optimum. And so, sometimes introducing a completely random solution into the mix stirs up the pot enough to get out of that local minimum. And by the way, that's called the headless chicken solution, just as a moniker for it. It's kind of odd, but it stuck. Either way, it adds variance, and that variance allows you to search much more efficiently and effectively. And if you think about it, in engineering and most sciences, this kind of noise, this variance, is typically thought of as bad; you wanna get rid of it. And in fact, they do everything they can to reduce and get rid of noise.

0:09:18.1 DP: But evolutionary mechanisms are stochastic optimization processes; they require noise to search the space. And so it's all about controlling the level and the types of noise, to make small jumps, medium jumps, large jumps to find that optimum. And this is where you want the variance, because this is a technique where noise is actually your friend, especially in dynamic environments.
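To make the variation-evaluation-selection loop Bill describes concrete, here is a minimal evolutionary-programming sketch in Python. The toy quadratic objective, population size, Gaussian mutation scale, and tournament size are all illustrative assumptions rather than details from any project discussed in the episode; the point is the probabilistic tournament, which lets less fit solutions occasionally survive and keeps variance in the population.

```python
import random

def fitness(x):
    # Hypothetical objective: minimize a simple quadratic "error surface".
    return sum(v * v for v in x)

def mutate(x, sigma=0.3):
    # Gaussian variation: the "noise" that drives the search.
    return [v + random.gauss(0.0, sigma) for v in x]

def evolve(dim=5, pop_size=20, generations=200, tournament=10):
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(p) for p in pop]                   # variation
        scored = [(fitness(x), x) for x in pop + offspring]    # evaluation
        # Probabilistic tournament: each candidate earns a "win" for every
        # randomly drawn opponent it beats; less fit solutions can still
        # survive, which keeps variance in the population.
        ranked = []
        for f, x in scored:
            opponents = random.sample(scored, tournament)
            wins = sum(1 for of, _ in opponents if f <= of)
            ranked.append((wins, f, x))
        ranked.sort(key=lambda t: (-t[0], t[1]))
        pop = [x for _, _, x in ranked[:pop_size]]             # selection
    return min(pop, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print(round(fitness(best), 6), best)
```

Because survival here is decided by wins against randomly drawn opponents rather than a strict ranking, a middling candidate can outlast a slightly better one, which is exactly the retained variance discussed above.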

0:09:38.5 RS: It has a certain poetry to it. I like how earlier you mentioned that these processes can parallel natural processes, particularly in the case of evolutionary programming, and obviously that one's pretty cut and dried. But also, the introduction of variance to shake things up: as with a human who suddenly introduces exercise into their routine, so too with an algorithm, perhaps. Could you give an example of the headless chicken solution?

0:10:01.8 DP: Certainly. Again, a lot of these real-life error surfaces you're searching for the optimum on have a lot of little pockets that are sub-optimal. And if you get too close to one of those sub-optima and don't have these big steps and little steps and medium steps, say if you only have the small steps, you're taking little itty-bitty jumps around that little point, and you're never gonna escape it. So sometimes a very large change can make the big difference in getting out of that little environment. I'm trying to think of good examples... If you look at biology, most of the changes that you have, say at a genetic level, either do nothing or they're deleterious. But sometimes, enough change happens to throw you into a completely different level; all of a sudden, instead of having three legs, you have four legs, and you can explore the world a lot better. That type of a large jump is something that you want. And that's where this headless chicken solution comes in. It's like throwing a dart at a dart board: you have a bunch of darts around one area, great; throw another dart, and it just happens to land outside. If it works, great. If it doesn't, it gets thrown out and it's no big deal. But it allows that search to be much more effective and efficient.
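One simple way to realize the "stir the pot" idea, sketched here under the assumption of the same real-valued population as the earlier loop, is to occasionally hand a population slot to a freshly randomized candidate. The injection rate and replacement policy below are illustrative choices, not a prescribed recipe.

```python
import random

def inject_random_solution(pop, dim, bounds=(-5.0, 5.0), rate=0.05):
    """With a small probability, replace one individual with a completely
    random candidate, giving the search a chance to make the large jump out
    of a local optimum. Rate and replacement slot are illustrative."""
    if random.random() < rate:
        victim = random.randrange(len(pop))              # an arbitrary slot
        pop[victim] = [random.uniform(*bounds) for _ in range(dim)]
    return pop
```

Called once per generation inside a loop like the one sketched earlier, this gives the search a standing chance of escaping a local optimum; most injected candidates die off immediately, at little cost.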

0:11:09.1 RS: That is actually how biological evolution takes place too. We have this conception that it happens fractions of a percent at a time, a very slow and gradual increase towards optimization. But historically, at least in terms of organic life, typically it looks like a very slow, gradual incline, but then there's a catastrophic change in environment, like a volcano erupting or a climate shift, followed by whatever was best disposed to that change shooting up. And now, it's not just 1% more disposed to the environment, it's 190% more disposed. It's like having the power of flight was mildly advantageous until the ground was covered with lava, and then it's hugely advantageous. And so that's how we see biological evolution take place. And it sounds like you mirror that with evolutionary programming, right?

0:12:03.2 DP: Exactly. I think in the sense of biology, they call it punctuated equilibria, where you have long periods of pretty much stasis. And then with one small change in the environment, or a mutation or some variation, you find something much, much better, and those types of species then overtake the other ones. And that's what evolution is all about: it's adaptation in a dynamic environment.

0:12:27.6 RS: Yes, yep, definitely. So going back to this notion of whether it's survival of the fittest or elimination of the weakest, can you share how elimination of the weakest, chopping off the least disposed elements, will improve algorithms overall?

0:12:43.0 DP: Well, what that comes down to is that you don't have infinite resources. And when you think about it, even on a computer, that's just great, you've got a lot of memory, but at some point you're gonna run out of gigabytes or terabytes or petabytes or what have you to store all the different solutions. And so you've gotta get rid of something at that point, and those that are least fit, those things that just aren't good enough, get tossed by the wayside. What's interesting is that if you have the mechanism there, the variation-selection mechanism, it's very possible that you could come up with the same solution again. Which means if, for example, you had a dynamically changing environment, and today you have a lot of, say, wet weather, and that lasts for years and years and years, and you have things that adapt to that very well, then later on when all of a sudden the weather turns dryer, guess what? Some solution from before that wet period might actually be something that you could find again.

0:13:33.4 DP: And it's the same thing with simulated evolution in computers: you have some solutions that are less fit, or kind of medium fit if you wanna call it that, that are just hanging around. They're not doing anything bad, but they're not always in the top part of the population. But those can provide those darts at the dart board that allow you to then search around them. And guess what, you might find a better solution based on that, just by throwing the dice.

0:13:57.1 RS: I see. So you can even just iterate on the top of the bell curve, kind of, and, "Hey, I'm improving this a little bit," means the entire thing shifts forward.

0:14:05.7 DP: Exactly. And if you think about it in terms of a distribution or probability function, you got long tails out there. And so if you throw most of the darts around the very center of that, you're gonna see things that are very close to that center, and you're gonna see responses close to that center. But every so often you throw a dart, and that dart ends up going way, way off, and guess what? Maybe that's a better solution for you. And if it is, great. If not, hey throw it out. That's okay. Literally, it's experimental design.
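The long-tailed dart-throwing picture maps naturally onto heavy-tailed mutation. The sketch below is a hedged illustration rather than anyone's production operator: it draws steps from a Cauchy distribution, so most steps are small, but the heavy tails occasionally produce the far-flung dart.

```python
import math
import random

def cauchy_step(scale=0.3):
    # Inverse-CDF sampling of a Cauchy variate: most draws land near zero,
    # but the heavy tails occasionally produce a very large jump.
    return scale * math.tan(math.pi * (random.random() - 0.5))

def mutate_heavy_tailed(x, scale=0.3):
    # Drop-in alternative to the Gaussian mutate() in the earlier sketch.
    return [v + cauchy_step(scale) for v in x]
```

Swapping this in for the Gaussian mutation in the earlier loop mixes mostly local search with occasional large exploratory jumps.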

0:14:35.9 RS: So if I'm out there, if I'm listening to this for example, and I'm developing algorithms, tweaking algorithms, what is my best course of action to implement this kind of approach into my development cycle?

0:14:46.3 DP: Well, first of all, I should bring up that there is no single approach that is the best for all of these. There were two very, very well-respected researchers, Dave Wolpert and Bill Macready, and in the early '90s, I believe it was, they proved that there is no one best algorithm for all optimization problems. So this is one concept to keep in mind. And this is an important result, I think, for a lot of people who've said, "Well, my method is better than everyone else's, I'm only gonna do this," with all the assumptions that are made, whether it's evolutionary computation or not. The idea is to keep an open mind and try to find the techniques that work best for certain things; other techniques will work better for others.
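For reference, the result usually credited to Wolpert and Macready, the "no free lunch" theorem mentioned again later in the conversation, is commonly stated roughly as follows, where $d^y_m$ is the sequence of objective values an algorithm has observed after $m$ distinct evaluations and the sum runs over all possible objective functions $f$ on a finite search space (a paraphrase, not a quotation from their paper):

$$\sum_{f} P(d^y_m \mid f, m, a_1) \;=\; \sum_{f} P(d^y_m \mid f, m, a_2) \quad \text{for any two algorithms } a_1, a_2.$$

In other words, averaged over every possible problem, no optimizer outperforms any other; gains come only from matching an algorithm's assumptions to the structure of a particular problem class.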

0:15:31.0 DP: And so, one of the things that you want to think about, again, for example with evolutionary computation, is scoring functions: what do you want to do, and what is your end goal? That's really one of the largest hurdles to get over, because if you get the right solution to the wrong problem, it's pretty much worthless. So that's one of the first things you have to do. Another thing to think about is that you've gotta find a good solution in time for it to be useful. If you spend a lot of time getting the perfect solution, by the time you have it, it very well may be stale, or the data's stale. So in practice, a good solution in time to use it is much better than a perfect solution that's so old that time has passed you by.

0:16:18.6 RS: So when you're trying to home in on your scoring function, what are the questions someone can ask themselves to make sure that they're drawing a circle around the problem, that they're asking the right questions of their technology to provide a relevant outcome?

0:16:32.2 DP: I think probably the key here is to look at the big picture: what are you trying to accomplish? And that's where a lot of people get confused. They think, "Okay, well, we're just going to maximize profit today." And that's fine, maybe that is their goal and that is the end goal. But maybe there are other things that also come into play. For example, in the marketing space, customer satisfaction is a big thing. You can maximize your profits today, but if you alienate your customers, they're not gonna be happy, and they're probably not gonna come back. You got a dollar from them, but are you gonna get a dollar from them tomorrow and the day after? Are they gonna be good customers? So that's one of the things. We have customer satisfaction, we have maximizing profits. Maybe you wanna minimize the use of resources, maybe you want to use the human resources or manufacturing resources that you have most efficiently, while at the same time getting your throughput, the number of products out the door, maximized. So there are a lot of different goals, and I think the idea here is to think about it as a whole, a holistic thing: what are you trying to accomplish? And then maybe you have different weights for each one of those. "Yes, we would like to have this, but it's not as important as that." And then when you add them all together, it's a combination of minimizing and maximizing, so it's a resource allocation problem.
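A hedged sketch of the kind of composite scoring function being described: several business goals, each weighted by importance, with some terms maximized and others minimized. The goal names and weights are illustrative assumptions, not Redpoint's actual objectives.

```python
def score(plan, weights=None):
    # Positive weights reward goals we want to maximize; the negative weight
    # penalizes resource use, which we want to minimize.
    w = weights or {"profit": 1.0, "satisfaction": 0.6, "resource_use": -0.4}
    return sum(w[k] * plan[k] for k in w)

plan_a = {"profit": 100.0, "satisfaction": 0.2, "resource_use": 80.0}
plan_b = {"profit": 90.0, "satisfaction": 0.9, "resource_use": 40.0}
print(score(plan_a), score(plan_b))   # 68.12 vs 74.54: plan_b wins overall
```

Here the second plan trades a little profit for much better satisfaction and lower resource use and scores higher overall, which is the kind of trade-off the weights are meant to encode.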

0:17:42.3 RS: A moment ago you mentioned that there can be no perfect sort of solution, that, of course, as data continues coming in, perhaps the solution you came up with will be outdated by the time you get it out the door. How would you say this jibes with the likelihood or possibility of a so-called master algorithm?

0:18:00.0 DP: Well, this goes back to the work where Dave Wolpert and Bill Macready basically said, "There is no one perfect algorithm for all optimization problems." This is what's colloquially called the "no free lunch theorem," or the NFL theorem. And again, at the time when they were doing this research, there were a lot of different competing mechanisms. There were the top-down approach people, like evolutionary programming, evolution strategies, ant colony, all those things. Those work very well for certain problem domains. And then there were the bottom-up people, the genetic programming and the genetic algorithms, the bit-level representations. And those worked very well for other problem domains.

0:18:37.3 DP: Just as an aside, I personally like a hybrid approach where you steal the best from both; that's good engineering practice. But it all comes down to using the right tool for the job. So if you have an error surface that looks simple, obviously you use a traditional method like Gauss-Newton or stuff like that, and you get there in pretty much one step. For real-world problems, though, with lots of optima and dynamic landscapes, it's really hard to beat these evolutionary mechanisms. I should also say that one of the largest bonuses that I see in evolutionary mechanisms is the ability to evolve and optimize for asymmetric scoring functions, and this is pretty much true for all of these. This is something that doesn't work well with the traditional mechanisms. [0:19:19.0] ____ traditional use, different errors really have different costs.

0:19:22.2 DP: If you try to think about this, maybe you're trying to predict the weather. If you predict rain but the sun shines, it's no big deal. You haven't really lost much; you took a rain jacket, big deal. If you predict sunshine and a hurricane occurs, that can be catastrophic. So they do have different costs. For example, in the marketing space, here's a typical one: the cost of customer acquisition is definitely not the same as the cost of customer retention. So you have to think about that with respect to what you're doing, and that's where you're gonna choose what type of algorithm to use and the representation that goes under that. This is one of the beautiful parts about evolutionary computation: you can do these non-discrete, asymmetric goal functions. And if you can formulate it well, you can come up with some superb solutions that just can't be done otherwise.
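The weather example translates directly into an asymmetric scoring function. The cost values below are illustrative assumptions; the point is simply that the two kinds of error are priced very differently, and an evolutionary search can optimize against such a non-smooth table because it only needs to rank candidates, not differentiate the loss.

```python
# Cost of predicting `p` when outcome `o` occurs; values are illustrative.
COST = {
    ("rain", "sun"): 1.0,      # carried a rain jacket for nothing
    ("sun", "storm"): 100.0,   # caught unprepared: catastrophic
}

def asymmetric_loss(predictions, outcomes):
    # Correct, or unlisted, prediction/outcome pairs cost nothing here.
    return sum(COST.get((p, o), 0.0) for p, o in zip(predictions, outcomes))

print(asymmetric_loss(["sun", "rain"], ["storm", "sun"]))   # 101.0
```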

0:20:08.6 RS: What do you mean when you say, "The representation beneath it?"

0:20:11.3 DP: Well, the representation... There are two different approaches in evolutionary computation. One is the bottom-up, and that's where they work at the bit level. If you think about it in terms of genetics, you change a single nucleotide, and you get a different organism or a different protein, etcetera. So that's working at the low-level, bit level. In the top-down approach, the representation might be a structure, a data structure where you have an arm and a leg and a liver, all the different parts of the solution. If you're doing neural networks, for example, the top level might be the topology: do you have feedback or feedforward, what links go to what nodes, etcetera, but not at the individual bit level. We're not going to just change the solution by turning on this bit or turning off that bit, or even maybe trying to combine the left half of this neural network with the right half of a different one. So that's where the representation, I think, is very critical to coming up with something that works for a specific problem. Obviously, some are better than others for certain problem spaces, and this is very well documented in terms of what you can do and what can't really be done effectively with different representations.
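The contrast between the two representations can be sketched as follows; the field names are hypothetical and are not a published encoding, just an illustration of bit-level versus structure-level candidates.

```python
from dataclasses import dataclass, field
from typing import List

# Bottom-up: the candidate is a flat bit string; variation flips bits.
bitstring_candidate: List[int] = [0, 1, 1, 0, 1, 0, 0, 1]

# Top-down: the candidate is a structured description of a network, and
# variation operators add or remove nodes and links or swap activations.
@dataclass
class NetworkCandidate:
    layer_sizes: List[int] = field(default_factory=lambda: [4, 8, 1])
    activations: List[str] = field(default_factory=lambda: ["tanh", "sigmoid"])
    recurrent: bool = False       # feedback links vs. purely feedforward
    weight_seed: int = 0          # the weights themselves can also be evolved
```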

0:21:18.6 RS: How does one decide?

0:21:20.3 DP: A lot of it is from experience, unfortunately, and from reading a lot of the literature. That's the nice thing about the literature, because lately, at least in the last maybe 20 years, it's gone away from "My particular mechanism is better than your particular mechanism," or what have you, and it's gone towards "How can we make the best parts of each work together?" And so there's a lot of literature out there now in terms of application areas, and you can get a very quick sense of, "Hey, let's try this, because it's been very well proven to be pretty effective if we use this particular technique, and we can always adapt it if we change this or change something else, maybe the configuration or some of the parametrization, etcetera." There's a plethora of different application areas that people have been working on for at least the last 20 years, if not 30 now, that are easily available. And if you're gonna apply this, they're definitely the best place to start.

0:22:12.2 RS: Bill, this has been fascinating learning from you. We are creeping up on optimal podcast length here. But before I let you go, I would just love to know, for you, someone who has been working in the space for a while, who is just endlessly curious. When you take in all of the work being done across the industry, when you read through these research papers, what to you is the most exciting development in AI right now? What just kinda stokes your creativity and curiosity about what's being developed in the space?

0:22:39.3 DP: One of the things I particularly like to see is the scalability of this. Again, think back to the '60s: you had very slow computers and you couldn't do much with them. Then starting in the '80s, you had faster computers. Now we have GPUs and all the computational resources that we want. We can tackle much larger problems and get solutions very, very quickly. So I think that scalability is huge. And this is where all the distributed processing comes in, where you can have island models, and you have all the different cloud services that allow you to do that, crowdsourcing of different types of computational resources. That's fantastic. The other thing, too, and this hasn't really been focused on too much, but in application areas, continual learning is always a good thing to do. So for example, at least with what I'm doing right now with modeling, people in the past have said, "We'll make a model and we'll use it for six months, and we'll predict based on that model." But you now have the resources to be able to adapt and change that model every day and get something that is better and better. Even if the data stays the same, you can get better and better, and just replace the old with the new. Or if the data changes, you get a new model; whatever's adaptive. And I think that's fantastic.

[music]

0:23:57.9 Speaker 3: How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, E-commerce, media, med tech, robotics and agriculture. For more information, head to sama.com.