How AI Happens

Anna Susmelj: Latent Space, Causality, and Computational Biology

Episode Summary

Anna Susmelj explains her research at Facebook AI developing optimal drug combinations for the treatment of complex diseases, as well as her background in causality research.

Episode Notes

Anna Susmelj explains her research at Facebook AI developing optimal drug combinations for the treatment of complex diseases, as well as her background in causality research.

Anna's Facebook Research: AI predicts effective drug combinations to fight complex diseases faster

Episode Transcription

0:00:00.0 Anna Susmelj: If you could control that your network was not learning on just correlated features but would learn on the causal features, then, no matter in which environment you would be doing this, your network would perform well.

[music]

0:00:11.7 Rob Stevenson: Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. So far on this show, we've featured individuals with all kinds of backgrounds: mathematicians, physicists, software engineers, data scientists. While there are, of course, prerequisites, it seems like there's no wrong way to end up in AI. I want to document some of the paths curious, passionate and dedicated individuals take to this field, perhaps in an effort to illustrate the myriad ways an expert can wind up contributing to hugely influential AI technologies.

0:01:22.2 RS: For this episode, I sat down with mathematician, computational biologist and machine learning research expert, Anna Susmelj. Anna's command of the approaches one can take in machine learning research is truly exceptional and fun to watch in motion. This is our most technical episode to date. And if you're working on anything involving probability theory, dimensionality or latent space, Anna's probably about to solve one of your problems in an off-handed comment that could have been its own entire episode. Anna joined me to discuss her varied background, and to do a deep dive on research she conducted on behalf of Facebook AI.

0:01:55.4 AS: So, I started... I studied in Russia, I'm originally from Russia, and it all started many years ago, when I was 15 and entered a special school in Moscow which was focused on mathematics. It was called Kolmogorov's School. It's kind of a high school, but it was a special education, and this is when I first learned about weird geometries, non-Euclidean geometries, and the whole curriculum was super weird. After that, I was so determined to study math and computer science that I joined one of the top Russian universities, called Lomonosov Moscow State University. There I studied computational mathematics and cybernetics, and was really in love with math at that point. And my studies were in mathematical statistics, which is pretty theoretical in Russia: mathematical statistics and probability theory.

0:02:45.5 AS: After five years of my studies, I decided that I was totally fed up with mathematics and didn't want to do it anymore; I wanted to do something applied. I thought a bit, and then I applied for a programme in Switzerland for life sciences. I had zero knowledge of biology at that point, but I decided that I wanted to apply my knowledge to biology and medicine, and that's how I started a PhD in the Computational Biology Group in Zürich. I then spent five years in my PhD, where I learned a lot of biology. After that, I decided that I had done a lot of math and applied math and biology, and now I wanted to do something else. So I joined Facebook AI, first as an intern and then as a postdoctoral researcher. And since then, I basically have been doing machine learning.

0:03:39.1 RS: I'm curious though, when you began your PhD, you were fed up with math, you wanted to kind of mix things up a little bit. How did your background apply to biology? How were you able to bring what you already knew into this new field?

0:03:50.4 AS: That's an excellent question. So as a computational biologist, I didn't need any specific knowledge of biology, because I was working in collaboration with an actual wet-lab biologist. And this biologist, she was giving me the papers to read, explaining the problem to me, and I was looking at the data and trying to understand it, but from a mathematical point of view. From a mathematical point of view, I was supposed to design algorithms to help her understand what was going on with her data and try to come up with conclusions from the data sets that she had. It's a pretty common applied math or computer science type of research, except that instead of having more standard data sets, we have actual, real data sets. Quite often, we're supposed to answer a new problem. So you're not trying to achieve the best numbers on a particular test data set, but rather aiming for journals like Nature Methods and similar journals, where you're trying to answer biological questions which were not answered before. So you quite often don't have benchmarks.

0:05:01.1 RS: Right, right. It strikes me, given your background, you might have ended up in any number of fields. Was machine learning a natural progression? Was there something particular about this career and this focus of research that really fascinated you, or was it just sort of a natural combination of your background?

0:05:20.2 AS: Yeah, it was kind of a natural transition, because a lot of things in machine learning are actually based on probability theory, or at least I feel that knowledge of probability theory helps me a lot. That's driving my research. And also, machine learning gained a lot of popularity in computational biology, and it makes sense, because these methods are quite powerful and very useful. So I felt like they gave me instruments that I didn't have before, things I could not achieve with conventional statistics, with which I was more familiar. And I found it fascinating how many more interesting things I can do knowing machine learning.

0:06:04.6 RS: Can you explain why your knowledge of probability theory was such a useful building block?

0:06:08.9 AS: A lot of theory in machine learning is actually based on probability theory, right? If you think of the distribution of the labels, if you're thinking of supervised or unsupervised problems, it's all formulated in terms of probability theory. Quite often in terms of simpler rules; you don't necessarily go into a very deep understanding of measure theory, but still, it's all a matter of playing with the probabilities. And then when you transition to other types of methods, which are more unsupervised or self-supervised, it's again a lot about probability theory. If you think of variational autoencoders or other Bayesian approaches, it's all based on probability theory.

0:06:49.9 RS: So I wanna fast-forward a little bit to your research at Facebook. There are a couple of really awesome papers I wanted to get into, the first of which is this research surrounding the prediction of effective drug combinations, and this was sort of tangentially related to biology. Maybe, perhaps, more chemistry. But I'm just curious, could you maybe give us a high-level overview of what the research was about and the approach there?

0:07:14.3 AS: Actually, it is related to biology 100% and not even chemistry, because we are not using the composition of the drug. We're treating drugs as items in a dictionary. The whole goal of this research is: you're given a data set where you have observations of effects at the single-cell level. Single-cell level means that you have measurements of the gene expression for each individual cell, so they were measured individually. Also, just to say that in terms of biology, that is a pretty new thing. This technology appeared around 2015, so it's quite young, but it's gaining a lot of popularity, and it allows researchers to study much more compared to what they could do with so-called bulk sequencing, when they were taking a sample and just getting an average over the sample. But I won't go into the details of the biology. The whole idea is: you have these measurements, which look like a feature matrix, and then you have additional labels for which drugs the cells were treated with.

0:08:17.8 AS: So you're observing the effects of individual drugs or drugs in combinations, and what you want to predict is the effect of the drug combinations that you didn't see during training. Why is it useful? It is useful for experiment planning, because not every drug combination is an effective one. Especially if you think of a complex disease such as cancer, a lot of treatments are combinatorial treatments, so you're getting two or three drugs at a time, and selection of an effective treatment is actually a very important and open problem. I would think of it as a research question: you would be able to use the method to predict these effects and design your experiment. Or sometime in the future, when this technology becomes more popular in the clinic, and it's getting there, you would be able, given the patient data and the measurements, to try to predict what the outcome of these drug combinations would be on this particular patient. So this is kind of a step towards personalised medicine.

0:09:18.0 RS: Yes, and the complexity of this problem is part of why pharmacists spend so long in school, because they have to be able to try and manually predict what the effect of all of these different drugs is on an individual who has a complex disease like cancer, for example. What was the approach here? How are you weighing all the factors and coming out with an output that would tell you what the result of all of these drugs combined would be?

0:09:43.2 AS: Yeah, so we are taking the so-called compositional approach. When you're trying to predict effects of drug combinations, one way to go is to see it in a compositional fashion. You have a data set where you observed maybe some individual drugs, maybe drugs in combinations. What you want is to have each drug represented as a vector in a latent space. With latent space arithmetic, you can learn this vector for each individual drug from your data set, and then, when you are predicting, you follow the arithmetic of this latent space and say, "If I have two drugs, A and B, which I saw during training, but never together, I will combine them in my latent space and decode." And if you impose this geometry of the latent space on the whole data set, it is very likely that you are learning the right combination of the drugs.

0:10:38.0 AS: You can think of it with an even simpler model. Say you observe pictures of humans, and some of them have glasses, some of them have a hat, some of them have scarves, some of them have both. And then you have yet another human who has none of these, and you want to see how this human would look with all of them. That's exactly what the method would do: it would put all those items of clothing on this particular human that you didn't see before in this combination. And this is done just through latent space arithmetic and the assumption that the hat is not related to the scarf, and not related to the glasses. And you're actually able to learn it in the latent space.
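As a rough sketch of the latent-space arithmetic Anna describes, the following shows what combining two drugs might look like once per-drug latent vectors have been learned. All names here (`control_z`, `drug_offsets`, `combine`) are hypothetical, and random vectors stand in for quantities that a real model would learn from data; this is an illustration of the idea, not the Facebook AI implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 8

# Hypothetical learned quantities: in a real model these would come
# from training an autoencoder on single-cell gene-expression profiles.
control_z = rng.normal(size=latent_dim)   # embedding of untreated cells
drug_offsets = {                          # one learned vector per drug
    "drug_A": rng.normal(size=latent_dim),
    "drug_B": rng.normal(size=latent_dim),
}

def combine(control, drugs, offsets):
    """Latent-space arithmetic: start from the control embedding and
    add the learned offset of every drug in the combination."""
    z = control.copy()
    for d in drugs:
        z = z + offsets[d]
    return z

# Predict the unseen combination A+B, even though only A alone and
# B alone were observed during training; a decoder would then map
# z_ab back to a predicted gene-expression profile.
z_ab = combine(control_z, ["drug_A", "drug_B"], drug_offsets)
```

The key assumption, as in the clothing analogy, is that each drug's effect is an (approximately) independent direction in the latent space, so addition composes them.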

0:11:22.7 RS: Can you explain what you mean by the latent space?

0:11:24.0 AS: By the latent space, I mean, when I'm talking about conventional models such as autoencoders, you have one neural network which projects your data into some latent space with lower dimensionality. So basically you have a neural network with several layers, with non-linearities, which brings your original sample to this reduced space, and from this reduced space you have a decoder, which is again usually a neural network with several layers. It doesn't have to be. And then you decode it back. So you're kind of compressing your space. The simplest model of it, or an analogy to this, would be PCA. If you know how principal components work, there you are also learning a lower-dimensional representation of your data, except that in autoencoders it doesn't have to be orthogonal. So basically something in much smaller dimensions than you would originally have. In this dimension it's easier for you to reason, to draw conclusions, and maybe to sample from, if you're talking about variational autoencoders. Or, in our case, we are also applying the arithmetic, to be able to decode something new that we did not see during training time.
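Anna's PCA analogy can be made concrete with a linear "autoencoder": encoding is projection onto the top principal components, decoding is projecting back. This is a loose sketch with made-up toy data; a real autoencoder replaces these linear projections with nonlinear neural networks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 200 samples in 10 dimensions that really live on a
# 3-dimensional subspace, plus a little noise.
basis = rng.normal(size=(3, 10))
X = rng.normal(size=(200, 3)) @ basis + 0.01 * rng.normal(size=(200, 10))
X = X - X.mean(axis=0)

# PCA as the simplest "autoencoder": encode = project onto the top-k
# principal components, decode = project back up.
k = 3
U, S, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:k]              # (k, 10)

Z = X @ components.T             # encoder: 10-d sample -> 3-d latent code
X_hat = Z @ components           # decoder: 3-d code -> 10-d reconstruction

# Reconstruction error is tiny here because the intrinsic
# dimensionality of the data is ~3, matching the latent size.
err = np.mean((X - X_hat) ** 2)
```

If `k` were smaller than the data's intrinsic dimensionality, `err` would grow, which is exactly the reconstruction-loss diagnostic discussed a moment later in the conversation.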

0:12:45.9 RS: When you reduce dimensionality and you work with a much more compressed view of the data, is there any risk of losing accuracy?

0:12:52.8 AS: It depends on the intrinsic dimensionality of your data. The thing is that you can't know this in advance, but you can measure it by the reconstruction loss. It really depends. It's the same, in a way, as PCA: how many components do you need? If your distribution is over-parameterised, so you have too many collinear features, you don't need all of them, because all the rest of the features you could probably reconstruct from the current features. Also, it depends what you mean by precision: the reconstruction error or the fit. It's also the idea behind GANs, right? GANs are also based on the idea of a latent representation from which you are sampling and getting your image. It's the same thing. You've probably seen the impressive results of, say, NVIDIA's GANs, and they look super realistic. So are you losing in terms of precision? It's hard to say. Maybe on some samples, yes. But overall, you're getting the task done. So it depends on the definition.

0:13:52.3 RS: Were you then practicing all these methods throughout the research to ensure that the dimensionality reduction wasn't resulting in a loss of accuracy? Were you also running these sorts of calculations on the side?

0:14:01.2 AS: Yes, of course. When we are assessing the performance of the method, first we look at how well the model performs on the reconstruction of the samples that we already saw. So it's a simple reconstruction error. This is kind of a must-have for any autoencoder model. Then we look at how well our model performs on the out-of-distribution case: basically, how well we can predict the drug interactions which we have never seen, the drug combinations which were never seen during training time. And then we also look at how well we disentangle our space, so how well we separate information about the drugs from information about the cells. This is related to the dimensionality assumption, and it is kind of an essential bit for the method to perform so well on the out-of-distribution case.
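The two scores Anna mentions, in-distribution reconstruction and out-of-distribution prediction, can be sketched with a generic coefficient of determination. The matrices and noise levels below are invented for illustration, and `r2_score` is a standard R² rather than the exact metric used in the paper.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: one common way to score how well
    predicted gene-expression profiles match the measured ones."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(2)

# Made-up cells x genes matrices standing in for real measurements.
measured = rng.normal(size=(50, 100))
reconstruction = measured + 0.05 * rng.normal(size=measured.shape)  # seen data
ood_prediction = measured + 0.3 * rng.normal(size=measured.shape)   # unseen combo

in_dist_score = r2_score(measured, reconstruction)  # close to 1
ood_score = r2_score(measured, ood_prediction)      # typically lower
```

The in-distribution score bounds what the model can do at all; the gap between it and the out-of-distribution score is what the compositional latent structure is meant to keep small.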

0:14:48.9 RS: There are all these unknown combinations, right, that the model needs to calculate. How is it able to come up with insight when there's maybe no data for it, no background for what this would look like? Is that a consideration?

0:15:03.1 AS: Yes, of course. You can have an infinite number of combinations, but you would have only a limited ability to predict them well. So to measure the performance of the method, we just split the data from the very beginning. We separate out drug combinations that we never touch, which we will use only for testing the out-of-distribution case. We make sure that these are drug combinations which are interesting, that give some really unusual effects which we would like to be able to model. Then we just set them aside and look at them later, trying to assess the performance of the method. But generally speaking, if you're talking about all possible drug combinations, of course, we don't necessarily have them in the data set. For this, we would need a validation of the method afterwards with additional experiments, which we will also try to perform.
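The split Anna describes can be sketched as follows. The condition labels are made up for illustration; the one constraint worth noting is that every drug in a held-out combination must still appear somewhere in training, alone or in another combination, otherwise the model can learn nothing about it.

```python
# Single-drug conditions stay in training; a chosen "interesting"
# combination is held out entirely as the out-of-distribution test set.
conditions = [
    ("drug_A",), ("drug_B",), ("drug_C",),
    ("drug_A", "drug_B"), ("drug_A", "drug_C"), ("drug_B", "drug_C"),
]
held_out = {("drug_A", "drug_B")}  # combo the model never sees in training

train = [c for c in conditions if c not in held_out]
test = [c for c in conditions if c in held_out]

# Sanity check: each drug in a held-out combination is still observed
# during training, so its latent vector can be learned.
trained_drugs = {d for c in train for d in c}
assert all(d in trained_drugs for c in test for d in c)
```

Performance on `test` is then an honest measure of compositional generalisation, not of memorisation.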

0:15:54.3 RS: What is the output then, what is the yield of this insight, how is it presented?

0:15:58.4 AS: What you're getting from the method is the predictions for the drug combinations that you have never seen. And we show that you can do it pretty accurately; the precision is quite high. We can also show how you can guide your experiments through this. And later we will try to validate this experimentally, and then define the exact biological findings that we see. But those are follow-up experiments. The performance of the method is already demonstrated through these out-of-distribution cases.

0:16:28.7 RS: The insight is like: here's the biological impact, here are symptoms, here's what this combination of drugs results in in terms of someone's health. What are you looking at when you predict the effect of the drug combinations?

0:16:41.2 AS: Yeah, so we're looking at samples of cells, and what we predict is the gene expression. So basically, how many counts of this particular gene do you have in each particular cell? It won't have a direct outcome on a particular patient, but it will tell you how the cells will respond to this particular drug or drug combination. Which is important, because not all the cells respond to the drug combinations at all. We were looking at, for example, some cancer samples. For the cancer samples, you probably also want to use this model to assess how likely the cells are to die given these drugs, and how likely the other cell types are to survive given these drugs. So this is the type of conclusion that you will be able to draw using this method. And we are showing that we can predict these effects quite robustly.

0:17:38.9 RS: Something you don't know about Anna yet is her early work on causality. And while it's now a very widespread and trendy sub-field of machine learning, well, not to be too much of a hipster about it, but Anna's work in biology was heavily focused on causality before it was cool. I wanted to hear more about her experience with this sub-field and her outlook on where it is today.

0:18:00.7 AS: I was doing causality research essentially for all of my PhD, because it's one of the ultimate questions in biology. Everyone wants to know causality, wants to know how genes are affecting each other, or how proteins are affecting each other, and how to discover this causality from data. While I was working on it, I was studying the problem of fractional killing in apoptosis. To explain it in simpler words: there was an experiment showing that cancer cells which have exactly the same genetic origins, they're genetic clones, when they're exposed to a drug which is supposed to kill them, start to die. And somehow, they die at very different speeds, and some cells don't die at all. This is linked to the problem of cancer resistance: why do some cancer tumours not necessarily respond well to treatments? For this, we wanted to understand which protein is affecting which, because if we know this, we can design a drug treatment which would particularly target the problematic protein, and that would be yet another opportunity for treatment. This is linked to a field which is called causality.

0:19:10.3 AS: It's, I think, mainly arising from statistics. At ETH, for example, in the statistics department, there is a whole group working on causality. It's related to Bayesian networks, which are one of many formalisms for the theory, and I was working particularly on the prediction of causal Bayesian networks. But this is a more statistical view on causality. And when I joined Facebook, I was part of a group which was working on another view of causality, seen through invariances. You might think: causality in biology I clearly understand, because proteins cause things; but what is the role of causality in machine learning? And actually, it is a big role, because it was shown in several research papers that causality is a key to generalisation in neural networks. When we are talking about problems where you have a data drift or a data shift, so you were learning in one environment and then you suddenly have another environment.

0:20:15.7 AS: You can have catastrophic mistakes from your neural network. Suddenly it will not be predicting what you were expecting. And this is because your network is exploiting correlations between the pictures and the outcome. If, instead of that, you could control that the network was not learning on just correlated features but would learn on the causal features, then no matter in which environment you would be doing this, your network would perform well. This is the ultimate link between causality and machine learning. Now it's a very popular research area, and there are several groups focusing on it, because it's just super important if you want to make this next step in machine learning and bring generalisation to it.
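Anna's point about correlated versus causal features can be shown with a toy simulation; this is a hedged, invented illustration of the failure mode, not her research setup. A spurious feature tracks the label almost perfectly in the training environment, so least squares leans on it; when the correlation reverses in a new environment, that model collapses while the causal feature alone keeps working.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_env(n, spurious_matches_label):
    """One 'environment'. x_causal is a noisy but stable cause of y;
    x_spurious tracks y almost perfectly in training but flips its
    relationship in the new environment (a data shift)."""
    y = rng.integers(0, 2, size=n).astype(float)
    x_causal = y + 0.5 * rng.normal(size=n)
    base = y if spurious_matches_label else 1.0 - y
    x_spurious = base + 0.02 * rng.normal(size=n)
    return np.column_stack([x_causal, x_spurious]), y

# Training environment: the spurious feature looks like a near-perfect
# predictor, so least squares puts almost all its weight on it.
X_tr, y_tr = make_env(2000, True)
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

# New environment: the spurious correlation reverses. The fitted model
# fails catastrophically, while the causal feature alone still works.
X_te, y_te = make_env(2000, False)
acc_model = np.mean((X_te @ w > 0.5) == y_te)
acc_causal_only = np.mean((X_te[:, 0] > 0.5) == y_te)
```

A model constrained to the causal feature trades a little training accuracy for stability across environments, which is exactly the promise of causality for generalisation.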

[pause]

0:21:22.2 RS: As you say, causality is sort of this trendy sub-field of machine learning you're seeing in lots of areas. I'm really fascinated to know that you were working on it.

0:21:29.9 AS: Oh that is actually... Super important. And actually super old.

0:21:30.2 RS: Several years ago, in regards to the measured impacts of proteins on one another.

0:21:34.5 AS: Causal inference goes back many, many years, many decades.

0:21:35.2 RS: What are some of the areas you think that causality can be impactful?

0:21:40.0 AS: Of course, it has a big impact... Everywhere.

0:21:40.1 RS: You mentioned its role in generalising artificial intelligence. What are some of the implications, do you think, of really nailing causality?

0:21:43.6 AS: So, from biology, and also epidemiology, and also the social sciences; they know how to do causality pretty well too, because the field is pretty old. So there are a lot of paradigms, there are different views on causality. You also have time series, you have Granger causality. There are many philosophical schools too, because at some point causality is also a bit of a philosophical concept. It comes from philosophy...

0:22:13.2 RS: It's as old as time being linear, right?

0:22:14.7 AS: In a way, what is causal and what is not, and then you bring mathematics and try to bring a formalism to it. But yeah, it has a lot of applications. Quite often people think of causality as causal inference, and don't necessarily distinguish causal inference from causal structure learning. So there are two ways of causality. One way is: you have a causal graph, you know the causal connections, you know smoking causes cancer, and you want to know how much. So it's estimating the causal effect, or also potential outcomes, or counterfactuals: what would happen if there were a different scenario? You observe one scenario, and you try to predict what would happen in another scenario. This is one branch of causality, I would say. And another one is when you have no idea, like in the case we had in biology: you have no idea how the proteins are connected, or you have some idea, but not very precise, and you're trying to understand what the connections are.

0:23:15.8 AS: You observe people smoking and observe people having cancer; can you say what is causing what? Maybe cancer is causing smoking. That's causal structure learning. So this is another type of causality, and I think in machine learning you kind of need both. There is yet another, slightly different branch: how to link the invariances in particular settings in machine learning to learn the causal features, so it's a bit closer to causal structure learning. But I think the other ones will also come into play as soon as we are closer to knowing the graph or exploiting causality. There is a lot of really interesting research going on in this area.
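To make the first branch, causal effect estimation with a known graph, concrete, here is a hedged toy simulation. The graph (confounder causes both treatment and outcome) and the true effect of 2.0 are invented for illustration: regressing the outcome on the treatment alone gives a confounded estimate, while adjusting for the confounder recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Known graph: confounder -> treatment, confounder -> outcome,
# treatment -> outcome with true causal effect 2.0 (all made up).
confounder = rng.normal(size=n)
treatment = confounder + rng.normal(size=n)
outcome = 2.0 * treatment + 3.0 * confounder + rng.normal(size=n)

# Naive regression of outcome on treatment is biased upward, because
# the confounder drives both variables.
naive = np.polyfit(treatment, outcome, 1)[0]

# Adjusting for the confounder (regress on both plus an intercept)
# recovers the true effect of ~2.0.
X = np.column_stack([treatment, confounder, np.ones(n)])
adjusted = np.linalg.lstsq(X, outcome, rcond=None)[0][0]
```

This is exactly the "smoking causes cancer, but how much?" question: the graph tells you which variables to adjust for, and the data then give you the size of the effect.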

0:23:51.7 RS: Of course, before letting her go, I had to know what Anna is most excited about in her field and where she thinks AI and machine learning can be impactful within the world of biology.

0:24:02.5 AS: I feel like all the areas are super important. I feel that it's a matter of the question you have. When you work in a more applied setting, like I do, I'm trying to identify the question correctly and then see which methods I would apply. But if I think of general research, my particular preferences are for the problems of generalisation, and also all this unsupervised and self-supervised learning, because I find it fascinating how you can have minimal assumptions on the data and minimal human effort and still do something using machine learning. I also find it very exciting to use machine learning to discover something from your data set and have an insight. It's just a very cool feeling when you're chasing an answer like: oh, which protein could be so important for cancer, or why is this happening? Or: how can we apply this so that this device will give us really nice pictures that could be used later on in the treatment of humans? I find it very cool, and I find it super cool that you are able to do this with machine learning.

[music]

0:25:19.2 RS: How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specialising in image, video and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, MedTech, robotics and agriculture. For more information, head to sama.com.

[music]