How AI Happens

Data Scientist & Developer Advocate Kristen Kehrer

Episode Summary

In this episode, Comet Data Scientist & Developer Advocate Kristen Kehrer tells us how she marries data science and content marketing in her role at Comet and what she has learned about computer vision from her guests on The Cool Data Projects Show. We also take a deep dive into object detection and the value of learning through building, and Kristen shares her advice for getting involved in an online community near you.

Episode Notes

Kristen is also the founder of Data Moves Me, a company that offers courses, live training, and career development. She hosts The Cool Data Projects Show, where she interviews AI, machine learning (ML), and deep learning (DL) experts about their projects.

Tweetables:

“I’m finding people who are working on really cool things and focusing on the methodology and approach. I want to know: how did you collect your data? What algorithm are you using? What algorithms did you consider? What were the challenges that you faced?” — @DataMovesHer [0:05:55]

“A lot of times, it comes back to [the fact that] more data is always better!” — @DataMovesHer [0:15:40]

“I like [to do computer vision] projects that allow me to solve a problem that is actually going on in my life. When I do one, suddenly, it becomes a lot easier to see other ways that I can make other parts of my life easier.” — @DataMovesHer [0:18:59]

“The best thing you can do is to get involved in the community. It doesn’t matter whether that community is on Reddit, Slack, or LinkedIn.” — @DataMovesHer [0:23:32]

Links Mentioned in Today’s Episode:

Data Moves Me

Comet

The Cool Data Projects Show

Mothers of Data Science

Kristen Kehrer on LinkedIn

Kristen Kehrer on Twitter

Kristen Kehrer on Instagram

Kristen Kehrer on YouTube

Kristen Kehrer on TikTok

Kaggle

Roboflow

Kangas Library

How AI Happens

Sama

Episode Transcription

Kristen Kehrer  0:00  

I was sitting there drawing bounding boxes around my neighbor's Tesla. And I'm like, I feel like such a... Like, if my neighbor knew that I was just sitting here drawing frame after frame, drawing boxes around her car.

 

Rob Stevenson  0:16  

Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. Here with me today on How AI Happens is a data scientist and developer advocate over at Comet, Kristen Kehrer. Kristen, welcome to the show. How the heck are you today?

 

Kristen Kehrer  0:53  

Yeah, it's going awesome. Thank you so much for having me.

 

Rob Stevenson  0:56  

So pleased you're here. There is a fluffy warthog head on the wall behind you. That is new since we last spoke. Is there a story behind this creature?

 

Kristen Kehrer  1:05  

It actually comes from Germany, and I have a moose above a sort of doorway. And yeah, I don't know. I saw them in my husband's aunt's house and just felt like I had to have them. They were just so big and ridiculous. And so we brought them home from Germany.

 

Rob Stevenson  1:23  

That is a funny thing to stick in your checked luggage. Yeah, right. Have you named them?

 

Kristen Kehrer  1:28  

The one over the door in the house is Frank, but this guy doesn't have a name.

 

Rob Stevenson  1:33  

Just an anonymous warthog. And I identified it as a warthog, right? My zoological background's not failing me here. Got it. Well, they're welcome additions. This one, the unnamed warthog anyway, I'm glad he's joining us here. And there's just so much to speak with you about, Kristen, because you have this interesting role: you have this technical background as a data scientist, and in your current role as developer advocate, you're having lots of really fun conversations and sharing what's exciting out there in the world of data science. So we can go in a couple of different directions. But first, I would just love to learn a little bit more about your background, and how you wound up in your current role at Comet.

 

Kristen Kehrer  2:10  

Yeah, sure. So I'm Kristen Kehrer. I started out with a bachelor's degree in math, then later went for a master's degree in statistics, and right after that, in 2010, I started my career in data. And I've been there ever since. I started out doing econometric time series analysis and forecasting in the utility industry. Then I spent some time in healthcare, doing things like analyses for motivating people to get their colorectal cancer screening. Then I spent about five years in e-commerce, doing advanced analytics in marketing, and absolutely loved it. I really fell in love with the marketing piece, learning about consumer behavior and optimizing different parts of the website. And that lent itself really well to where I moved next: I built up a following on LinkedIn, got obsessed with content marketing, and learned all that I could about it. That allowed me to go off and work on my own. I actually replaced my corporate salary in my first year on my own, and only came back to being an employee for a developer advocate role, which sounded really new and exciting to me. So since April of this year, I've been a developer advocate for Comet.

 

Rob Stevenson  3:39  

So what exactly does that mean, developer advocate? I'm curious how Comet looked at your background and what you were doing with content marketing and said, yes, this is an important full-time role. What do you think they hope to accomplish?

 

Kristen Kehrer  3:52  

Yeah, so developer advocacy is all about building community. I was able to demonstrate that I have experience building community, because I've been doing that on social media through content marketing. A lot of what I do is just being available to people to answer questions about what Comet does and how Comet can help them, but also aiding in brand awareness: having that reach and being able to get the word out about Comet in general.

 

Rob Stevenson  4:25  

Yeah, it sounds like it's almost somewhere between content marketing, influencer marketing, and sales engineering, because you have a really technical understanding of Comet's offering in a way that a traditional salesperson may not.

 

Kristen Kehrer  4:37  

Yeah, yeah. And a lot of what I'm doing is building cool projects and sharing them with the world. That's why Comet was the perfect place for me: they track machine learning training runs, so for anything I was going to build machine-learning-wise, I'd be able to show other people how I'm comparing those experiments, and it left everything really open for me. So yeah, it really does marry data science and content marketing, and that's just so right up my alley.

 

Rob Stevenson  5:18  

Yeah, one of the outcomes of this marriage is The Cool Data Projects Show that you host. I think it mostly lives on LinkedIn Live and Facebook Live, is that right? And you're interviewing AI, ML, and DL experts about their projects, which sounds awfully familiar. We have a similar mission, you and I, Kristen. But I'm curious: when you're speaking with these folks, what are they coming on to talk about, and what's exciting about what they tell you?

 

Kristen Kehrer  5:43  

Yeah, so actually, in the beginning, it was a lot of asking friends I had met through LinkedIn who I knew were working on cool projects, like computer vision to detect fire, or my other friend who built a cool computer vision assistant that also integrates NLP. So really, what I'm doing is going around and finding people who are working on really cool things, and focusing on the methodology and approach. I really want to know: how did you collect your data? What algorithm are you using? What algorithms did you consider? What were the challenges you faced trying to get this project up and running? Was there anything you really didn't expect that caught you off guard? It's been an awesome opportunity for me to learn, especially since I was brand new to computer vision just eight months ago. With a lot of the people I've invited on, it's been self-serving: I know I'm going to be able to ask this person a question, and it's actually something I need for a project I'm working on. So yeah, it's been an awesome time.

 

Rob Stevenson  6:52  

So with that example, is there a specific computer vision project you're working on?

 

Kristen Kehrer  6:57  

Yeah. So right now, I'm working on comparing YOLOv8 and YOLOv5 on edge devices, so I'm doing object detection. I'm not quite there yet; I just set up my Nano the other day, but I plan on putting those models on the devices. I trained it on a dataset of dog images from Kaggle, and then used Roboflow to get the images into the right data format. I'm probably going into this project more than I needed to, but the end result is that we're going to be doing object detection of my dog on a couple of Raspberry Pis and the Jetson Nano.
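
A side note for readers who want to try the kind of edge-device comparison described above: the core measurement is usually inference latency on identical frames. The sketch below is framework-agnostic and hypothetical. It assumes each model is wrapped in a plain Python callable that takes one frame, and `fake_yolov5` / `fake_yolov8` are stand-ins that only sleep; they are not real YOLO models or the Ultralytics API.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark(predict: Callable[[object], object],
              frames: Sequence[object],
              warmup: int = 3) -> dict:
    """Time `predict` on every frame and return latency stats in milliseconds."""
    if not frames:
        raise ValueError("need at least one frame to benchmark")
    # Warm-up calls are not timed: first runs often pay one-off setup costs.
    for frame in frames[:warmup]:
        predict(frame)
    latencies_ms = []
    for frame in frames:
        start = time.perf_counter()
        predict(frame)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Hypothetical stand-ins for two detectors; swap in real model calls on the device.
    def fake_yolov5(frame):
        time.sleep(0.002)
        return []

    def fake_yolov8(frame):
        time.sleep(0.001)
        return []

    frames = list(range(20))  # placeholder "frames"
    for name, model in [("yolov5", fake_yolov5), ("yolov8", fake_yolov8)]:
        print(name, benchmark(model, frames))
```

On a Raspberry Pi or Jetson Nano you would replace the stand-ins with real model calls and real frames; reporting the median next to the mean helps when a few slow frames skew the average.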

 

Rob Stevenson  7:42  

Is the Kaggle dataset just for the sake of getting a project running, or would that be a reliable, bias-free way to source data?

 

Kristen Kehrer  7:51  

I wouldn't call anything bias-free. But certainly when I did my school bus project, where I built a computer vision model to detect the school bus going by my house, I trained on my own dataset. I actually took video of the school bus passing my house, and then I had to take those frames and annotate them by hand. That is certainly more work than just going on Kaggle. On Kaggle, I don't think all of the image datasets have annotations, but this one already did; somebody had already gone through, drawn the bounding boxes, and said, this is where the dog is. And all of the different computer vision models seem to want the images or the annotations in different file formats. I've found that, for me, Roboflow is the easiest way to handle that: just throw in whatever format I have and then download it in the format I need.
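
For readers unfamiliar with the format mismatch mentioned above: one common case is converting pixel corner coordinates (Pascal VOC style) into the normalized center/size lines that YOLO-family label files use. This is a minimal sketch of that single conversion, not how Roboflow works internally; the function name and argument layout are illustrative assumptions.

```python
def voc_to_yolo(box, img_w, img_h, class_id):
    """Convert (xmin, ymin, xmax, ymax) pixel corners to a YOLO label line.

    A YOLO label line stores: class x_center y_center width height, with each
    coordinate normalized to [0, 1] by the image width or height.
    """
    xmin, ymin, xmax, ymax = box
    if not (0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h):
        raise ValueError(f"box {box} does not fit a {img_w}x{img_h} image")
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example: a box covering the left half (and middle half vertically) of a 640x480 frame.
print(voc_to_yolo((0, 120, 320, 360), 640, 480, 0))
# → 0 0.250000 0.500000 0.500000 0.500000
```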

 

Rob Stevenson  8:57  

What is Roboflow?

Kristen Kehrer  8:59  

Roboflow is a full GUI that you can use to go end to end on a computer vision project. And it's just so intuitive, right? I've used other tools for annotating images, like LabelImg, but in Roboflow you're able to manage your data files. You have all these different folders, and I'm able to not get lost on which dataset I need. You just click a couple of buttons, really quickly draw some bounding boxes, and then say, hey, give this to me in the PyTorch format, and boom, you've got it. But it does go end to end, too, so you can also train models on there.

 

Rob Stevenson  9:49  

Gotcha. I love that you captured all of your own data on the school bus, that you sort of made the cake from scratch a little bit, right? It's so easy... well, not easy, but when data scientists need a dataset, there are all these sources on the internet you can go to, and there are services that will annotate it for you, like the one sponsoring this podcast, wink wink, nudge nudge, plug plug. But you did it yourself, which I think is cool: okay, I'm going to capture the data myself, I'm going to annotate it myself, because it was a school bus. Was there ever a moment where you were like, well, crazy lady videotaping a school bus on your street? Look, I'm not a creep, I'm a data scientist, I swear.

 

Kristen Kehrer  10:27  

Oh my God. Absolutely. I was sitting there drawing bounding boxes around my neighbor's Tesla, and I'm like, if my neighbor knew that I was just sitting here drawing frame after frame, drawing boxes around her car...

 

Rob Stevenson  10:43  

Oh, gosh. How much data do you need when you think, okay, I'm going to get my own data? With the school bus, was this one video, or several videos at different times of day or in different weather? How did you know you had enough to get started?

 

Kristen Kehrer  10:56  

Yeah, it was a couple of different videos of the bus. It was so funny: I actually had to put this project down for a bit during the summer, when the bus wasn't coming, and I had an alarm on my phone so I'd know the bus was coming in a couple of minutes and I could go get ready to videotape it passing my house. But it was super useful, because if I downloaded a dataset of school buses, which exists, my background outside is never going to change, and the school bus passing my house is always going to look the same, even though school buses in general can look a little different. If I took a dataset from the internet, I'd be putting in different orientations and colors and things I'm never going to see, that my computer vision model doesn't need to be trained on. So I started with a couple hundred images, and it wasn't enough. I kept getting false positives: the neighbor's Tesla would drive by and I'd get a false positive. I ended up going back and training the model on about 1,300 images of the bus passing my house, which really only took about three hours to annotate.

 

Rob Stevenson  12:17  

Gotcha. That contextualization feels like an important problem. Even if I go out and get data on a school bus, it's data of a school bus going through an intersection or down a country road or something. Do you think that's a lasting problem with computer vision, that the data a model is trained on is not going to match where it's expected to perform, for example in front of your house versus at an intersection?

 

Kristen Kehrer  12:45  

Yeah, yeah. And I think they've seen that over time in a lot of places, right? Like when they were trying to tell a husky from a wolf, and then found out it was only really detecting the husky because all the husky pictures had snow. So it's something you do have to be aware of. And it's funny: when I go to speak about this project, I'll ask, hey, is this information too junior? Because I made a lot of mistakes when I was creating my dataset. What I always hear back is that even if somebody's aware, it doesn't hurt to hear it again, and most people have said, hey, include that context.

 

Rob Stevenson  13:23  

Yeah. In that example of detecting snow rather than a detail about the husky, how would one surface that that's what the technology was honing in on? Do you have to have an attention mechanism built in to ask, how are you making this decision? Is that what we mean when we talk about explainable AI?

 

Kristen Kehrer  13:39  

So I'm not sure, but I think it's when you actually start using the model and find that it's not able to distinguish correctly between huskies and foxes. You go back and look at your dataset. There are different programs now that make it really easy to go through the confusion matrix: okay, where are my false positives? Let me look at just those photos. Comet is great for that. And then there's also Kangas, an open source library that was actually developed by Comet, which allows you to sort your images by the confidence of the prediction. Then you look at them and ask, okay, what's going wrong here? Why does this have low confidence? Why does it think this is the wrong object? You just look at the photo and think, maybe it's this or maybe it's that, and at some point you get to: hey, there's snow in all these photos, and the huskies we have over here are running through fields of grass. That comes from manually inspecting and really thinking about what's in your data and how robust your training set is.
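
The review loop described here (find the false positives, sort by confidence, then look at the images by eye) can be sketched in plain Python. This is not the Kangas or Comet API; it assumes simple image-level predictions rather than per-box detections, and the `Prediction` fields and file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    image: str         # path or ID of the image (hypothetical)
    label: str         # ground-truth class from the annotations
    predicted: str     # class the model predicted
    confidence: float  # model's confidence for `predicted`, in [0, 1]


def review_queue(preds, positive="bus"):
    """Return (false_positives, predictions sorted lowest-confidence first)."""
    false_positives = [p for p in preds
                       if p.predicted == positive and p.label != positive]
    by_confidence = sorted(preds, key=lambda p: p.confidence)
    return false_positives, by_confidence


preds = [
    Prediction("frame_001.jpg", "bus", "bus", 0.97),
    Prediction("frame_002.jpg", "car", "bus", 0.61),  # e.g. a passing car
    Prediction("frame_003.jpg", "bus", "bus", 0.42),
    Prediction("frame_004.jpg", "background", "background", 0.88),
]
false_pos, queue = review_queue(preds)
print([p.image for p in false_pos])  # → ['frame_002.jpg']
print([p.image for p in queue])
# → ['frame_003.jpg', 'frame_002.jpg', 'frame_004.jpg', 'frame_001.jpg']
```

The lists only tell you where to look; the actual diagnosis (snow, netting, a neighbor's car) still comes from opening those images.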

 

Rob Stevenson  14:58  

It feels like it'd be useful to have almost a checklist: when you're looking at the data that's going to train your algorithm, you run down a list and ask, what could be mistaken here? What are we missing? What context is not being considered? It's almost like empathy for data. Is there some kind of approach you would recommend when you're looking at data like this, to try to get ahead of some of these biases?

 

Kristen Kehrer  15:24  

Yeah. So I don't know, because some are always going to take you by surprise. The bus detector runs in my house 24/7, so I've gotten a number of false positives, and sometimes they're very weird. We took the soccer net out to play soccer in the front yard, and there's something about the diamonds on the netting that tripped up the model, right? A lot of times, it comes back to the fact that more data is always better, and to really thinking about the augmentations you're using and whether they're relevant to the problem you're solving. Then think about all the different ways, backgrounds, and scenarios you're going to see. Right now we have snow, and it gets dark. Actually, that hasn't tripped my model up much, but in general I'd be thinking about what different scenarios we're going to see here, and then not trying to introduce stuff we're never going to see, right? Because that just adds extra complexity we don't need.

 

Rob Stevenson  16:30  

Yeah. What are the augmentations you're using?

 

Kristen Kehrer  16:33  

So actually, that's how I found Roboflow, and I know I sound like a spokesperson for Roboflow. I had posted on Reddit asking, hey, what's the easiest way to do image augmentation? I was not looking to bang my head against the wall; I just wanted something as simple as possible. And somebody who works for Roboflow said, hey, you should try it out. It was super easy to try different augmentations, so we tried different blurring and saturation, just playing with different things. And then someone who came on my Cool Data Projects Show said, hey, you're never going to see those augmentations you're making in real life. The bus always passes my house on the same street, facing the same two directions. I'm only ever seeing the side of the bus. I trained it on a lot of images of partial pieces of the bus, and I see a little bit of the front of the bus when it's going by and a little bit of the back when it's coming back in the other direction. But for the most part, I'm looking at the side of the bus. It's never going to be distorted. It's never going to be upside down. So yeah, I played with augmentation, but now there are no augmented images in my model.
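
To illustrate why relevance matters when choosing augmentations: a geometric augmentation has to transform the labels as well as the pixels, and it only helps if the result resembles something the camera can actually see. The NumPy sketch below is an illustration, not Kristen's pipeline (she says she now uses no augmented images). It applies a horizontal flip to an HxWxC image and its YOLO-style normalized boxes. A mirrored bus might loosely resemble the bus heading the other way, whereas a vertical flip would produce an upside-down bus that a fixed street camera never sees.

```python
import numpy as np


def hflip(image, boxes):
    """Mirror an HxWxC image left-right and update YOLO-style boxes.

    boxes: array-like of shape (N, 4) holding normalized
    (x_center, y_center, width, height). Under a horizontal flip only
    x_center changes: x -> 1 - x.
    """
    flipped = image[:, ::-1, :].copy()
    new_boxes = np.asarray(boxes, dtype=float).copy()
    new_boxes[:, 0] = 1.0 - new_boxes[:, 0]
    return flipped, new_boxes


img = np.zeros((4, 6, 3), dtype=np.uint8)
img[:, 0, :] = 255                       # bright column at the far left
boxes = np.array([[0.25, 0.5, 0.5, 1.0]])
out_img, out_boxes = hflip(img, boxes)
print(out_img[0, -1, 0], out_boxes[0, 0])  # → 255 0.75
```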

 

Rob Stevenson  18:01  

Okay, interesting. So it was a good learning experience: this is part of object detection, I should learn it. But in this case, it's probably not as important. The wipers on the bus go swish, swish, swish, but the model doesn't need to know that. Was this computer vision project just to satisfy your own curiosity, or was there something specific you wanted to add to your data science tool belt? What was the goal of the project, I guess, is my question.

 

Kristen Kehrer  18:29  

Yeah. So I started this just a little before I started working at Comet, because I knew I was going to have to come up with a project so that I could demo Comet itself. And the way I like to come up with projects is to think about ways I could make my life better. So after the school bus project, I also worked on a project I call the Pill Popper 3000. We actually bought a 3D printer for that one, and my husband joined me in working on it; he ended up building this whole UI for it, which is not my area of expertise at all. But I really like to do projects that allow me to solve a problem that's actually going on in my life. And when I do one, all of a sudden, it becomes a lot easier to see other ways I can make other parts of my life easier, or other things I can do for fun. So now the majority of the projects I've been playing with are computer vision, but I'm a tabular data person. I am not a computer vision expert. I've just been learning over the last nine months, really through building. I haven't taken any courses or anything like that.

 

Rob Stevenson  19:51  

Yeah, I'm glad you pointed that out, because you make it sound so easy. Your background as a data scientist was not a one-to-one match for the work you're doing here with object detection. Is that fair to say?

 

Kristen Kehrer  20:05  

Yeah, that's very fair to say.

 

Rob Stevenson  20:07  

Let's map the ways it was useful. How did your background lend itself to beginning a project like this?

 

Kristen Kehrer  20:12  

Yeah, 100%. The steps are still the same: you get data, you clean that data, you use that data to train a model, and you use that model to make inferences. You look at metrics like precision and recall, whether the data is tabular or image data. So there are many overlapping pieces, and I just had to figure out, okay, when I work with image data, what does that look like? Oh, I need annotations; what does that look like? There's certainly a ton I have to learn, but being able to do machine learning in general gives me the building blocks to go and tackle a problem.
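
Precision and recall carry over to object detection once you decide what counts as a true positive. A common convention, assumed here, is to count a predicted box as correct when its intersection-over-union (IoU) with an unmatched ground-truth box is at least 0.5. This is a minimal sketch with greedy matching in input order and boxes given as (xmin, ymin, xmax, ymax); established evaluation tools (for example, COCO-style mAP) also rank predictions by confidence and average over several thresholds.

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(predicted, ground_truth, threshold=0.5):
    """Greedily match each prediction to its best unmatched ground-truth box."""
    unmatched = list(ground_truth)
    tp = 0
    for p in predicted:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= threshold:
            tp += 1
            unmatched.remove(best)
    fp = len(predicted) - tp  # predictions with no good match
    fn = len(unmatched)       # ground-truth boxes nobody found
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (50, 50, 60, 60)]  # one good hit, one false positive
print(precision_recall(pred, gt))  # → (0.5, 0.5)
```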

 

Rob Stevenson  20:59  

I see. And what was new for you? What were the areas where you thought, okay, this was not covered in any of my previous data science experience?

 

Kristen Kehrer  21:07  

Yeah, so I feel like the whole thing was more intimidating than it actually was. I ended up writing an article: I went through all the Keras computer vision tutorials, there are 68 of them, went into each one, and wrote a blog article. I made some graphs of the loss functions these different tutorials were using, because in my head, for some reason, I thought there was going to be some special magic there. Then I found out that if it's a regression problem in computer vision, they're still using mean squared error. It was all the regular friends I would have expected to see in a tabular data problem. So I think in my head, I made things scarier than they actually were. And today, there are so many blog articles and resources where you can get information in a really intuitive, easy-to-digest way. Now, if you're playing with something super cutting-edge, you might have to read the actual paper. But for most algorithms, and most things you're going to approach, there are really friendly resources available.
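
To make the "regular friends" point concrete: mean squared error is computed the same way whether the targets are tabular values or box coordinates. A minimal NumPy sketch; the numbers below are made up for illustration, and treating a detector's box output as normalized (x_center, y_center, width, height) is an assumption for the example.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: the average squared difference over all elements."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Tabular regression targets...
print(mse([3.0, 5.0], [2.0, 5.0]))  # → 0.5
# ...and normalized box targets use exactly the same formula.
print(mse([[0.5, 0.5, 0.25, 0.5]], [[0.5, 0.5, 0.25, 0.0]]))  # → 0.0625
```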

 

Rob Stevenson  22:30  

It depends, too, on how one learns, because it sounds like in your case, you're like, okay, I'm going to get my own data, I'm going to take the video of the bus, I'm going to get my Raspberry Pi, I'm going to stick YOLOv8 on it, et cetera. It sounds like you learn by doing. Is that the case for you historically, or is that just specific to this project?

 

Kristen Kehrer  22:48  

Yeah, I actually really do learn by doing. I feel like unless I have my hands in each piece, I sometimes have more trouble connecting the dots on what's going on. I don't want to just read a book; I have to actually touch it and see it work.

 

Rob Stevenson  23:06  

I would recommend people do that themselves, because there's so much content out there. You really could spend your whole day reading blog posts and books, watching videos, and taking a Udemy course or whatever, but at some point you need to bring it into the real world a little bit. For sure. Well, Kristen, here we are creeping up on optimal podcast length, and before I let you go, I'm going to put it on you to delicately thread this needle at the end of the episode. What advice would you give to people out there forging their own careers and trying to decide where to go next, and how can they make sure they remain curious and stay passionate about the work they do?

 

Kristen Kehrer  23:40  

Yeah, so I always say that probably the best thing you can do is to get involved in the community. And it doesn't matter whether that community is on Reddit, Slack, or LinkedIn. I live on LinkedIn; I'm there every single day, all the time. But it also doesn't matter if you're on Twitter, right? You meet people who are working on similar things, and you're able to chat about it. It also helps as accountability to keep you going, because I know that if I'm working in a silo, I can sometimes let projects slip. But if I've told a bunch of friends that I'm doing this thing, well, now I feel like I really have to do it. And you just learn passively from there. There are a number of reasons why we don't use stepwise regression anymore for picking features for a model, and I only learned that through social media. If I took on a consulting project, there's totally a chance I'd still be using that method had I not learned that by passively watching social media. It's also the new libraries that come out. You don't want to be jumping around trying every shiny new toy, but a lot of the libraries that are now the mainstays of what I use, I found because somebody else was talking about them online. So you want to find a way to just check in and get connected. And like I said, for me, that's LinkedIn, but there are great communities on Reddit and Twitter as well.

 

Rob Stevenson  25:21  

That is great advice. And for all the folks out there in podcast land, I would just encourage you to check Kristen out over on LinkedIn. She is posting lots of amazing things about her own projects, as well as having these conversations with AI, ML, and DL experts on The Cool Data Projects Show. Kristen, this has been an absolute delight. Thank you so much for being on the show with me today.

 

Kristen Kehrer  25:40  

Yeah, thank you so much for having me. I had a great time chatting.

 

Rob Stevenson  25:45  

How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, medtech, robotics, and agriculture. For more information, head to sama.com.