In today’s episode, we are joined by Dalia Shanshal, Senior Data Scientist at Bell, Canada's largest communications company that offers advanced broadband wireless, Internet, TV, media, and business communications services. With over five years of experience working on hands-on projects, Dalia has a diverse background in data science and AI. We start our conversation by talking about the recent GeekFest Conference, what it is about, and key takeaways from the event. We then delve into her professional career journey and how a fascinating article inspired her to become a data scientist. During our conversation, Dalia reflects on the evolving nature of data science, discussing the skills and qualities that are now more crucial than ever for excelling in the field. We also explore why creativity is essential for problem-solving, the value of starting simple, and how to stand out as a data scientist before she explains her unique root cause analysis framework.
Key Points From This Episode:
Tweetables:
“What I do is try to leverage AI and machine learning to speed up and fast-track investigative processes.” — Dalia Shanshal [0:06:52]
“Data scientists today are key in business decisions. We always need business decisions based on facts and data, so the ability to mine that data is super important.” — Dalia Shanshal [0:08:35]
“The most important skill set [of a data scientist] is to be able to [develop] creative approaches to problem-solving. That is why we are called scientists.” — Dalia Shanshal [0:11:24]
“I think it is very important for data scientists to keep up to date with the science. Whenever I am [faced] with a problem, I start by researching what is out there.” — Dalia Shanshal [0:22:18]
“One of the things that is really important to me is making sure that whatever [data scientists] are doing has an impact.” — Dalia Shanshal [0:33:50]
Links Mentioned in Today’s Episode:
Canadian Conference on Artificial Intelligence (CANAI)
‘Towards an Automated Framework of Root Cause Analysis in the Canadian Telecom Industry’
Dalia Shanshal 00:00
I sometimes see data scientists go to the most complex algorithm out there to solve a problem, just because it's cool and it's fun to use it. But sometimes just a logistic regression would do. You just have to have the right data.
Rob Stevenson 00:16
Welcome to How AI Happens, a podcast where experts explain their work at the cutting edge of artificial intelligence. You'll hear from AI researchers, data scientists, and machine learning engineers as they get technical about the most exciting developments in their field and the challenges they're facing along the way. I'm your host, Rob Stevenson, and we're about to learn how AI happens. Here with me today on How AI Happens is a senior data scientist over at Bell, Dalia Shanshal. Dalia, welcome to the show. How are you today?
Dalia Shanshal 00:51
I'm good. Thank you. Thank you for having me.
Rob Stevenson 00:54
You are powering through COVID at this moment. You had rescheduled last week because you were feeling like you got hit by a truck. It doesn't sound like you're feeling much better now, but you are an absolute champion for powering through. So shout-out and respect to you for getting on with it.
Dalia Shanshal 01:09
Thank you so much. I was very excited about this. And then last week when I got sick, it wasn't ideal. And I thought you know what, this week, I'm just gonna go for it. And so let's get it going.
Rob Stevenson 01:19
Yeah, I'm really glad you decided to do that. A little benefit of doing a remote podcast is that it's possible for us to do this in a safe and carefree manner.
Dalia Shanshal 01:28
Exactly.
Rob Stevenson 01:29
So something I didn't know when we first met, Dalia, is that you were fresh off the heels of having been master of ceremonies at GeekFest, which is like Bell's internal conference slash hackathon. So I'd just love to know more about it. How did it go?
Dalia Shanshal 01:43
Sure. So let me first tell you a little bit more about the GeekFest conference. This was the sixth annual GeekFest conference. It's an internal conference that Bell's Network and Technology Services hosts each year, and it was a virtual event. The focus, really, is to get all software developers and data scientists together to talk about the latest innovations within, you know, software development, cloud services, data, and machine learning. So last year, I presented what they call a lightning talk, a five-minute talk about a project that I was working on. But this year, I had the opportunity to emcee at the Bell CTV studios, actually, in Scarborough, Toronto, and it was a really fun and exciting new experience for me.
Rob Stevenson 02:31
So what kind of panels were you moderating?
Dalia Shanshal 02:33
I was actually doing the hosting: the opening, the closing, introducing the talks. There was one panel that I moderated, which was on the DevX program. I don't have my notes right now, and it's been quite a long time, but the DevX program is about how to integrate and create this golden path for software developers.
Rob Stevenson 02:55
Got it. Okay. What kind of stood out to you? Were there sort of notable discussions? Or, like, what kind of conversation about AI was taking place here?
Dalia Shanshal 03:03
You know, I'd say this year the main focus was on the cloud: moving the data to the cloud and leveraging cloud services. A lot of talks were on how we leverage the cloud to have greater operational efficiencies and, you know, enable better application performance by modernizing our IT infrastructure. So that was one of the main emerging topics that I noticed this year. Other topics, of course, were on innovative AI/ML applications that focus on the reliability of our network. So how do we leverage machine learning and AI to make sure that we are proactively resolving all network disturbances out there? There are many initiatives on that end. Another topic that was also interesting is the impact of the agile framework. The movement to this agile framework has been in place for quite a long time now, but there was an interesting talk about how important it is for team success, and how important it is to create a culture of trust, of innovation, of engagement and collaboration. And that all feeds into the success of software and AI/ML projects. So also the importance of people, not only the technology.
Rob Stevenson 04:26
It sounds like an industry conference, like a trade show, but Bell is sufficiently large that you can kind of execute that kind of thing just internally, which I'm tickled by.
Dalia Shanshal 04:35
Oh, yes, 100%. There were, if I'm not mistaken, more than 400 attendees per day, and it is a big event. It's really exciting because we were actually shooting it live. It was my first time on live camera, so that was definitely outside my job description. But yeah, we have the infrastructure to host such an event, and there's a lot of interest in it as well.
Rob Stevenson 05:01
So I'm sure our listeners will be familiar with Bell, with the trademark, with the logo. This company has been around for a long time, since the late 1800s, I believe, founded by Alexander Graham Bell, the inventor of the telephone, I'm sure people will recall. And you are on the Canadian subsidiary of Bell, working more on the research and development side. So I would love to know a little bit about your role and what it is you're working on over there.
Dalia Shanshal 05:23
Bell constantly improves the hardware and the software everywhere in the network. Why do we do that? Of course, it's to provide the best technology and best services out there to our customers. And so the team I work with in the US is really at the heart, I'd say, of the new products and services. When we launch a new product or a new service, we want to make sure that customers are having a similar experience or a better experience than the previous one. If at any point we don't see this, then the team I work with is responsible for raising the flag and saying, hey, let's pause, we might have some bugs here, let's review. So we would review with different teams: with products, with our engineers, with our vendors, with field technicians. And so what I do is really try to leverage AI/ML to speed up and fast-track those investigative processes.
Rob Stevenson 06:19
Okay, got it. That's helpful. I definitely want to dig into that. But first, I'm just curious to learn more about your background and how you came to be at Bell. You've been around the block a couple of times in the field of data science. Data science has historically, within the last 10 years, been a very sexy job, right? And I'm just curious how you've seen the space evolve in that time.
Dalia Shanshal 06:44
Let me tell you first about my journey, and then maybe we'll talk about data scientist being the sexiest job, and whether it's still the sexiest job. I actually did not study data science in undergrad. I did economics at McGill, and I focused on stats because I've always loved math. I had no idea what data science was. It was back in 2014 or 2015, and I was, you know, just browsing online for potential actuarial courses and certificates, and I bumped into an article saying data scientist is the sexiest job of 2012. What is data science? So I decided to pursue a master's in data science. I really liked it. I worked in academia, in teaching and research, at TMU, Toronto Metropolitan University. And so that's really how I got into data science: super randomly, through an article online that just piqued my curiosity. And I realized that I love it. If you go back to how it evolved, based on my experience: well, first of all, why was it called the sexiest job? Because at the time it was, and it still is, rare, and it brings huge value to the company. Data scientists today are key in business decisions. We always need business decisions based on facts and data, so to have the ability to mine that data is super important. The biggest change I saw in the data science field is that when I first started to look into data science, there were very, very few formal university programs and certifications. Today, there are plenty, online and in universities. So that's good, because it kind of helps determine what the skill sets required for successful data scientists are, and to have a consensus on that. Now, the challenge there is that it evolves super, super fast. I remember when I first started my certificate, Hadoop was the big hype. You know, it was Hadoop, parallel computing, and all of that.
And I remember my professor was just telling me about this BigQuery paper that Google had just published, but there wasn't a lot of talk about it. Today, we're moving to the cloud. We're moving away from Hadoop and from local storage and more to the cloud. And that happened in just, like, five years. So it's exponentially evolving, and we all know that. That's a bit of a challenge for programs and certifications, and because I do have some teaching and research experience, I know how quickly things evolve. Another change that I noticed is a better separation of data roles. A data scientist in 2015 was probably expected to be a unicorn. I don't know if you've heard that, the data scientist as the unicorn. There were a lot of expectations about what the data scientist had to bring. Data scientists were expected to know everything from data engineering to data analytics, machine learning engineering, all of that, and even software development. There was a lot of mixing in what companies wanted from a data scientist, and it's nice to see that today there's a better segregation of the roles. So now it's clear that a data engineer won't do the job of a data scientist. Data scientists bring something else to the table, more focused on how to analyze the data rather than, you know, getting the data and optimizing the queries. It's more about which algorithms are most beneficial for this type of data, for this type of problem, and how to solve the problem. A good data scientist definitely needs to have the skill sets, academic and soft skills, but the most important skill set is to be able to have creative approaches to problem-solving. That's why we're called scientists.
So for example, I don't know if you've heard about this one, it was actually in the Harvard Business Review article, data scientist is the sexiest job of 2012: it gives an example of how a data scientist was able to come up with a fraud detection system by using a DNA sequencing algorithm. So that is, in my opinion, the beauty of a data scientist's work: that creativity, creativity in solving problems with data. That's what I think changed. So if I want to answer your question, I'd say, you know, academically, things are changing, the technology is evolving very quickly, and the role of a data scientist is becoming more clear in terms of what they should bring to the table and their skills. Also, on another note, there's this shift in the practical uses of data science projects. In fact, a lot of data science projects fail in the industry. Part of the reason is that we are so focused on the algorithm, on which algorithm to use and how to optimize it, that we often forget about the data we're feeding into it. And so what's interesting is, I sometimes see data scientists go to the most complex algorithm out there to solve a problem, just because it's cool and it's fun to use it. But sometimes just a logistic regression would do. You just have to have the right data. And if you have the right data, you use a simple algorithm, less computational power, and you're solving your problem faster. So I think that approach, data-centric AI, is very important today. I'm a big proponent of it. The term data-centric AI was named by Andrew Ng, and I know that he is working on creating a systematic way of approaching data, because we know how to approach algorithms, right? We know, okay, we feed the data, we train the algorithm, we test the accuracy and the performance, go or no go, we do another round, we update parameters, all of that. We have a system for that.
But we don't have a system for: what data do I feed in? What are the requirements of the data that I need to feed into that algorithm? If there were a system to it, that would make data scientists' work much faster and more efficient, because whatever you feed into the algorithm is also very, very important, as important as the algorithm's capacity to learn from the data. I hope that makes sense. I kind of rambled a little bit about everything, but I hope that makes sense.
Rob Stevenson 13:32
It totally does. Yeah. And when you say it would be easier if there were a system around data, are you referring to an organization's approach to the way they source and clean data? Like a strong data foundation, basically?
Dalia Shanshal 13:44
Yeah, exactly. More focused on, academically, theoretically, practically, how do I approach it? Which data do I need to use? Why do I need to use this data? How do I clean it? You know, I know that there are methods here, but there is no one system. So with feature engineering, you know, there's a lot of domain expertise that needs to come into the process of a data scientist's work. The data scientist knows the algorithm, but we need some domain expertise, and we need a system that helps us determine: okay, I want to use neural networks for this kind of problem, that's the data I have, what's the process? How do I evaluate how good the data is for that algorithm? So it's a little bit like that.
Rob Stevenson 14:34
Yeah. Yeah, that makes sense. Now, you gave an example there of, like, why would you bother with a complex algorithm when a logistic regression might be a better approach? And that is not going to make you popular with venture capitalists, I know, right? But it might actually solve your problem. It's similar to what you said a minute ago about how Hadoop was all the rage in 2012. The point is that there are these very shiny tools that, you're right, are popular to use. But someone like you, and I think lots of people listening to the show, are the folks who are just like, listen, forget the hype, I know how to roll my sleeves up and get this job done. And so, with that perspective in mind, it's less important to know Hadoop or any given single language or approach, right? Would you agree it's more about the problem-solving part?
Dalia Shanshal 15:25
100%. For me, that's what makes all the difference in the practical use of data science: how do you approach the problem? How do you solve it? Because everybody can learn how to code, everyone can learn all the algorithms, and that's great. So I can be a doer, where somebody would tell me, hey, do this, this, and this, and I'll code it. But the data scientist needs to have a creative mind to apply things. A lot of the time, the really impressive applications I see are algorithms that are typically used in one domain being used in a completely different domain. And they're working super fine, and they're bringing a lot of value to companies and organizations. And that's what's important. And yes, I'm always a proponent of starting simple. Well, it depends. But when it's business problems, I always like to go simple first and test it, because efficiency is important. If I was in research, sure, I can, you know, take my time to develop new things, complex stuff. Sure, it's fun. But when I'm in the industry, I need results quick. That's why we're hired. Start simple, get some proofs of concept working, and if it doesn't work, then quickly shift to another approach.
Rob Stevenson 16:46
It's so interesting. I'm hearing this more and more, that the technical ability, the technical know-how, is more table stakes in your field, right? It's like, okay, we expect you're familiar with Hadoop, we expect you can, you know, take one of these models or algorithms off the shelf and figure out how to make it sing for you. But it's more about the other stuff. It's more about someone's creativity, problem-solving skills, ability to work in a team. It's like, okay, the long laundry list of skills on your resume doesn't impress me as much as your ability to get a job done. And it's interesting to me that when you say anyone can learn to code, it's like, okay, this is a commodity, this is table stakes, what else you got? So I'm curious: you must have been surrounded by fantastically well-educated academics and people who have these technical abilities your whole career. What is it that actually impresses you? What can someone say where you're like, okay, beyond all that technical know-how, I actually think you might be able to get something done, outside of knowing a given, you know, language?
Dalia Shanshal 17:47
That's a really good question. I mean, again, I have to go back to problem-solving and creativity. What impresses me the most is making those links. I actually have a brilliant intern right now, and I would love to share what he's doing, because that's someone who impressed me with the way he tried to find the solution to the problem. So maybe, how can I say this? What impresses me the most is innovative ideas. There's a difference between that and saying, okay, I have this problem, I know that there's this algorithm from this library, I'm gonna take it, apply it, accuracy, done, performance, done, deploy it, okay. But two things are really important. Number one: is the business really using it? Or is it just running there on a dashboard and nobody's using it? No value. And number two, which is maybe also really important: are you able to communicate how your process is going to help the business, so that end users are actually going to use it, and not just have it there? So communication, creativity, and problem-solving, these three: when someone has them, that really, I think, impresses me. And I've been surrounded by really, really brilliant data scientists and, as you mentioned, academics and researchers. I've learned a lot from them. I've also learned a lot from the domain experts. I think it is super, super important to keep the conversation going with the domain experts and the end users who are going to make decisions based off the data that you're giving them, based off, you know, the output of the algorithm that you're giving them. It's not an easy task to communicate very technical work to, you know, the business in a simple and concise way. And so that's really important.
It's important for them to understand how things work and how that benefits them, and to work together, whether it's on, you know, the data part, understanding the data, what features we're using, and why we changed the features that we use. I think they have to be very closely on board with that, as well as with the output. They don't necessarily have to understand the algorithm and how it works, but they do need to understand the output of it and how they can leverage it and use it.
Rob Stevenson 20:13
I do want to go into that, because I love how you just reiterated the scientist half of the data scientist part. I'm guilty of just glazing over that, forgetting that that's half of it. In the same way, what's a software engineer? Like, what does it mean to engineer, right? It really doesn't mean to sit in meetings and talk about a slideshow. But with the data scientist part, this is a process of testing, of experimenting, and crucially of research. And you are able to keep that researching hat on both inside and outside Bell, which is part of why I was excited to speak with you. You recently had some of your work featured by AI Montreal, and I was hoping we could talk about that too.
Dalia Shanshal 20:48
Yes. So it was a paper that was accepted and published at the Canadian artificial intelligence conference. Each year it happens in a different area of Canada, but this year it was in Montreal, at McGill, my alma mater. Yeah, exactly. It was really, really a humbling moment to come back and see yourself going back as an industry professional, as opposed to a student. So maybe I want to give you a little bit of background. My role, exactly, is not specifically research. But I just love research, and I think it's very important for data scientists to keep up to date with the science. Whenever I'm faced with a problem, I always start by researching what's out there. And so that's how I came up with the framework that I'm going to explain to you right now, which is now published, and people can go read the paper. And, you know, if you have any comments or feedback, I'd be more than happy to hear about that.
Rob Stevenson 21:52
Careful. Careful opening those floodgates. Oh, yeah, right. I hope so. I will definitely link to the research. And yeah, you should. I'm being cheeky. But yeah, I love that you're inviting it, you know, opening the floodgates, like, yeah, tell me what you think. But yeah, for sure. I mean, that's what science is.
Dalia Shanshal 22:08
Exactly. And the more feedback and comments I can get, the better we can all work together. I think that's great. I love brainstorming in teams and, you know, exchanging ideas. It's just part of my passion. Maybe let me talk to you a little bit about the work itself, the paper itself. So the paper is called towards an automated root cause analysis framework in the Canadian telecom industry. The framework that I came up with, again, comes back to a very simple idea, but just nobody had done it out there. So remember what I was telling you; let me give an example. Let's say, okay, tomorrow I am launching a new software version on a modem. How do I see how well it's performing? There are many ways, but one of the ways is the call rates. And so what we do is like an A/B test, where we look at the call rate of the new product that we're launching and of the current product out there. And if at any point we see a difference, then, like I told you, what my team does is, you know, we raise the flag and we do the investigation. So it's an iterative process, and it's not humanly possible to do all of the permutations. So what I was thinking is, why can't we automate that? And what I was thinking about is an algorithm mainly used in marketing, the association rules algorithm. So, the Apriori algorithm, the FP-Growth algorithm; there are different variations of it. But the idea is, I have a grocery shop, and then I have a list of transactions of what customers bought. So I have, let's say, thousands of transactions, and for each transaction, I have the items. So I have orange juice, milk, coffee, okay? And then in the second transaction, I have orange juice, milk, a pencil, etc., etc. And when I feed that list of transaction items to my algorithm, what it's going to give me is, okay, if your customers bought milk and orange juice, then the probability of them buying coffee is 5%.
Okay, so basically, it's taking all the subsets and giving me the probability of a third item. Now, if I translate this into my use case, some of my customers will have calls, and some won't have calls. So now I put this information into the algorithm, and I have, in one shot, all my call rates. Now the data is giving you leads. So that's kind of how I translated this into the use case we were trying to solve.
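Editor's sketch: the grocery example above can be illustrated in a few lines of Python. This is a minimal brute-force enumeration of association rules, assuming made-up baskets; it is not Apriori or FP-Growth (which prune the search space), and it is not the implementation from the paper. The function name `association_rules` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.2, max_antecedent=2):
    """Enumerate rules 'antecedent -> consequent' with support and confidence.

    Brute force: counts every itemset of up to max_antecedent + 1 items.
    confidence = count(antecedent + consequent) / count(antecedent)
    """
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        for k in range(1, max_antecedent + 2):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1

    rules = []
    for itemset, count in counts.items():
        support = count / n
        if len(itemset) < 2 or support < min_support:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            confidence = count / counts[antecedent]
            rules.append((tuple(sorted(antecedent)), consequent, support, confidence))
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Illustrative baskets only (not real data).
transactions = [
    ["orange juice", "milk", "coffee"],
    ["orange juice", "milk", "pencil"],
    ["orange juice", "milk", "coffee"],
    ["milk", "coffee"],
    ["orange juice", "pencil"],
]

rules = association_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    if antecedent == ("milk", "orange juice") and consequent == "coffee":
        # 2 of the 3 milk-and-orange-juice baskets also contain coffee.
        print(antecedent, "->", consequent, round(confidence, 2))
```

To mirror the call-rate use case described above, one could treat each customer as a basket of attribute items (for example a hypothetical `"firmware=new"`) plus an item such as `"called"`; the confidence of any rule ending in `"called"` is then the call rate for that combination of attributes.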
Rob Stevenson 25:05
So that the data is like you would expect a call rate to be X percent. And anytime it deviates from that now you have something to investigate, right? Is that the lead? That's
Dalia Shanshal 25:13
the overall call rate? Yes. If it deviates, then I need to investigate. But I need to investigate what's the contributor? What characteristics what component or what characteristics do I need to look into? So I can then say, hey, you know, engineers take a look at these cases.
Rob Stevenson 25:31
Okay. Okay. Yeah, I mean, this has implications for like site reliability for like, anytime there's uptime, basically, right. Okay, what is the interruption here? You can detect it. And then your approach, this allows you to be even more granular with your detection.
Dalia Shanshal 25:45
Exactly. Exactly. More granular with the detection. Hey, the KPI is telling us something, there's an alarm. Okay, can we get more, like, surgical? Well, yes, we can. We can either do it manually, and it will take us weeks, or we can do it through an automated algorithm, and it's going to give us the output directly and help guide us. Now, it's not going to give us the golden nugget, but it's definitely much quicker for investigations.
Rob Stevenson 26:11
Got it. Yeah, it's brilliant. I can see why it's been useful at Bell, but again, I feel like it has implications for any sort of reliability
Dalia Shanshal 26:18
product, and really any new product, as long as you have a KPI set. Any new product that you want to launch. And you know, it's complementary to A/B testing, because A/B testing will tell you, yes, there is a significant difference, or no, there is no significant difference. Now, if there is a significant difference, and it's not a good difference, like the KPI is not doing well, then, okay, how do we go about analyzing it? So this could be a complementary step for any product launch.
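Editor's sketch: the "significant difference or not" step of an A/B test on call rates can be illustrated with a standard pooled two-proportion z-test. This is a generic statistical test shown under stated assumptions, not necessarily the test Bell uses, and the counts below are invented for illustration.

```python
import math


def two_proportion_z_test(calls_a, n_a, calls_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    A is the current product, B is the new product; the rate is
    calls / customers in each group.
    """
    p_a = calls_a / n_a
    p_b = calls_b / n_b
    pooled = (calls_a + calls_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 5% call rate on the current product, 6% on the new one.
z, p_value = two_proportion_z_test(calls_a=500, n_a=10_000, calls_b=600, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value only says that the overall call rates differ; it does not say which customers or components drive the difference, which is the gap the root cause analysis step is meant to fill.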
Rob Stevenson 26:47
Right, right. Just much more specific. Like, you know, in marketing, an A/B test is like, oh, the email with the orange banner clicked better than the email with the purple banner. Yes. Right. But then, with this approach, you would be like, oh, well, why orange? Exactly, exactly. Why not orange? Yeah.
Dalia Shanshal 27:01
Yeah. What are the characteristics of people who clicked on orange and people who clicked on purple? And where's my highest difference? Exactly. Yes. Yeah.
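Editor's sketch: "where's my highest difference" can be read as ranking segments (combinations of characteristics) by the gap in rate between the two variants. The code below is a minimal illustration of that idea with synthetic records; the field names `variant`, `called`, `region`, and `modem_model` are hypothetical, and the published framework may rank segments differently.

```python
from collections import defaultdict
from itertools import combinations


def rank_segments(records, attributes, max_size=2, min_count=1):
    """Rank attribute combinations by (new rate - current rate), largest first."""
    # segment -> variant -> [calls, customers]
    stats = defaultdict(lambda: {"new": [0, 0], "current": [0, 0]})
    for rec in records:
        pairs = [(name, rec[name]) for name in attributes]
        for k in range(1, max_size + 1):
            for segment in combinations(pairs, k):
                cell = stats[segment][rec["variant"]]
                cell[0] += rec["called"]
                cell[1] += 1

    ranked = []
    for segment, s in stats.items():
        (calls_new, n_new), (calls_cur, n_cur) = s["new"], s["current"]
        # Skip segments that are empty or too small in either variant.
        if n_new < max(min_count, 1) or n_cur < max(min_count, 1):
            continue
        diff = calls_new / n_new - calls_cur / n_cur
        ranked.append((segment, diff, n_new, n_cur))
    return sorted(ranked, key=lambda r: -r[1])


def make_records(variant, region, modem_model, customers, calls):
    """Synthetic helper: the first `calls` customers in the group called support."""
    return [
        {"variant": variant, "region": region, "modem_model": modem_model,
         "called": 1 if i < calls else 0}
        for i in range(customers)
    ]


# Synthetic data: every group has a 5% call rate except the new version
# on modem model B in the west, which has 25%.
records = []
for variant in ("current", "new"):
    for region in ("east", "west"):
        for model in ("A", "B"):
            hot = variant == "new" and region == "west" and model == "B"
            records += make_records(variant, region, model, 100, 25 if hot else 5)

ranked = rank_segments(records, ["region", "modem_model"])
for segment, diff, n_new, n_cur in ranked[:3]:
    print(segment, f"{diff:+.2f}")
```

On this synthetic data the top-ranked segment is the west region with modem model B, which is the kind of lead an engineer could then be asked to look into. With real data, small segments produce noisy rates, so a larger `min_count` or a significance check per segment would be a sensible addition.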
Rob Stevenson 27:09
Yeah. This is fascinating. So when you spoke to AI Montreal, did they have follow-up questions, or were they just like, we're gonna publish this and discuss it? What was sort of the fallout of having it published?
Dalia Shanshal 27:18
Well, the goal of having it published is really to have engagement from the community and, you know, showcase this new, innovative way of approaching things. You contribute to the field that way. And it was also a really good opportunity to network with people, get their insights, and see what everybody's doing in different domains.
Rob Stevenson 27:38
Such a scientist answer. What was the point? The point was for knowledge. The point is to drag ourselves one millimeter closer to automated nirvana, right?
Dalia Shanshal 27:46
Yes, exactly. Exactly. That's beautiful. Exactly. Yes. Yes. It's to contribute and see what's out there. And just intellectual stimulation. It's fun.
Rob Stevenson 27:58
Yeah, yeah, absolutely. Well, Dalia, we are right here at optimal podcast length, having been intellectually stimulated. And I would just like, at this point, to say thank you so much for being here. I loved learning about your work, your general approach to data science, and, you know, reflecting on the maturity of the organization. In the spirit of knowledge and peer review and all those scientific pursuits, Dalia, before I let you go, is there anything you'd like to share with the community out there?
Dalia Shanshal 28:23
Yes, I do. I would like to share my passion about tech for good, using technology for good. One of the things that is really important to me is making sure that whatever we're doing has an impact. I'd like to highlight a special project that I'm working on. It's called the armed dumb project. It's a public art installation, and the idea is that art and technology merge together to inspire unity. I'm going to give you a brief summary of what the project is. We are hoping to build this dome, and there are going to be sensors, and it's going to run on a Raspberry Pi. When you enter the dome, a sensor will get your heartbeat, and the dome will start to light up. And then the more people come in, the more heartbeats there are, the more light there is, and it will light up into the tree of life. So that's an artistic venture that I'm taking on on the side with, you know, an ex-colleague of mine at McGill University, who's a philosopher. I just wanted to maybe share a link to that if anybody's interested. We're always looking for volunteers, whether they're from the arts, industry, or technology. I think blending the two together is very inspiring. It's a beautiful cause. And so these were maybe the last words that I would like to share: making sure that whenever we're working in tech, we always have a mindset about what the good of the work we're doing is, whether it's at work or outside work. And in that case I wanted to highlight on Dawn, which is an artistic endeavor that aligns with technology.
Rob Stevenson 30:06
It sounds beautiful, Dalia. We will make sure to include some links in the show notes so people can check it out and volunteer, help out, and contribute in whatever way they may like. Thank you so much, Dalia, for being here. I have loved chatting with you today. This has been a great episode.
Dalia Shanshal 30:19
Thank you so much. That's been great. Thank you very much, Rob.
Rob Stevenson 30:23
How AI Happens is brought to you by Sama. Sama provides accurate data for ambitious AI, specializing in image, video, and sensor data annotation and validation for machine learning algorithms in industries such as transportation, retail, e-commerce, media, med tech, robotics, and agriculture. For more information, head to sama.com.