Data Science w/ Elena Grewal (Part 1)
Published 12/18/2017
In today's episode, I talk with Elena Grewal, head of data science at Airbnb. We cover a wide variety of topics, so make sure you catch the second episode of this interview as well!
Today's episode is brought to you by Linode.
Linode provides superfast SSD-based Linux servers in the cloud starting at $5 a month. Linode is offering Developer Tea listeners $20 worth of credit if you use the code DEVELOPERTEA2017 at checkout. Head over to spec.fm/linode to learn more about what Linode has to offer to Developer Tea listeners!
Transcript (Generated by OpenAI Whisper)
As a developer, it's easy to believe that we shouldn't listen to our gut, but today's guest knows quite a bit about data science, and she says you should listen to your gut. We're going to talk about that and a lot more with Elena Grewal on today's episode of Developer Tea. My name is Jonathan Cutrell. This show exists to help driven developers uncover their career purpose. Thank you so much for listening. I hope you enjoy the interview with Elena Grewal. Elena, welcome to the show. Hi, Jonathan. Thanks so much for having me. I'm excited to be here. I'm excited you're here as well, and for full transparency for our listeners, this is the third time around that we've pressed the record button. We've been having a few technical difficulties, but we're here now. We're rolling, and I'm so thankful that Elena has taken the time to sit and talk with me about more than just what she does at Airbnb: beyond that, into the human implications of this work, the career that Elena is leading, and how all of us as developers can learn from her and share in her experiences. I'm looking forward to this conversation immensely. Thank you again for your time, Elena. Of course. You are the head of data science at Airbnb. I'd love to know: can you explain what your personal mission is at Airbnb? What are you looking to accomplish in your career there? Wonderful. The reason that I came to Airbnb was that I was extremely excited to look at data and to use data to make a positive impact on the company, and on the world as a result of that. I love Airbnb's mission of creating belonging. So I'm really passionate about thinking through how we use data to understand what's happening, to predict what will happen next, to think about the impact of what we do, and to infuse data into our products and our processes to make us more efficient and more effective.
So it's really about figuring out what's truly happening and how we can maximize the impact of the information we're getting to help the company meet its mission. That's a great answer. I love to ask guests about their purpose, because being able to connect to the mission of Airbnb and derive your own part of that mission is so critical to your success. Your drive, your motivation, and your day-to-day work really end up contributing to that. So I think that's such an important part of what we do every day as developers, as data scientists, as product managers, whatever title you take on in a digital creation space. Being able to tie what you're doing back to some underlying drive is critical not only to success, but I think to fulfillment as well. Would you agree with that? Yeah, definitely. I think that's what makes life fun and interesting, and I'm very thankful to be at a place where I can do that. Yeah, for sure. So you said the word data, and I know this is such an interesting discussion. Maybe you have some insight to provide for me. There are all of these characterizations of data. There's a lot of misinformation, both culturally and even within the industry, about what exactly data is, right? It sounds simple: well, it's information that we gather. There's a database, there are types of information, there are columns, and then things get fuzzy. Knowing exactly what it means to have data is a difficult thing to explain. Can you explain to me your perspective on the meaning of data? That's an interesting question. Certainly you can think of data in a very broad way, and that's probably why it becomes a little bit confusing to narrow down. Sometimes people say, oh, I don't want to make gut decisions, right? I want to make a data-informed decision.
And my response is, well, maybe there's some data that your gut is picking up, right? Data that we haven't captured. It's not quantitative, it's not in our tables, but there's something you've got there that you're channeling when you make that statement. I think for data science in general, the data tends to be something that you can quantify in some way, right? It could be surveys, it could be clicks, it could be information that people express through their behaviors and through what they say they feel. So I do think it can actually be pretty broad. It's really more about what you are trying to do, and that helps to inform what you might call the data at hand. So if you're trying to make a good decision about whether to launch a particular feature, you might look at quantitative data on how people have used similar features in the past. You might look at data on whether people are saying that they want that feature. Those are all data points that you could bring to the table. So maybe that hasn't answered your question as neatly as you would like. No, that's a great answer. Everybody's going to have a different answer for this, because data is so broad. If you really sit down and think about it, it becomes a metaphysical discussion, right? It's not really one thing or another. It's how you use it; what you're doing with it determines what it is. I like a characterization that I read a while back, and I think I actually saw it on your Twitter timeline when I was preparing for the episode. Maybe that's what I should have said. No, I think it was someone else's characterization that you retweeted. But it was the idea of the voice of the users at scale, right? The idea that individuals are not abstracted away from data; they're contributing to data.
The data is not this separate entity that big companies are generating out of the ether. It's actually coming from the activities and the intents and all of the ways that we observe the users. Whether it gives a complete picture or an incomplete one, all of that data, or a good portion of it, is coming from them. So it's a very interesting perspective, I think. Right, I'm so glad that you read my Twitter feed. You totally nailed it. And it's funny, because that is actually what we say here at Airbnb as well. I think that is actually a tweet from someone at Airbnb about how we think about data. Yeah, I think I remember it being from someone at Airbnb. And that's such a true and often forgotten statement, because the cultural reception of what big data means feels very cold. It feels calculated; it feels like the bots are writing our movies and nothing is original anymore, and there are all of these dystopian perspectives on what it means to have a lot of data available. But in reality, what we're really doing is uncovering more of ourselves, and perhaps that's uncomfortable. Mm-hmm. Yeah, and sometimes it's interesting to think about what might be revealed when we do that. So it can certainly be uncomfortable. Yeah, for sure. I do want to take a step back, though. We've gotten into this discussion on data, but being interested in this at all might be considered an anomaly. Can you explain the moment where you went from uninterested to deciding that this is the thing you want to do in your career? Sure. Yeah, it's a great question. I don't know that I can point to a particular moment in time, but I can certainly point to around when it happened. And I would say it was actually after I started at Airbnb.
I had come to Airbnb from a PhD program at Stanford. I knew people in Silicon Valley and tech, and I really started to look at data science jobs just because I had some friends who knew about it and had told me it could be a good fit. I met the head of data science at Airbnb at the time. At first I thought, oh, I'll just do an internship, but I loved everyone I met and thought they were so smart and fun, so I ended up taking a full-time job. But the whole time I was thinking, I don't know if this is going to be right for me. I really like academia. I like my research. I don't know if I'm going to be able to cut it. I really wasn't sure if working in data science was something that was for me. So I really took a chance in some ways; no one from my program had ever made such a career transition. And sorry about that. That's OK. Anyway, I started at Airbnb thinking, oh, I guess I'll see how this goes. I told my advisor that I would still apply for academic jobs if it didn't work out, and he was happy about that. So I started, and I think probably about six months in, I realized that I just really liked this and that I wasn't going to be going back to academia. It was realizing how fun it was to work with people, to come up with problems, to use data to actually solve them, and to tie the work that I was doing to impact in the real world. Sometimes in academia you think, OK, well, maybe someone will use my research at some point, but that's not the direct incentive. And I really like that that is the incentive in data science when you're working at a company. Yeah, that's a great perspective. And to clarify, you have your PhD, is that correct? That is correct.
So you're a person who comes from a background of relatively deep academia and ended up in a role that is very much a practical application of some of the things you were learning there. You studied machine learning when you were at Stanford, correct? I did. Yeah. So, you know, I would say that throughout my life, I've been more interested in the problem to solve than in any particular technique. My PhD was actually in education, and I was interested in understanding social networks in schools. I saw that there were schools where, you know, 50% of the students might be white and 50% might be minority, and it looked integrated. But if you looked at the friendships, they were often highly segregated. So I was curious about what was leading to that friendship segregation and, obviously, the impact of that segregation. In particular, seeing trends around income inequality, I was thinking about what segregation is like in terms of social class, right? Are people from different socioeconomic backgrounds friends with one another in a school setting? How does that play out? How does that influence their later life outcomes? That was the question I was interested in. To answer that question, I needed to do social network modeling. And I said, okay, well, to do those models, there's no package that exists, and it's not easy to do right now. So I need to learn how to program so that I can build a model that captures the network effects I want to capture. So I learned coding for that. And then I was like, okay, well, to do this modeling, I need to know statistics. So I'm going to learn that so that I can do the modeling and answer the question I want to answer. And so I ended up with this amazing skill set for data science just by accident, because I had picked a question that I wanted to answer, and to answer it, I needed these skills.
That was really the preparation that ended up being so helpful for working in industry. The other piece that was really helpful was that my advisor was doing research on segregation in schools, and I was a research assistant for him. He was extremely detail-oriented and meticulous in his analysis, so I got great training as a research assistant in how to do really careful data work. That was extremely practical and, again, very helpful now that I'm working with data at Airbnb. Yeah, that's such good preparation. And you probably got a lot of really good questions in that research assistant position. I imagine there were questions that had very specific implications for the code you were writing, where you'd have to go back and say, oh, well, he wants to filter by, you know, category X, so I have to figure out how to do that. Or there needs to be a date range applied here, or we need to normalize some of this data, remove anomalies, or whatever else. It was great. He would have very particular things that he would want to look for. And then he would review my code, too. He'd say, you know, you didn't do this correctly; you should use this instead, or cut it this way, or add this filter. And that was really helpful. We'll get right back to the interview with Elena in just a moment. But first I want to thank today's sponsor, Linode. Linode has helped Developer Tea continue putting out three episodes a week. We've always offered this podcast for free to everyone who listens to it, so I appreciate Linode's help in sharing these messages and allowing me to interview people like Elena. Ultimately, if this podcast ends up helping even one developer, then Linode is part of the reason that happened.
If you've found value in this podcast, then one of the people you should send a thank-you card to would be Linode. They are allowing this thing to keep going. So thank you to Linode for sponsoring today's episode. And Linode is offering you an excellent deal. You've probably heard about Linode's service offerings at this point. You can get started with just $5 a month on Linode, which gets you a gigabyte of RAM. It's one of the cheapest things you can imagine buying for a business scenario. And don't be fooled: a gigabyte of RAM is actually enough to run a website. It's not like you have to upgrade to be able to have anything useful. Many of the beginning tiers on other services are kind of handicapped; they're not enough to actually run anything in production. A gigabyte of RAM is enough to run, for example, a content site, especially if you're just serving static content, which, for many developers, is all you really need. If you're going to create a static site to show off your portfolio, this is an excellent way to do it. $5 a month will get you there. On top of that, Linode is providing you $20 worth of credit just for using the code DEVELOPERTEA2017. And of course, we're nearing the end of the year, so if you're listening in 2018, make sure to listen to another episode to hear what the new code is going to be. Linode is going to continue supporting Developer Tea into 2018 as well. And if you don't like the service, try it for seven days with a money-back guarantee. So check it out: head over to spec.fm/linode and use the code DEVELOPERTEA2017 at checkout, and you're going to get $20 worth of credit. That's four months on that one-gigabyte plan. You don't have to use it on that plan; you can use it on any of Linode's services, including any of their hourly services. Once again, that's spec.fm/linode.
Thank you again to Linode for sponsoring today's episode of Developer Tea. Wow. What a cool back-road entrance into this illustrious end point of a career. It's very interesting. I myself got a master's degree. I didn't go all the way through a PhD; it's kind of a stretch goal for my life at some point, potentially. Are you planning to do it? Yeah. Yeah, no, it's kind of a long-term, back-of-my-mind kind of thing. But it's a seductive thing to stay in academia, because you have this wide-open field, and there's very little that depends on your research. Most of the time it's self-directed: as long as the academic program approves the research you're doing, you can pursue the things that are interesting to you. And I think developers would benefit from bringing some of that attitude into problem solving and their day-to-day work. Yeah, that's definitely true, and that's a lot of the benefit of a PhD program. I would say one of the main benefits is that ability to be self-directed, which is really what will help you succeed in your career: being able to say, I'm on my own to figure this out, and I can do it. I can learn these skills, and I can do something that people haven't done before, which is really what you're supposed to do when you write your dissertation. And I think that's really cool. In terms of other skills, though, this is actually something that comes up a lot for aspiring data scientists. People will say, hey, do I need to get a PhD to be a data scientist? And my answer is always no. There are many people with PhDs who are data scientists.
And I think that's correlation and not causation, because the people who select into a PhD program are often curious, interested in learning something new, and technically oriented. That sets them up well when they join a data science team, but it's not that you needed to get that PhD. The reality is that being on your own and being self-directed is good, but only to a certain extent, right? You really want to be able to tie what you're doing back to a business problem at some point, or have a vision for how it will be useful, and not stay purely in the theoretical forever. Yeah, I definitely identify with that. Going back to your experience, it wasn't that you were trained as a machine learning practitioner; that wasn't the outcome of your PhD. You just happened to need those skills to accomplish the thing you were trying to accomplish. And this is another perspective that I'm a big proponent of: the idea of mission-driven development. Another way of thinking about it is purpose-driven development or project-driven development. The idea is that you're not in a bubble. Very often developers fail because of the opposite perspective: the idea that their task exists unto itself, that it doesn't contribute to something, and, more importantly for this conversation, that it doesn't contribute to a specific outcome. It's about having the ability to understand where you are in the chain of things, the chain of responsibility in this project. What is this piece of code, or this module, or this algorithm, and how is it contributing to what we're trying to do together as a team, or as a company, or even just as an individual? What is this thing actually going to contribute to the whole? Thank you so much for listening to today's episode of Developer Tea.
Make sure you check out the second part of this interview with Elena. We actually had to record it on a different day. We had some technical difficulties towards the end of this interview that we cut, because it would obviously be a little bit confusing for you to hear that, but we did continue the interview. We finished it up, and we had some really good discussion in the second part as well, so make sure you listen to that. If you don't want to miss out on that and future episodes of Developer Tea, then subscribe in whatever podcasting app you are using. There are tons of really good apps out there for listening to podcasts. And as it turns out, we're going to be on Spotify as well, so look for us on Spotify in the upcoming weeks. Another huge thank you to Linode for sponsoring today's episode and many of this year's episodes of Developer Tea. Head over to spec.fm/linode to learn what you can get out of Linode's service offerings today. Thank you so much for listening, and until next time, enjoy your tea.