Developer Tea :: More Misunderstood Truths About Statistics

More Misunderstood Truths About Statistics

Published 4/2/2018

We've established that statistics are useful and more relevant in our day-to-day work life, but how do statistics affect our personal selves? In today's episode we're talking about what statistics mean for our personal selves and how we make our decisions.

Today's episode is sponsored by Linode.

In 2018, Linode is joining forces with Developer Tea listeners by offering you $20 of credit - that's 4 months of free service on the 1GB tier! Head over to https://spec.fm/linode and use the code DEVELOPERTEA2018 at checkout.

Transcript (Generated by OpenAI Whisper)

So we've established that statistics are useful and perhaps more commonly referenced in our normal conversations than we may realize at the outset. But how can we say that statistics matter when we're looking at our own experiences? How is it that statistics apply to me? And more importantly, what can I do to understand my own statistics? It seems that I need a much larger group of people to measure rather than just measuring my own self. That's what we're talking about in today's episode. My name is Jonathan Cutrell and you're listening to Developer Tea. My goal on this show is to help you as a driven developer uncover your career purpose and do better work that has a positive influence on the people around you.
So we talked about statistics in the last episode and it's something that I'm really kind of passionate about. I'm passionate that people understand what statistics mean to them. What is the place of statistics in your life and in your decision making? Decision making is a statistical process. When you break it down, if you have one option versus another option, even if your options are non-statistical, for example, will I work out or not, you are still weighing two options when you make a decision. And ultimately, the way that humans tend to make decisions relies on one option kind of weighing more heavily in the positive direction than the other ones. This is what we talked about in the last episode, Bayesian decision theory. This is a decision theory that says that we trade off the costs and the benefits, the cost-benefit analysis, and these are all statistical terms. But I do want to kind of go back to something we said in the last episode, and it's something that I hear often about statistics and most often in individual cases. Why do statistics apply to me? We hear the phrase that you don't want to become a statistic.
This is kind of a negative connotation: you don't want to be, you know, part of that group of people that something bad happened to. And so you should, for example, wear your seatbelt or drink eight cups of water a day. But on the flip side, we also have kind of a reticence to view ourselves in terms of our quantified data. Now, this isn't true across the board. Certainly, the quantified self movement is a big kind of vote in the direction of understanding yourself in terms of data. But I want to talk about why this is important first, and then secondly, areas where individual statistics actually do make a difference. They are measured and they are used on a regular basis. So we're actually going to do those in the opposite order. We're going to start by talking about a few types of information that are measurable at an individual level that we can apply to our decision making process to inform us into the future.
So very simple examples certainly come from sports. For example, batting average. In a given season, a player may have between 450 and 650 at bats, and certainly during practice significantly more than that, of course. And so the batting average is measuring how often that person, the batter, the player, gets what's called a safe hit. In other words, a hit that doesn't turn into a foul, for example. So this is expressed in terms of a ratio. Let's say, for example, that you have five out of ten. Well, your batting average would be .500. This batting average is totally unreasonable to expect because that would mean that every other ball that you get thrown, you hit safely. And the highest batting average on record in Major League Baseball is Ty Cobb's, unless somebody has unseated him since my quick Google searching, and Ty Cobb had a .366.
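As an aside, the ratio described above can be sketched in a few lines of Python. This is only a minimal illustration, not an official scoring rule: the function names `batting_average` and `format_average` are made up for this example, and it assumes the simplified definition used in the episode (safe hits divided by at bats).

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Return safe hits divided by at bats (simplified definition)."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    if not 0 <= hits <= at_bats:
        raise ValueError("hits must be between 0 and at_bats")
    return hits / at_bats


def format_average(average: float) -> str:
    """Format an average the way it is usually read aloud, e.g. '.500'."""
    return f"{average:.3f}".lstrip("0")


# Five safe hits in ten at bats, the example from the episode.
print(format_average(batting_average(5, 10)))

# Hypothetical season totals chosen to land on the figure quoted for Ty Cobb.
print(format_average(batting_average(183, 500)))
```

Five hits in ten at bats comes out as .500, and the hypothetical 183 hits in 500 at bats comes out as .366.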
So what does this tell you and why is it useful to batters, to baseball players? Well, the direct information that it tells you is how often this person hits a ball. But it may also tell you, in a given game, is this person doing better than they normally do? Or are they doing worse than they normally do? And if you look at a few games in a row, if the batting average over the course of those few games is lower than their overall batting average, we can start to draw out new information. And this is all about a single person's performance.
Another very simple example of this is how many steps you take per day. For me, the number is almost always the same on a given day of the week. So Mondays often look the same as other Mondays and Sundays often look the same as other Sundays. And as it turns out, most of us have this particular information available to us, especially if we've enabled certain types of apps or if we have something like a Fitbit that's tracking this information. But what is this information telling us? Well, it tells us about our patterns, for example. I'm much less active on the weekends as a general rule because I tend to stay home and spend time with my family, whereas during the week I'm out, I'm at work, I'm doing walking meetings. You know, I might walk from one location to another in the middle of the day, and that typically doesn't happen as often on weekends. So this information is reflected in the quantified data. If I look at the information, what's interesting is I can also look back at times when I was traveling, perhaps walking through airports or, you know, maybe I was out touring an area of a new city that I hadn't been to. And you can see that my steps are significantly higher on those days, sometimes 13,000, 14,000 steps in a single day. But as a general rule, we can look at our activity for Mondays.
For me, it just so happens that the highest correlation is based on those weekday alignments. Or maybe for you, it's over the course of time: you can look at that average trend, and most people will have similar numbers on a daily basis. So what is this describing and why is it important? Well, again, we're taking this idea of statistics and applying it at a singular person level. There are many variables that every person faces every day. And it would be easy to use the variability of my day to make a claim that my step count is unreliable as a way of understanding my activity. But the data shows differently. If I go and look at the data, then I know that my average behavior stays about the same. Now, this may produce different results for different people. There may actually be a high level of variability. But at least for some people, for this particular metric, there may be a reliable statistical analysis that may give you insight into your own behavior and may change it.
There are other kinds of quantifiable pieces of information that you can track throughout the day about yourself. It may be about your health; for example, if you track your calories, you'll probably find that you eat the same number of calories on a given day unless you are actively changing that number on purpose. Other numbers might include the amount of time that you sit in traffic per day. You may feel like traffic is heavier at the end of the week or at the beginning of the week, and in some cases it may be, but you're likely to find that on average, for the same trip, your amount of time traveled is going to be very close, and your perception can change this. Your perception can make traffic feel much longer.
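One way to check a claim like the step-count one above against your own data is to group daily counts by weekday and compare the spread within a weekday to the spread overall. The sketch below is only an illustration: the step counts are invented, and `summarize_by_weekday` is a hypothetical helper, not part of any fitness tracker's API.

```python
from datetime import date
from statistics import mean, stdev

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Invented sample data: three Sundays and three Mondays of step counts.
STEPS_BY_DATE = {
    date(2018, 3, 4): 3100,   # Sunday
    date(2018, 3, 5): 8200,   # Monday
    date(2018, 3, 11): 2900,  # Sunday
    date(2018, 3, 12): 7900,  # Monday
    date(2018, 3, 18): 3400,  # Sunday
    date(2018, 3, 19): 8500,  # Monday
}


def summarize_by_weekday(steps_by_date):
    """Return {weekday name: (mean steps, sample standard deviation)}."""
    groups = {}
    for day, steps in steps_by_date.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        groups.setdefault(WEEKDAYS[day.weekday()], []).append(steps)
    return {
        name: (mean(counts), stdev(counts) if len(counts) > 1 else 0.0)
        for name, counts in groups.items()
    }


summary = summarize_by_weekday(STEPS_BY_DATE)
overall_spread = stdev(list(STEPS_BY_DATE.values()))

for name, (average, spread) in summary.items():
    print(f"{name}: mean {average:.0f} steps, spread {spread:.0f} "
          f"(overall spread {overall_spread:.0f})")
```

If the spread within each weekday is much smaller than the overall spread, as it is in this made-up data, the weekday is explaining most of the variation. Real data may well look noisier.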
Now this can be a problem, and that's what we're going to talk about after we talk about today's sponsor, Linode.
With Linode, you can get up and running in just a few minutes by picking your Linux distribution, your node location, and the resources you need for your application, and then pressing go. You launch your Linode, and you have access to a ton of tools and support that Linode provides on day one. Linode has 24/7 support. They have developers on staff. They can even do your DevOps for you. They have professional development operations services that you can take advantage of so that you can focus on your business rather than focusing on the technical implementation details that really don't add a bunch of value to your business. As a developer, that may mean simply focusing on the code or on a user experience or the things that you actually care to focus on, rather than getting your site back up after the server crashes. One of my most favorite things about Linode is that they provide their services all at an hourly rate. What that means is you only pay for what you use. Go and check it out at spec.fm/linode. If you use the code DEVELOPERTEA2018, all one word, at checkout, they'll give you $20 worth of credit towards any of their services. That's not just the Linode itself, but also their supporting services like, for example, NodeBalancer. Go and check it out, spec.fm/linode. Thank you again to Linode for sponsoring today's episode of Developer Tea.
So, why is it important that we look at these statistics about ourselves when possible, and how do we know when we can discard them? Because it is true, the intuition is correct that we cannot always trust statistics.
Sometimes, we absolutely need a larger collection of information, a larger amount of information, for anything meaningful to come out of it, right? So if we only have, you know, one or five or maybe even 20 data points about our life, about a particular trend that we're trying to measure, it's very unlikely that we have enough to establish a trend. What is going on here? Why is it that we can kind of not even look at that data because there's just not enough there? The reality is, for any kind of data point, for any kind of information, depending on the possible outcomes and depending on the complexity of the scenario and the number of variables that you are controlling for, there needs to be a minimum amount of data available.
So I'll give you an example. If you flipped a coin 20 times and every time you got heads, this is a very unusual scenario, very unusual to get heads 20 times in a row. And perhaps at that point you might be questioning: maybe this coin has heads on both sides, or maybe it's a trick coin, right? So this is an unusual outcome and it's significant, right? It is significant because of the lack of complexity, the lack of variables that go into this. Typically with 20 flips, maybe you would see five in a row, or maybe you would even see 10 in a row, but 20 in a row? And we're kind of using an arbitrary number here, so the number is not as important as the intuition.
On the other hand, let's say you are a business owner and you have a physical storefront, and you are in a relatively busy area, and 20 people have walked by. They've walked by and they haven't walked into your store, 20 people. What would it be reasonable to deduce from this information, which on its face says that a hundred percent of the people walking by did not come in?
Should I then extrapolate out into the future and say no one will ever come into the store, or even, in the average case, most people, almost all people, will not come into the store? Now, whether or not it is statistically reasonable is not really the question here, because the statistics would tell you to predict a 100% rate of no one coming into the store. However, you would probably be shutting your doors a little bit too early, because making that judgment call really relies on a significantly higher amount of traffic, especially if your store is a niche store, for example, right? Perhaps your rate of acquisition of people should really only be measured once a thousand people have walked by the store. And if at that point no one has walked in, then perhaps you consider changing your storefront.
So this is important intuition again about statistics, because it really is kind of the common reason to remove statistical thinking altogether: we don't have enough information. It's not reliable enough. There's not a large enough sample size, and therefore we're not going to engage it at all. And this is a reasonable response to unreliable data, right? It's a reasonable response when you have a small amount of data. And generally speaking, when you're a small company, like most startups, most agencies, most of the companies that people listening to this podcast work at, you're small enough that statistical thinking is probably not at your core, right? This is definitely true for the company that I work at. Now, does this mean that you can throw away statistics altogether and never come back to it? The answer to that question is absolutely not. So, what is it about statistical thinking that is important?
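To put rough numbers on the coin and storefront intuition above, here is a small Python sketch. It rests on assumptions that are not in the episode at this point: the coin is fair, each flip or passer-by is independent of the others, and the store's true walk-in rate is a hypothetical 5%. The name `prob_streak` is made up for this example.

```python
def prob_streak(p_single: float, n: int) -> float:
    """Probability that an outcome with per-trial probability p_single
    happens on all n trials, assuming the trials are independent."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p_single ** n


FAIR_COIN_HEADS = 0.5
ASSUMED_WALK_IN_RATE = 0.05  # hypothetical true rate, for illustration only

# 20 heads in a row from a fair coin.
p_20_heads = prob_streak(FAIR_COIN_HEADS, 20)

# 20 (or 1,000) passers-by in a row who all keep walking.
p_20_pass = prob_streak(1 - ASSUMED_WALK_IN_RATE, 20)
p_1000_pass = prob_streak(1 - ASSUMED_WALK_IN_RATE, 1000)

print(f"20 heads in a row, fair coin:      {p_20_heads:.2e}")
print(f"20 passers-by, nobody enters:      {p_20_pass:.3f}")
print(f"1,000 passers-by, nobody enters:   {p_1000_pass:.2e}")
```

Under these assumptions, 20 heads in a row has a probability of about one in a million (1 in 1,048,576), so it is fair to suspect the coin. Twenty passers-by in a row not entering happens roughly 36% of the time even when 5% of people do come in, so it says very little. By 1,000 passers-by, seeing no one enter would be vanishingly unlikely at that rate.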
Well, as we mentioned earlier with our traffic example, we may have an estimation of traffic that is very different from reality. Now, what is it that causes this? Our perception versus a measured reality is very often different. We see this happening with estimation errors all the time. We have a perception that is very difficult to shake, by the way, even for seasoned programmers, even for people who consistently have issues with estimation. It's very difficult to shake this perception issue of estimating in a way that doesn't reflect the reality. So, what is it that causes us to overestimate a measured reality? In other words, we overestimate what we're able to do, and we underestimate any kind of variable that may be outside of our control that will increase the timeline well beyond our estimate. Now, in our minds, this is not a reasonable thing to expect, because we see things differently very often because of our perception. We have a warped perception, a warped view of reality. And this is, again, very difficult and it's very common. In fact, everyone has distortions of reality in their perceptions. However, when we can measure things, we can have a common ground that we look at together. It's much more difficult to warp a common ground, a measured common ground.
Now, just because we measure it, does that mean that it adequately represents reality at a large scale? For example, let's say that you have that storefront again and you're measuring people coming in and you're getting a 5% rate of people coming into your store. Well, does that mean that you should just blanket expect 5% of all people to come into your store? It's very unlikely that your measured statistics are going to represent a holistic view of reality.
You're never measuring a situation that has no other variables. For example, there may have been a convention in town full of people who very much appreciate your store or very much don't appreciate your store. And this could skew your measurements, right? So, we have to take everything that we measure with a grain of salt and remember that the number of variables that may be affecting something is almost immeasurable, almost infinite. It's very difficult to measure, you know, the distant effects; even things like the butterfly effect come to mind, that have an impact on whatever it is that you're looking at. But what we don't want to do is avoid measuring altogether, or add to that pot of variables ourselves by relying on our perception, relying on intuition. We know that our perception, our intuition, they're not good at measuring and coming to a common ground that we can share with other people, that we can use to predict the future, that we can use to describe our present situation.
Thank you again for listening to today's episode. I hope you've enjoyed these last two episodes on statistics. I know this is a hot-button issue, especially, you know, between developers and other types of roles in companies. So, I encourage you to do everything you can to measure and come to a common ground. I encourage you to do everything you can to make this practical and reliable and not create a false loyalty to statistics, you know, just because you understand them or you believe in them. Instead, create a loyalty to another person. Remember that all of this should be viewed through the lens of connecting with other humans and helping make your efforts together better.
Thank you again to Linode for sponsoring today's episode. Speaking of making your efforts better, Linode can help you do just that. Head over to spec.fm/linode and use the code DEVELOPERTEA2018 at checkout for $20 worth of credit that can go towards any of their plans and services. Thank you again to Linode. Thank you so much for listening to today's episode of Developer Tea. We very likely will continue some discussions on statistics, depending on how everyone is responding to these last two episodes. There's a lot more that we can unpack about how we can use statistics to make more reliable decisions, and in what ways we skew these with our perception. We kind of glossed over that in today's episode, but that is certainly an important part of this discussion. Thank you so much for listening to today's episode. And until next time, enjoy your tea.