Stat Series: What Statistical Measure Are You Overusing? (And What to Do About It), Part One
Published 3/1/2024
On average, you're probably overusing this specific type of statistic. In today's episode, we discuss the king of all misleading numbers: averages!
There's so much to talk about with averages that we're splitting this into two parts. Disclaimer: I am not a mathematician. But we'll talk about some of the interesting properties of averages, why humans find them so addictive to use, and, more practically, the counterintuitive ways we might be using them incorrectly.
If you're using your sprint velocity to forecast work, this episode is for you!
🙏 Today's Episode is Brought To you by: Jam.dev
If you’re an engineer and you would rather spend your time writing code than responding to comments in your issue tracker, send your team Jam.dev. Go to jam.dev to get started, it’s free.
📮 Ask a Question
If you enjoyed this episode and would like me to discuss a question that you have on the show, drop it over at: developertea.com.
📮 Join the Discord
If you want to be a part of a supportive community of engineers (non-engineers welcome!) working to improve their lives and careers, join us on the Developer Tea Discord community by visiting https://developertea.com/discord today!
🧡 Leave a Review
If you're enjoying the show and want to support the content, head over to iTunes and leave a review! It helps other developers discover the show and keeps us focused on what matters to you.
Transcript (Generated by OpenAI Whisper)
What is the most overused statistical measure of all time? My name is Jonathan Cottrell. You're listening to Developer Tea. My goal on the show is to help driven developers like you find clarity, perspective, and purpose in their careers. The most overused statistic of all time. This is so important, I'm actually going to do two parts on this particular subject. The most overused statistic, we won't bury the lede here, is the average. To be precise about the definition, I mean the arithmetic mean: the sum of all constituent measures divided by the number of constituent measures. This is an incredibly useful statistic, don't get me wrong, but it very often misleads us, and I want to talk about a couple of counterintuitive things that you may not be thinking about in relation to averages. Now here's what the mental trick of averages does. The reason the average works so well, and why it is so overused in the first place, is that it provides some kind of heuristic value: a single number that summarizes a much more complicated set of other values. We imagine that the average is comprehensively representative of whatever the underlying data actually is. Now, we are going to talk just a little bit about what these statistical representations or definitions are. We're not going to get into the math very deeply here, but it does make sense to lay out the assumptions so that you know the patterns to look for, for when the average may actually be misleading. So we've established that the average is, to our brains, essentially a heuristic that we try to use to describe the underlying information. And this is reasonably useful, as most heuristics are, especially if the data set that you're talking about is essentially normally distributed. This is what you would typically think of as the bell curve distribution.
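As a quick illustration (a sketch added for these notes, not from the episode itself, using made-up parameters): for normally distributed data, the mean and standard deviation really do describe the shape, and roughly 68% of values fall within one standard deviation of the mean.

```python
import random
import statistics

random.seed(42)

# Draw 100,000 samples from a normal distribution with mean 10, stddev 2.
samples = [random.gauss(10, 2) for _ in range(100_000)]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)

# For a normal distribution, ~68% of values lie within one stddev of the mean.
within_one_sd = sum(1 for x in samples if abs(x - mean) <= stdev) / len(samples)

print(f"mean ~ {mean:.2f}, stdev ~ {stdev:.2f}")
print(f"fraction within one stddev: {within_one_sd:.3f}")
```

With enough samples, that fraction hovers near 0.68, which is why a single random draw from this kind of data usually lands close to the average.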
And the implication of a normal distribution is that there is some predictable pattern of outliers and distance from the average. The easy way to think about this is: the further away from the average, the less likely it is that you'll see that particular value. And in fact, that is kind of the definition of the normal distribution, although that likelihood does not fall off linearly. In other words, a few steps away from the average may be very close to the same likelihood as the average itself. If you want to do some deeper study, you can look into normal distributions. A normal distribution is completely described by the average and the standard deviation; you can create a normal curve using just those two measures. So why is the average useful in this situation? Well, if you were to take a single random value from a data set that is normally distributed, you're more likely to land close to the average than to the outliers. So hopefully you can see where this is going. If the distribution is not normal, then it's very easy for the average to be absolutely not representative of the whole, and other metrics would tell the story better. To prove the point, imagine a population of six people whose average age is 19. You're very likely to imagine a normally distributed group of people who are around the age of 19. But it's also true that a 109-year-old and five one-year-olds together have an average age of 19 (114 divided by 6). So not only is our brain not really wired to understand averages intuitively, because we're using them as a heuristic representation of the whole, but it's also important to note that the average may have absolutely no direct relation to the underlying data set itself. Interestingly, this also goes for the normal distribution.
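A minimal sketch of that age example (note: for six people to average exactly 19, their ages must total 114, so this sketch uses a 109-year-old as the outlier): the mean says 19, while the median tells you what a typical member of the group actually looks like.

```python
import statistics

# One 109-year-old and five one-year-olds: six people totaling 114 years.
ages = [109, 1, 1, 1, 1, 1]

print(statistics.mean(ages))    # 19: the "heuristic" summary
print(statistics.median(ages))  # 1.0: the typical member
```

Nobody in this group is anywhere near 19 years old, which is exactly the point: the average can be a value that describes no one.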
It's possible for the average of a normal distribution to not even show up as a value in the distribution itself. This becomes especially important if the absolute values inside the distribution actually matter. And to prove how pliable the average is, think about this simple fact: you can produce any whole-number average between 10 and 20 using a population that consists only of 10s and 20s. And of course, the same is true for any two numbers. Now, while a lot of this may be interesting, and perhaps some of it has been counterintuitive, I want to share with you in the next part of the episode a counterintuitive usage of the average that may be affecting you every single day in your job. We'll talk about that right after we talk about today's sponsor. If you're listening to this show, there's a good chance that you're a developer and that you work on a team that builds a front-end product. And there's also a good chance that if you were to look in your backlog right now, you'd find a couple of bug reports. How good are those? When was the last time you got a bug report that felt really comprehensive, that actually covered everything you needed it to? More often than not, you probably only get a text description, maybe not even a screenshot of what's going on. No console logs, certainly. No user ID. You have to go chasing for the information on what the reproduction steps actually mean. For example, did they actually click that particular button, or did they tab to it and press enter? Or did they close the tab? Instead of fixing it, you have to go bother the person who made the ticket and get on an hour-long call. You go back and forth for weeks. You may even realize that it wasn't a bug in the first place; they just didn't clear their cache the right way.
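That pliability is easy to demonstrate (a small sketch added for these notes): every whole-number average from 10 through 20 can be hit with a ten-element population made only of 10s and 20s.

```python
from statistics import mean

# For a target average t between 10 and 20, use (t - 10) twenties and
# (20 - t) tens: ten values whose mean is exactly t.
for t in range(10, 21):
    population = [20] * (t - 10) + [10] * (20 - t)
    assert mean(population) == t
    print(t, population)
```

Eleven very different populations, eleven different averages, and none of them tell you anything about how the values are actually spread out.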
And truthfully, it's no fault of their own. The people who are reporting these bugs to you just don't understand all of the context that you might need. But it's really frustrating on your side, and it's probably a pretty hard sell to go and train those folks on how to write a better bug report. Instead of that, you can just use today's sponsor, jam.dev. Jam.dev provides developer-friendly bug reports in a single click. You may have heard of it, because more than 75,000 people are currently using it. It's a free tool that saves software engineers a ton of time and a ton of frustration. It guides the process of bug reporting so that it literally can't go wrong. It automatically includes a video of the bug, console logs, network requests: all the information you need to reproduce and debug. In fact, it's so detailed that you can take the spec from the bug and put it directly into an automated test to cover that bug in the future, so you're going to get better at regression testing as a result as well. If you're an engineer and you would rather spend your time writing code than responding to comments in your issue tracker, head over to jam.dev to get started. It's free. Thanks to Jam for sponsoring today's episode of Developer Tea. Imagine you just got an urgent Slack message, maybe from your boss, maybe from a product owner or a customer. If you're working in the client industry, it's probably a client that sent you an email, and they're asking: when is that thing that you promised would be done today actually going to be done? There are four common ways that this question gets answered. The first is one that we're going to throw out: the "I don't know" answer. While this may be technically true (nobody can actually know for sure when something is going to be completed until it actually is), it's not incredibly helpful, and I advise against this particular answer type.
The second way that people tend to answer this is by using their gut, some kind of intuition about when they think it might be done. Sometimes people get this right, but it does tend to be a bit of a dice roll. There are times when using your gut can be helpful, but typically when you're estimating a workload, especially if you're estimating it on somebody else's behalf, using your own intuition is bound to lead to a bad situation. So I also recommend against this one. Now that leaves us with two quantitative methods, and hopefully you see where this is going. The first quantitative method is to look at your average. This is what we do when we look at, for example, our velocity. We look at the past three or four sprints, or maybe we even look at our average cycle time. And once again, because we're using this as a heuristic, a representation of the whole, we imagine that thing A looks like thing B. If we are asked, when is this going to deliver, we want to find things that look like that thing and answer: this other thing took this long, so therefore this will take this long as well. But here's the counterintuitive problem with averages. When you answer with the average, there's roughly a 50% chance that you won't deliver the thing by that time. It's exactly 50% only when the distribution is symmetric, so that the mean and median coincide, but either way, we should care about looking at the underlying data set to get a better understanding of what the average is actually representing. Imagine that 75% of the items that hit your backlog you can complete in, let's say, four days or less, 25% take longer than that, and in fact 5% of the tickets that you receive take longer than a month. Then there are many descriptions you could pick that tell the story better than the average.
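To make that kind of backlog concrete (a sketch with hypothetical cycle times in the same spirit: most items finish quickly, a few take far longer), look at where the mean sits relative to the actual data:

```python
import statistics

# Hypothetical cycle times in days: most items finish in four days or
# less, but a long-tail item drags the average up.
cycle_times = [1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 7, 35]

mean = statistics.mean(cycle_times)
median = statistics.median(cycle_times)
done_in_4 = sum(1 for t in cycle_times if t <= 4) / len(cycle_times)

print(f"mean:   {mean:.1f} days")   # pulled upward by the outlier
print(f"median: {median} days")
print(f"fraction done in <= 4 days: {done_in_4:.0%}")
```

Here the mean is about 6 days even though three-quarters of the items finished in 4 days or less; quoting the average as "the" delivery time misrepresents almost every ticket.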
At the same time, you may know that 75% of the time, whatever tickets you receive are completed in four days. The average compresses all of this information away. Take our previous example with the ages, five ones and a 109. If that were your distribution of cycle times in days across your delivered work items, responding with the average of 19 days would be totally illogical when you actually look at the data set: five out of six items finish in a single day, and the sixth takes 109. Answering with a 19-day cycle time would leave you wrong, and late, in one out of six cases, about 16% of the time, assuming the underlying data holds. That's the same odds as rolling a six on a six-sided die. And when you are wrong, you're not wrong by a little; 109 days against a 19-day estimate is an error of well over 400%. Now, this is an extreme example, of course. It's very unlikely that your sample data has five ones and a 109 for cycle time. So here you might say, well, actually, cycle time is fairly normally distributed. We have some things that go very quickly.
We have some things that take a little longer, and we have a lot of things that are in the middle. That's a basic description of a relatively normal distribution. Well, if your data is indeed normally distributed, and you use the average of your cycle time, then half of the time you're going to deliver late. Now, we're going to step away from math for a second and instead talk about psychology. This one's a very short one: the negativity bias. Assuming that your data is normally distributed, and assuming that you're using the average to answer the question "when will this be done?", then in part due to the negativity bias, and in part due to the simple mathematical property of using an average on normally distributed data, you're going to be perceived as always late. If you're feeling that right now, I want to challenge you to determine: are you using averages to answer these questions? If so, it's very possible that we just stumbled on the reason why you're regularly perceived as delivering things late. Even if you're right half of the time, the negativity bias is unfortunately going to do away with a lot of the goodwill that you may have gained by delivering on time the other half. So what is a better way to talk about this? Because it is true that you will have outliers in things like cycle time; in almost everything that you do at work, there will be outliers. So how can we get better than delivering the average? One way to do this is to think about reliability and risk. Instead of asking, when do you expect this to be done on average, you may ask instead: what level of risk are you willing to take that the estimate is wrong? In other words, because no estimate can be 100% correct, because fundamentally we can't tell the future, by using representative data from our past performance we can identify some of the outlier risk in our estimate.
And the simple way to do this, especially if you're using some kind of forecasting method like Monte Carlo, is that the question you are answering shifts a little bit. We've talked about substitute questions on this show before. Instead of answering the question "what is the average delivery time?" in order to say what you believe the delivery time will be, you can instead ask: based on historic data, how long would something like this work item take in 85% of cases? What this implies is that you're willing to take a 15% risk of that estimate being wrong. And it's reasonable to look at that remaining 15% to see what that part of the distribution looks like. For example, when it goes wrong, how badly does it tend to go? You could do that by asking: in 95% of cases, what is that number? And then perhaps looking at the discrete cycle times of the few things that are beyond that 95%, so you have an idea of the one-in-twenty, super-far-outlier cohort. So really what you're doing is changing the question a little bit, from "what do you expect the midline average to be, with perhaps significant variation on either side of that midline?" to "what can you give with confidence as a delivery timeline?" Now, it's important to note that people using averages are going to have more optimistic timelines than people using this method, because you're basically saying: it's going to take a little bit longer, but we're capturing enough of the outcomes above the average that we have a reasonable degree of confidence in our estimates. So again, the counterintuitive thing about averages in a normally distributed environment is that if you were to use one for estimation, you are late 50% of the time. This is a problem, especially because of the negativity bias.
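One way to sketch that Monte Carlo style of forecast (an illustration added for these notes, with hypothetical historical data and the simplifying assumption that items are worked one after another): resample past cycle times to simulate delivering a batch of items, then read off the 85th and 95th percentiles instead of the average.

```python
import random

random.seed(7)

# Hypothetical historical cycle times (days) for completed work items.
history = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 8, 21]

def simulate_totals(n_items: int, trials: int = 10_000) -> list[int]:
    """Simulate delivering n_items by sampling past cycle times with
    replacement, assuming items are worked sequentially."""
    return [
        sum(random.choice(history) for _ in range(n_items))
        for _ in range(trials)
    ]

totals = sorted(simulate_totals(5))

# Read percentiles straight off the sorted simulation results.
p50 = totals[int(0.50 * len(totals))]
p85 = totals[int(0.85 * len(totals))]
p95 = totals[int(0.95 * len(totals))]

print(f"50% of simulations finished within {p50} days")
print(f"85% of simulations finished within {p85} days")
print(f"95% of simulations finished within {p95} days")
```

Quoting the 85th-percentile number answers "how long will this take in 85% of cases?", which is the substitute question the episode recommends, and the gap between the 85th and 95th percentiles shows how badly things tend to go in the remaining tail.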
In the next episode of Developer Tea, we're going to continue this discussion on averages as the most overused statistical measurement of all time. Thanks so much for listening to this episode. This episode was sponsored by Jam. Head over to jam.dev to get started today. You're going to have better bug reports, which, by the way, will ultimately result in a lot less time spent fixing bugs in the first place. Head over to jam.dev. It's free. If you enjoyed this episode, I encourage you to do two things. One, if you don't want to miss out on the next part in this series, subscribe in your favorite podcasting app. That's the best way to make sure you don't miss future episodes. And my second request is that you leave a review in iTunes. I'm sure you hear this all the time, and it's not a unique request, but it is indeed the most impactful thing you can do to help other engineers like you find and listen to Developer Tea. Thanks so much for listening, and until next time, enjoy your tea.