
How Averages Can Trick You and Obscure the Truth

Published 9/27/2021

Averages can trick you into accepting a generalized idea about a complex set of data. This kind of compression happens not only with averages but with any other process that summarizes information.

Ask yourself: What am I missing in this story?

๐Ÿ™ Today's Episode is Brought To you by: Retool

Build internal tools, remarkably fast. Connect to anything w/ an API. Drag-and-drop React components. Write JS anywhere.

Check them out over at https://retool.com/startups/devtea

📮 Ask a Question

If you enjoyed this episode and would like me to discuss a question that you have on the show, drop it over at: developertea.com.

📮 Join the Discord

If you want to be a part of a supportive community of engineers (non-engineers welcome!) working to improve their lives and careers, join us on the Developer Tea Discord community by visiting https://developertea.com/discord today!

🧡 Leave a Review

If you're enjoying the show and want to support the content, head over to iTunes and leave a review! It helps other developers discover the show and keeps us focused on what matters to you.

Transcript (Generated by OpenAI Whisper)
We use averages all the time. As engineers, as people, we use averages to describe groups of other figures, groups of other numbers. In today's episode, we're going to talk about how averaging can trick you. It can make you believe that something is true that isn't quite true. My name is Jonathan Cutrell, you're listening to Developer Tea. My goal on this show is to help driven developers like you find clarity, perspective, and purpose in their careers. And I have a driving underlying reason that we're talking about averages today, but before we get to that, I want to talk about the mechanism here. How exactly are averages sometimes not getting the whole picture? I'll give you a simple example to understand, and then we'll talk a little bit about the math. And then we're going to talk about how this applies to our roles as engineers, as managers, as product developers, as thinkers, and as human beings. Most of us are first introduced to averages in grade school, specifically with our grades. We talk to the teacher about what makes up our grades. And in a very simple grading system where there's no weighting of one grade versus another grade, your grades are averaged together. So on one exam you might get a 100, and then on the next exam you might get an 80, but you walk away with a 90 average. And of course, teachers have also introduced weighting systems so that your pop quiz that you weren't prepared for on a random Friday isn't weighted the same as, let's say, your final exam. And so you might have a final exam that counts five times as much. You can imagine multiplying that final exam five times and you're back to working with averages. So the basic idea, if you're somehow unfamiliar with averaging, is that you add all the numbers and then divide by the number of numbers. If all the numbers are the same, then the average is whatever that number is. And let's say you add 70 plus 80 plus 90 plus 100; that's 340, and divided by four that gives you 85.
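As a quick sketch of the plain and weighted averaging described above (the specific scores and the five-times weight are just the illustrative numbers from the example):

```python
# Plain and weighted averages, matching the grade examples above.

def average(scores):
    """Add all the numbers, then divide by how many there are."""
    return sum(scores) / len(scores)

def weighted_average(scores_and_weights):
    """Each score counts `weight` times; a weight of 5 mimics a
    final exam that counts five times as much as a quiz."""
    total = sum(score * weight for score, weight in scores_and_weights)
    return total / sum(weight for _, weight in scores_and_weights)

print(average([70, 80, 90, 100]))  # 85.0, as in the example
print(weighted_average([(80, 1), (90, 1), (100, 5)]))
```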
Before we go on a rant about how this can be tricking you into believing things that aren't necessarily true, let's first say that averages are incredibly valuable. In fact, the vast majority of machine learning algorithms are working off of various usages of averages. In particular, when you're looking at things like loss functions, you are probably using some kind of average. For example, the mean squared error. That is an example of an average that we use that's incredibly valuable. But averages often don't tell the whole story. For example, 70 plus 80 plus 90 plus 100 gives us an average of 85, but two 70s and two 100s give us the same average. Now this is not really that surprising when you have the boundaries of zero to 100. You can kind of intuitively guess that with those boundaries in mind, there are only so many combinations, and the grading system probably works out to be relatively fair. But even within this system, sometimes the results are a little bit strange, or they seem to be strange if you try to put a story to them. For example, I want you to imagine the kind of student that the following grades likely came from: a 95, a 95, a zero, and then another 95. Similarly, I want you to imagine the kind of student that the following grades came from as well: a 75, an 80, a 70, and a 60. I'm sure you see this coming, but these sets of grades both average to the same grade. According to the final grade that you would get with these scores, assuming that those grades are not weighted, these students are identical. This is problematic because as you begin to put a story to these two different pictures, you can understand some of the possible failings of an averaging system, or of using averages as a definition of a full set of numbers. For example, in the first student's situation, a pattern of success is clearly established. They have three 95s in that set.
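You can check in a couple of lines that the two students' very different grade histories collapse to the same number:

```python
# Two very different grade histories that average to the same value.
student_a = [95, 95, 0, 95]   # consistent excellence, one zero
student_b = [75, 80, 70, 60]  # consistently middling scores

avg_a = sum(student_a) / len(student_a)
avg_b = sum(student_b) / len(student_b)
print(avg_a, avg_b)  # 71.25 71.25 -- the average hides the difference
```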
Very often, zeros are handed out to students who forget to do an assignment. It's possible that this particular student, let's say, had a really difficult week. They had something happen to them personally that made them forget their work, and they got a zero as a result. The other student, however, by traditional measures at least, is a less successful student. They have a consistent pattern of significantly lower grades than the first student. Yet, when you average them out, they both come out to the same number. What this system is implying is that consistency, or showing up, is more important than excellence. There are a lot of things that you can infer when you look at these individual cases. You could infer that the system is heavily biased against doing nothing, in other words, the forgetfulness or skipping class. That intuition will be backed up by the basic fact that the top end of failure in most grading systems is a 60 and the bottom end of failure is a zero. By not showing up, this person has lost a 60-point margin. But the point of this is not to criticize average-based grading. There are plenty of other options for grading. Many teachers grade on a curve, for example. Many teachers have that weighted system we were talking about. They offer options to redo something that you missed or maybe forgot. It's not necessarily to criticize the grading system, but instead to highlight the fact that averages are a shortcut for describing a large set of numbers in a very reductive way. Sometimes, that reduction is actually a compression of meaningful and important information that shouldn't necessarily be compressed. I'll give another specific example of this, and you can watch out for this and spot it in a lot of different cases in press releases. In this particular case, it was in health news, basically a summary of a study that was done.
The headline, which was obviously crafted for clicks, essentially said that you lose something like 24 minutes off of your life if you eat a hot dog. The basic message that you might take from this is that for every hot dog you eat, you very directly are losing some amount of time off your life, and that you're willing to trade 24 minutes for a hot dog. That's the typical response. But the way that this study was actually conducted is quite different. This turns out to be an average. Imagine that you have 100 people and, all things being equal, those 100 people have a life expectancy of, let's say, 77 years. Now let's say that you were tasked with finding out how much it impacts a person to eat hot dogs for their entire life. And so you run a study, and somehow you get perfect data. This is all completely infeasible to do, but you get perfect data that shows that 10 of those 100 people died about 20 years before everyone else. In other words, they died at the age of 57. There are many different ways to report this information. For example, you could say that one out of 10 people who eat hot dogs for their whole lives will die 20 years earlier. You could replace that one out of 10 with 10%. You could also say that 90% of people who eat hot dogs for their whole lives will live a full and happy life all the way to the age of 77. What you could also do is take the number of years that is lost, in this case about 200 life-years, and average it across the whole set of people. 200 years divided by 100 people is two years. So on average, if you were to take this whole group of people who all eat hot dogs for their whole lives, they are living on average two years less than people who don't eat hot dogs. Now, this is all made up. There's no real scientific information in this discussion. It's all statistics. But the important factor here is that no one in this particular made-up scenario actually lived only two years less. There are two basic cases.
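Here's a sketch of that made-up study (all numbers are the hypothetical ones from the example, not real data):

```python
# 90 people live to the baseline of 77; 10 die at 57.
baseline = 77
lifespans = [77] * 90 + [57] * 10

avg_lifespan = sum(lifespans) / len(lifespans)
print(baseline - avg_lifespan)  # 2.0 years "lost" on average

# But no individual actually loses 2 years; the outcomes are bimodal:
print(sorted(set(baseline - x for x in lifespans)))  # [0, 20]
```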
One case is to live all the way to the age of 77, and the other case is to die young at the age of 57. The averaging changes the way you think about the information. If you're used to reading these kinds of studies, then you know how to read into this information. By the way, if you wanted to convert that to how much a hot dog takes off of your life, then you can just divide by the number of hot dogs. This is another type of averaging, just in a different way. For the average per hot dog, you're dividing rather than multiplying. But the underlying message here is that as you compress information, as you change information, and averaging does this, you begin to lose details of what that information was about. This shouldn't be a surprise to us as developers. But when you see things like averages, or another example of this is burn rate, there is more to the story. When you see these metrics, you have to ask: what else is there? What is that compressing? What am I missing when you take this and turn it into a single dimension? What were those previous multiple dimensions? What story is there that isn't getting conveyed with this compressed version? We're going to take a quick sponsor break, and then we're going to come back and talk a little bit about how this applies to your life as a software engineer. Developer Tea is supported by Retool. After working with thousands of startups, Retool noticed that technical founders spend a ton of time building internal tools, which means less time on their core product. So they built Retool for Startups, a program that gives early stage founders free access to a lot of the software that you need for great internal tooling. The goal is to make it 10 times faster to build the admin panels, the CRUD apps, and dashboards that most early stage teams need.
You bundle together a year of free access to Retool with over $160,000 in discounts to save you money while building with software commonly connected to internal tools like AWS, MongoDB, Brex, and Segment. You can use your Retool credits to build tools that join product and billing data into a single customer view, or convert manual workflows into fully featured apps for your team. And you can even build tools that help non-technical teammates read and write to your database, and there's so much more. Retool will give you a head start with pre-built UI, integrations, and other advanced features that make building from scratch much faster. To learn more, check out the site, apply, join webinars, and much more at retool.com slash startups. That's R-E-T-O-O-L.com slash startups. Thanks again to Retool for their support of Developer Tea. Imagine your team is trying to estimate the effort necessary to complete a particular project. Of course, this is a very hard thing to do. We've talked a ton about estimation on the show, and I don't want to get too into the weeds and the methodology of estimation, but instead I want to talk about how we can fool ourselves into thinking about averages as a way of modifying our behavior. So let's imagine that you estimated that something was going to take you, let's say, three weeks, and it ends up taking six weeks. So now you've gone over by 100%; however you want to look at it, you're two times longer than you expected to be. Now let's imagine that on the next round you modify that, you take a lesson from that experience, and instead of underestimating you overestimate. It takes you half as long as you expected it to take. Let's imagine for a moment that you expected it to take you six weeks and this time it takes you three weeks. So on average, you have been correct across both scenarios. This is the kind of counterintuitive part about averaging things together.
Because as you look at the metrics, if you were to compress both of those together and look at your error (your error in this case was a vast underestimation and then a vast overestimation), you would imagine that a team that does this over and under and over and under, vacillating between these forever, is on average correct. But the truth is that they're never correct. Now you could say that this is semantics, but there are some ramifications to this, all right? So let's talk about this a little bit more in depth. The truth is that you're never correct. If you were to look at the error, the error in both of these cases is three weeks. Three weeks over or three weeks under; in both scenarios, the error is cumulative. In other words, your total error here is actually six weeks. Now even though you got to the end of it, if you were to estimate it all together, it would have taken you nine weeks. If we're all tracking our time right here, then we had one project that took six weeks when we expected it to take three, and we expected the next one to take six and it actually took three. That's a total of nine. Now what has happened here is that, number one, you haven't gotten better at estimating, right? The estimation process has not been refined. And additionally, in each of these iterations, as you are estimating with error, there is some loss. Now, this could be loss at an administrative level that gets covered up. It could be loss of morale, maybe it's stress-inducing, or maybe in the case that you overestimated, there's extra slack. And perhaps more than what the company can handle. In other words, they allocated time for you to do something that's three weeks versus the six weeks, or they allocated the six weeks and you ended up being done for three of those six weeks with nothing to do. And this can be a huge cost for the company as well. And it's not just company costs, right?
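A minimal sketch of the over/under example, showing how signed errors cancel while absolute errors accumulate:

```python
# Two projects: estimated 3 weeks (took 6), then estimated 6 (took 3).
estimates = [3, 6]
actuals = [6, 3]

signed_errors = [actual - est for est, actual in zip(estimates, actuals)]
mean_error = sum(signed_errors) / len(signed_errors)
mean_abs_error = sum(abs(e) for e in signed_errors) / len(signed_errors)

print(mean_error)      # 0.0 -- "on average correct"
print(mean_abs_error)  # 3.0 -- but the average miss is three weeks
print(sum(abs(e) for e in signed_errors))  # 6 -- total cumulative error
```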
As we've already outlined in this case, these errors can cause frustration and stress. And ultimately, it's harder to plan with this kind of variability. Now, again, my goal is not to push you into revising your estimation strategies. You might end up doing that. You might not. My goal instead is to encourage you to think more about what an average is telling you. What is the composition of the underlying numbers? And there are methodical ways to ask this question. For example, you might ask about the distribution of those underlying numbers. You might ask about outliers; there are statistical measures to describe all of this. You may just want to actually look at those numbers and see: does this actually track, even intuitively, with what I understand from the average? It's tempting to use an average as a blanket descriptor. This is especially tempting for managers who are running reports. For example, it's especially tempting for hiring managers who are looking at salaries. And very often these things are so unique in their individual cases that averages become much less meaningful. So all of this is to say that not only with averages, though averages are certainly a good example, but in every way that information gets compressed (for example, summarization is another way that information gets compressed), you should at least be thinking about what is missing. Sometimes what's missing is okay. Sometimes that missing piece only serves to distract, or there are no meaningful differences: all of the numbers are very close and we're just looking for something that describes how close they are. But when you detect that there's some kind of compression taking place, you should be asking: what exactly is happening? What am I missing? What is not here?
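Those questions about the distribution and outliers can be asked cheaply with Python's standard library; here's a sketch using the first student's grades from earlier:

```python
import statistics

grades = [95, 95, 0, 95]

print(statistics.mean(grades))    # 71.25
print(statistics.median(grades))  # 95.0 -- far from the mean: a red flag
print(statistics.stdev(grades))   # a large spread hints at an outlier
```

When the median and mean diverge sharply, or the standard deviation is large relative to the values, the average alone is compressing away something worth looking at.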
Thanks so much for listening to today's episode of Developer Tea. Thank you again to Retool for sponsoring today's episode. To get over $160,000 in discounts to save you money while building with software commonly connected to internal tools like AWS, MongoDB, Brex, and Segment, head over to retool.com slash startups. And of course, if you want to let them know that you're coming from Developer Tea, go to retool.com slash startups slash devtea; that link is in the show notes as well. Thanks so much for listening. If you want to continue talking about things like what kinds of information averaging compresses, or if you have more thoughts on this particular subject, other similar subjects, or any episode from the back catalog, if you have any kind of comment, please join the Discord community. Head over to developertea.com slash discord. Joining and participating in that Discord is always going to be free. We're never going to charge for any of that. So please come and join at developertea.com slash discord. Thanks so much for listening. Until next time, enjoy your tea.