
Stat Series: What Statistical Measure Are You Overusing? (And What to Do About It), Part Two

Published 3/6/2024

In this episode we continue our discussion about the most overused statistical measurement. We'll talk about a few more counterintuitive properties of the average, and how you might be underserving your colleagues as a result of thinking in averages.
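
Two of the properties discussed in the episode can be checked with a few lines of Python. This is a minimal sketch rather than anything from the show itself: the helper name `ones_and_eights` and the choice of seven tickets are assumptions made for illustration, while the numbers (a mix of 1s and 8s, and ten 1s plus a single 5 million) come from the transcript below.

```python
from statistics import mean, median


def ones_and_eights(target: int) -> list[int]:
    """Return seven story-point values, all 1s or 8s, whose mean is `target`.

    With seven tickets, (target - 1) eights and the rest ones average out to
    exactly `target` for any whole number from 1 to 8.
    """
    if not 1 <= target <= 8:
        raise ValueError("target must be a whole number from 1 to 8")
    eights = target - 1
    return [8] * eights + [1] * (7 - eights)


# A backlog that averages 3 points yet contains no ticket anywhere near 3.
backlog = ones_and_eights(3)
print(backlog, mean(backlog))

# Ten 1s plus one extreme outlier, as in the episode.
values = [1] * 10 + [5_000_000]
print(f"mean:   {mean(values):,.2f}")  # dominated by the single outlier
print(f"median: {median(values)}")     # the middle value is still 1
```

The median is printed next to the mean only as a point of comparison; the episode itself recommends looking at the whole distribution rather than swapping one summary number for another.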

🙏 Today's Episode is Brought to You by: Neo4j

Is your code getting dragged down by endless JOINs and long query times? Try simplifying the complexity with graphs!

With Neo4j, you can code in your favorite programming language and against any driver. See what graphs can do for you at Neo4j.com/developer

📮 Ask a Question

If you enjoyed this episode and would like me to discuss a question that you have on the show, drop it over at: developertea.com.

📮 Join the Discord

If you want to be a part of a supportive community of engineers (non-engineers welcome!) working to improve their lives and careers, join us on the Developer Tea Discord community by visiting https://developertea.com/discord today!

🧡 Leave a Review

If you're enjoying the show and want to support the content, head over to iTunes and leave a review! It helps other developers discover the show and keeps us focused on what matters to you.

Transcript (Generated by OpenAI Whisper)

In the last episode, we talked about the king of all overused statistics. This is one you're probably overusing on a regular basis. It is the average. We talked a little bit about the definition of the average and why it is so sticky, why it is so common and easy to jump to using averages. The basic reason that we discussed in the last episode is that averages are an excellent heuristic for our brains. We imagine, sometimes correctly, that an average is somehow representative of the whole. The average, of course, being the sum of all parts divided by the number of those parts. But our intuitive picture of what an average actually is requires a lot more investigation and caveats.

Specifically, we imagine that averages are representative of the whole, and we translate this to imagine that if we were to pick one representative sample, right? So if we had a population of items, a population of people, a population of tasks, whatever, and we averaged those things on some metric, some particular value, some attribute, then if we were to pick some random item from that grouping, the metric that we are averaging on is probably going to be somewhere near the average.

So, for example, let's say that you use story points in an Agile team. You look at the population of tickets in your backlog that have story points attached to them, and you're looking at the average number of story points per ticket. So let's say, I don't know, pull a number out of the sky, let's say the average number of points on your tickets is 3. Well now, you're going to assume, intuitively, if you're like most people, that if you were to pick a random ticket from your backlog, it's very likely to be 2, 3, 4, 5, whatever. Somewhere close to 3.

But as we also discussed in the last episode, if you were to pick two arbitrary numbers, let's say 1 and 8, there is some population with a ratio of 1s and 8s
that will generate, as its average, any whole number between 1 and 8. Now, while this may not necessarily be practically useful, and you're not trying to actually go and put this into practice, what it does tell you is that an average may be obscuring a lot of other information. And perhaps our intuition is reaching for averages when really we should be looking at distributions.

Once again, going back to the previous assertion: internally, in our minds, we convert averages to distributions through our natural intuition about what an average is supposed to mean. That is, if we were to pick some random item from the population that we are averaging on, we would expect that item, in most cases, statistically speaking, to be pretty close to that average. That expectation is a description of a normal distribution, or something like a normal distribution. There are a lot of other distributions that are similar to normal, and they all kind of track with a similar intuitive representation.

Now, there's an important caveat to our intuitive understanding of the average being representative of a normal distribution. This is one of the counterintuitive things about averages: averages are sensitive to outliers. Averages are sensitive to outliers.

So, if you were to look at a normal distribution, it looks like a bell curve. What you're looking at with that bell curve is the number of numbers, like a histogram. Or another way to think about it is the probability that a given number would show up in the set that is represented by the normal distribution. In other words, if you had a hundred items and they were all normally distributed around the number 30, then it's more likely that you're going to get a 29, assuming that your standard deviation
is something like 5. You're more likely to get a 29 than you are to get a 1. And so, in your group of a hundred items, you may have one single 1, but you might have five or ten 29s.

And in any distribution, it's not impossible to get extreme outliers. So, in this case, where the peak of the bell curve is centered around something like 30, you could also have a single number that's, you know, 500 or 5,000 or 5 million. And that would technically still be representative of that distribution, because it is just a single outlier. If you have an extreme outlier like this, it has a massive impact on the average. That's simply because the average is a sum divided by the total number of parts. So, if you have an outsized outlier, something like 5,000 or 5 million, then all of the other constituent parts have much less of an effect on the overall average than that 5 million does. So, an extreme example of this: if you were to have 11 numbers, 10 of them are 1s and one of them is 5 million, then your average is somewhere around 454,000.

We're going to take a quick sponsor break, and then we're going to come back and talk about another counterintuitive thing about averages that you are probably accidentally falling prey to as it relates to your coworkers.

We have to make assumptions in our work all the time. And if you're like most engineers, you have probably at some point assumed that your data would fit a traditional table structure. But this isn't always the right structure for data. And in fact, you might be seeing these problems without realizing it. If your code is getting dragged down by a ton of joins and long query times, the problem may be that you're choosing the wrong type of database. You could potentially simplify and cut down on the complexity
by designing with graphs. A graph database lets you model data the way it might look in the real world instead of forcing it into rows and columns. Stop asking relational databases to do more than they were made for. Graphs tend to work well for use cases with lots of data connections, like supply chains, fraud detection, real-time analytics, and generative AI. Pretty much anywhere you might have a many-to-many relationship, a graph could probably help you. And with Neo4j, you can code in your favorite programming language and against any driver. Plus, it's easy to integrate into your existing tech stack. People are solving some of the world's biggest problems with graphs today. And now, it's your turn. Visit neo4j.com/developer to get started. That's n-e-o-4-j dot com slash developer. Thanks again to Neo4j for sponsoring today's episode of Developer Tea.

I want you to think of the most outstanding developer you've ever worked with. Or maybe the most outstanding manager. Or really, you can think of anybody that you would say is prominent in their field. They are considered the top of the top. And I want you to think about what you know them for. What do they excel at? Why are they so good? Why are they so well-known? As a general rule, your answer is going to home in on a very few things that this person is known for.

I want you to take that same person that you thought about and try to imagine something that person was not very good at. Maybe something that was not in their skill set. And I want you to imagine something that they were even worse at. The mental model that I have for this is trying to imagine Albert Einstein playing basketball. Now, I can't prove this theory, but I don't think that Albert Einstein would have been a stellar basketball player. He certainly would not
be in competition with the best of the best. And in fact, even in somewhat related fields, we see large disparities in performance. We'll stick with sports for a second. The overused example here is Michael Jordan playing baseball.

Now, what does this have to do with averages? Well, in our daily work, especially if you are a manager, we tend to put a lot of our focus into the things that people are not doing well. And this isn't just a negativity bias thing coming up again. This is usually the way that performance management tends to work, and it's the way that we think about providing feedback to other people: trying to find ways to bring their not-as-good qualities up to par with the rest of them. This is, once again, a flaw of thinking in averages.

Imagine you have someone who has a few things that they are absolutely excellent in. We're talking about somebody with a true mastery of, let's say, a couple of specific technologies. Maybe they have full domain knowledge, front and back, of what your company does. Now, the same person may not be very good at providing feedback in a synchronous environment. Or maybe they're not very good at keeping track of every single thing they do in their PRs. Maybe their commit messages are not as good as another person's on the team. So you have someone who has an outstanding ability in one area and has some room to grow in another.

And then let's imagine that you have somebody who is kind of mediocre at all of that. They provide sufficient commit messages, and they know enough about the domain to get by. Suppose you were to use an average rating system that combines the ratings of various factors, let's say in this case technical proficiency and documentation, or, you know, code hygiene if you want to call it that. If you were to rate both of these people, you may have one
that rates, let's say on a scale of 1 to 10, a 10 and a 3. And the other one is rating like a 5 and a 6. These averages are pretty close to each other. And the average, once again, does not tell the story, because it's taking categorical information and turning it into a combined, homogeneous score. What this tends to do is focus on the negative aspects of a person's performance to try to bring them up to par with the rest of their performance. An alternative here would be to focus on the areas where they have the highest potential for maximizing. A simple adjustment might include providing the additional context of max scores. So instead of just presenting a particular individual's average score, you may also explain the areas where they truly excel and stand out above the rest. So we imagine that we are comparing apples to apples by looking at averages between two given workers. But we may be missing a huge opportunity by using averages and reducing the information there.

One more counterintuitive thing about averages that I want to bring up here, and this is especially pertinent for managers, is the concept of human perception as it relates to averages in their experiences. So if you were to ask somebody, what was your year like? Or what was your week like? It's very likely that they will try to answer you by explaining certain things that happened. Now, if you were to clarify that you're asking them about the average, it's likely that their answer might change a little bit, but they will still probably reach for anecdotal evidence to explain the answer that you're looking for. Part of this is because we're not perfect recording machines. We don't have a perfect memory of everything that happened in the last week. And just to prove this, can you recall, without using whatever
logging app you might have, every meal that you ate in the last week, or even every breakfast that you ate in the last week? Most people cannot do that. And so, instead of using the average for our description of a given experience, we reach for a different heuristic. And I want to explain why this is the case. Why do we use the heuristic of averages when we're looking at a distribution of objects, and yet we don't use the same kind of measure when we're thinking about our experiences?

Really, this comes down to availability. When we think about experiences, we think about things that are memorable. In particular, research has shown that we rate our experiences based on two particular measures. First, what was the peak? And second, what was the end? In this case, the peak can be positive or negative, while the end relates to the most recent thing that happened. Both of these heuristics are kind of like flavors of an availability heuristic: the thing that you remember best, which is typically a peak experience, and then the thing that happened most recently, which allows you to remember it more easily as well.

And we do this throughout our lives. Think about the various experiences that you have stored in your memory, maybe from your childhood. It's not a perfect system. Of course, you might have a few of those memories that stick out as mundane moments, and who knows what those connections are. We are still kind of unpredictable organic machines in many ways. But for the most part, people are not going to describe their lives based on the average. In fact, if we did, the picture might look a little bit dismal, because at least a quarter of it or so is going to be dedicated to sleeping.

So how does this translate to the workplace? Well, if you're looking at ways to
help your team feel more engaged, you can think about designing these peak experiences. Perhaps the easier thing to do is to design the end experiences: for example, celebrating, and celebrating well, when you have a key delivery. But this will also help you recognize the critical nature of individual events. Think about it. If you imagine that all of your reports, if you're a manager, are going to judge their experience based on their average experience, then you're thinking about it in the wrong way. Your reports are going to have the same kind of psychological understanding of their experience that you do: through a peak-end lens. So you shouldn't be surprised when your report comes back to you talking about that one experience once again. They might seem stuck on it, but this is the way that humans work. And once again, this is why those individual events are so critically important. Being present and being aware of those key moments could make or break your career as a manager, as a tech leader. It could make or break your relationships with people. Recognizing when a given moment could be the way that someone summarizes their entire week will help you recognize the weight of your responsibility in that moment.

Thank you so much for listening to this two-part series on averages. Hopefully, you can see how all of this ties back to averages and the counterintuitive things about averages: why they are so overused and when it's more appropriate to use something else. Hopefully, you walk away with new tools and a new mindset about how to think about summarizing, how to think about availability heuristics, and of course, how to use averages when they are appropriate.

Thank you again to today's sponsor, Neo4j. Is your code getting dragged down by endless joins and long query times?
You can probably simplify it with graphs. With Neo4j, you can code in your favorite programming language against any driver. Go see what graphs can do for you at Neo4j.com/developer. That's N-E-O, the number four, J dot com slash developer.

If you enjoyed this episode, please come and join us on the Developer Tea Discord community. I'm in there. You can come and ask me a question. Who knows? That question might end up being the inspiration for an episode. And if it is, I will mention you, if you are inclined to that. If you will allow me to mention you on air, I can do that. Thank you so much for listening, and until next time, enjoy your tea.

Captions by GetTranscribed.com
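
The rating comparison from the transcript (one person scoring a 10 and a 3, another a 5 and a 6) can be sketched as follows. This is only an illustration under stated assumptions: the labels `specialist` and `generalist`, the mapping of each score to a particular category, and the `summarize` helper are hypothetical and do not come from the episode. It shows the "simple adjustment" mentioned there, reporting each person's top score alongside the average.

```python
from statistics import mean

# Hypothetical 1-to-10 ratings for the two people described in the episode.
# Which score belongs to which category is an assumption for illustration.
ratings = {
    "specialist": {"technical proficiency": 10, "code hygiene": 3},
    "generalist": {"technical proficiency": 5, "code hygiene": 6},
}


def summarize(scores: dict[str, int]) -> dict[str, object]:
    """Report the average alongside the strongest area and its score."""
    best_area = max(scores, key=scores.get)
    return {
        "average": mean(list(scores.values())),
        "best_area": best_area,
        "best_score": scores[best_area],
    }


for person, scores in ratings.items():
    print(person, summarize(scores))
```

The averages here are 6.5 and 5.5, close enough to look comparable, while the added fields show that one person has a 10 in a single area, which the average alone hides.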