Correlation, Causation, Post Hoc, Ergo Propter Hoc
Published 5/13/2020
When you think of correlation and causation, you might think of graphs and research: as one data point changes, another data point also changes.
In today's episode, we're talking about controlling variables and how to avoid assuming that correlation is causation.
🧡 Leave a Review
If you're enjoying the show and want to support the content, head over to iTunes and leave a review! It helps other developers discover the show and keeps us focused on what matters to you.
🍵 Subscribe to the Tea Break Challenge
This is a daily challenge designed to help you become more self-aware and be a better developer so you can have a positive impact on the people around you. Check it out and give it a try at https://www.teabreakchallenge.com/.
Transcript (Generated by OpenAI Whisper)
Correlation is not causation. We've all heard this, but most of us actually twist it. We twist it to mean that correlation data does not mean causation, and this twisting can confuse us. In today's short episode of Developer Tea, we're going to talk about how we get twisted around when we think about this particular mantra about correlation and causation. Specifically, we're going to talk about a fallacy, or I guess it's kind of a bias, that all of us have. It's very hard to unwire, so we'll also talk about what we can do about it. My name is Jonathan Cutrell, you're listening to Developer Tea, and my goal on this show is to help driven developers like you find clarity, perspective, and purpose in their careers. I want to say a huge thank you to those of you who have subscribed to this podcast and those who have left reviews. This is hugely helpful to keep Developer Tea going, but it's also incredibly personally meaningful to me. So thank you so much.
Okay, so correlation and causation. When you think of this, you probably think of a graph, or a p-value, something in research, like a research paper or a white paper. This is a very common place to find this discussion, because when we talk about correlation, we're essentially talking about, for example, two curves on a graph that look the same: two coincident sets of data. In other words, as one changes, the other changes in a relatively stable or predictable way. This is roughly the technical definition of correlation. When we talk about causation, we're talking about a connection that says I can meaningfully manipulate one of the variables and cause the other one to change. A causal connection from variable A to variable B means that, by controlling variable A, I can in effect control variable B. Let's say variable B is difficult to control directly.
You don't really have direct control over it, but you do have direct control over variable A, and you find a correlation between the two. Well, it seems possible that if you control variable A, then variable B is, in effect, something you now have control over. But this connection may not be causal. This is beaten into the heads of researchers, of students in grad school, and certainly of undergrads as well. Most of us understand the idea that correlation is not equal to causation. But here's what we often miss. Correlation is not just about two points on a graph. It's not just about measurable data or variables. It also shows up in things like time.
For example, you may have heard of the fallacy post hoc, ergo propter hoc, or the post hoc fallacy for short. This phrase, which I wouldn't expect you to know unless you've heard it before, is Latin. It means "after this, therefore because of this": post hoc, "after this"; ergo, "therefore"; propter hoc, "because of this." Now, did you catch the words in there that map to that same mental model of correlation and causation? Post hoc, "after this," is our correlation: "after" means there's some connection. And propter hoc, "because of this," is causation.
We fall victim to this trap all the time as developers. For example, when you're debugging an issue on, let's say, your production server, what is the first thing you do? Well, you try to find out when it happened. You look at the logs to see what happened before. You look at commits from around the same time. Now, this is a reasonable approach, don't get me wrong. That kind of debugging strategy is likely to lead you to the right thing. But we have to be incredibly careful when we apply this kind of logic as a blanket understanding of events as they occur. There are two main problems here. The first one is that those things may be disconnected entirely.
If you find an event that occurred right before your error, or right before the bug showed up, you might latch onto it and think for sure that the two are causally connected: that this event is the driving cause of the issue you're trying to track down. And they may not be connected at all. The other problem is that, even if those things are connected, we can end up hyperfocusing on the thing that happened before as the ultimate cause of the thing that happened after. We forget that it's very likely there is a deeper cause, and that this event is only the final link in a long chain. You might have found the final link that broke, but did you find the thing that was putting pressure on the chain at the top level? What is the actual causal connection that you should address?
Ultimately, what I hope you take away from this episode is to recognize that pattern of thinking: X happened, therefore Y must be true. I want you to question that leap, the leap from X to Y. Is there a logical and complete reason to believe it? Or is it based on your gut? I'm not telling you to throw the gut-based answer away, but I am telling you to label it for what it is. Then, when it comes time to throw it away, you can let it go a little more easily.
Thanks so much for listening to today's episode of Developer Tea. If you found this episode helpful to the way that you think or the way that you see your work, or if you think it's a useful way of thinking about this fallacy, I'd encourage you to share it with someone else who would be interested. It's no secret that a personal recommendation is one of the most powerful ways to share content with other people, and that will help this show continue to reach more developers like you. Thank you so much for listening to today's episode. Today's episode was produced by Sarah Jackson for SPEC.FM.
Head over to SPEC.FM and find other podcasts like this one that can help you in your developer career. Thanks so much for listening. This is Jonathan Cutrell, and until next time, enjoy your tea.