The Book of Why - My Thoughts on Causality in the 21st Century
30th April, 2025
I’ve probably said the word ‘causality’ more in these last couple months than I ever have before in my life. Chances are, with everything going on in the world, I’m not the only one.
As policymakers debate the true levers of growth and inflation in a globally uncertain economic environment and I mull over whether statistical and machine learning models are capable of discerning causal links between development indicators, it seems like a good time to visit the concept.
Prof. Srinivasan Keshav of the Energy and Environment Group at the Computer Laboratory recommended I check out Judea Pearl’s The Book of Why when I had first begun to think about how geospatial machine learning might help uncover causal influences on the ground.
What I found within was a fantastic retelling of how academics and statisticians have thought about causality (or done their best to avoid the concept entirely), and how we can leverage the tools of the causal revolution to ask better questions and seek clearer answers.
To summarise Pearl’s key theses, which he drives home from the very beginning of the book:
- The human brain is the most sophisticated causal processing machine on the planet.
- We can place causal thinking on three hierarchical levels, collectively termed the ‘Ladder of Causation’. These correspond (from bottom to top) with the concepts of ‘association’, ‘intervention’, and ‘counterfactuals’.
- Data alone cannot answer causal enquiries. We require machines specifically constructed to understand causal relationships, and by building them we can arrive at artificial general intelligence (AGI).
Pearl attacks the notion of causality head on, something statistics has traditionally cowered away from. I’ll avoid getting into the weeds of the maths presented in the book, but Pearl notably takes the step of distinguishing the do operator, which explicitly encodes causation by forcing an event to occur, from the oft-seen conditional probability notation we’re all familiar with – doing instead of merely seeing.
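The seeing/doing distinction becomes concrete in a toy simulation (my own construction, not an example from the book): a hypothetical confounder Z drives both a treatment X and an outcome Y, so conditioning on observing X = 1 gives a different answer than intervening to force X = 1, which severs the Z → X link.

```python
import random

random.seed(0)

# Toy structural causal model (hypothetical, for illustration):
# a confounder Z influences both the treatment X and the outcome Y.
def sample(do_x=None):
    z = random.random() < 0.5                       # confounder
    if do_x is None:
        x = random.random() < (0.8 if z else 0.2)   # seeing: X depends on Z
    else:
        x = do_x                                    # doing: force X, cut the Z -> X link
    y = random.random() < (0.6 if z else 0.2) + (0.2 if x else 0.0)
    return z, x, y

n = 100_000

# P(Y=1 | X=1): condition on having observed X = 1 (rung one, association)
seen = [y for _, x, y in (sample() for _ in range(n)) if x]
p_see = sum(seen) / len(seen)

# P(Y=1 | do(X=1)): intervene to set X = 1 (rung two, intervention)
done = [y for _, _, y in (sample(do_x=True) for _ in range(n))]
p_do = sum(done) / len(done)

print(f"P(Y=1 | X=1)     ~ {p_see:.2f}")   # inflated by the confounder Z
print(f"P(Y=1 | do(X=1)) ~ {p_do:.2f}")    # the causal effect alone
```

Under these made-up probabilities the observed conditional comes out around 0.72 while the interventional quantity is around 0.60: the gap is exactly the confounding that association alone cannot see past.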
In practice, however, to say that this is challenging would be an understatement. Counterfactuals inherently cannot be directly observed. Construction of effective controls representative of counterfactuals often requires knowledge of causative factors which isn’t available (if you need perfect understanding of existing causal links to make new ones, where do you begin!?) or is restricted by data availability.
So, in reality, if you wish to truly predict inflation from fundamentals, you would first need to create the universe from scratch (just as you would were you to bake an apple pie from scratch…), track the deterministic behaviour of every elementary particle, and find a way to correct for quantum mechanical fluctuations.
As with everything, we settle for an adequate level of abstraction. The level of abstraction will limit the bounds around our answer, but also require us to process only a relatively finite quantity of information in reaching that answer.
I defer to macroeconomic examples because of both their current relevance and their immense (but often unseen) consequences on our individual lives.
Pearl repeatedly arrives at the conclusion that assessing associations within data alone – that is, staying on the first rung of association – is insufficient for causal analysis. And yet, many would argue that large language models (LLMs) are capable of some degree of causal comprehension. Have they then climbed up these rungs without us noticing? Pearl himself has acknowledged in recent interviews that he didn’t anticipate that training data might subtly contain causal relationships without those relationships being explicitly coded in, as happens with the text LLMs are trained on.
If you’re wondering whether LLMs may be the first step towards true causal inference machines: both Pearl and I would push back on this being anywhere near a certainty. Traditional statistical models are not only up to the task of being the forerunners of causal inference but remain much more explainable than their neural network counterparts.
I can’t say I agree with everything in Pearl’s book. What I am quite sure of, however, is that a combination of these causality-informed approaches, traditional statistics, and cutting-edge deep learning approaches holds the keys to making it all the way up our ladder of causation.
The further we get into this, the greater the temptation becomes to just say ‘screw causality, I’m happy with correlation’. Where I agree most strongly with Pearl is that science and statistics should not shy away from causality because it is tough to explain, but should tackle it head on for that very same reason, especially with the technology that we are fortunate enough to have in today’s world.
In order to know what levers to pull or push at the policy level to optimise economic well-being while enhancing sustainability and health outcomes, we need the most sophisticated causal inference machine ever created, and we need policymakers to listen to it.
For a more applied look into working with causality in research contexts, I highly enjoyed reading Causal Inference: The Mixtape by Scott Cunningham, which builds up to the intuition behind difference-in-difference and synthetic control approaches, and discusses how these are actually applied in a variety of contexts.