CORRELATION v CAUSATION: The Football Analyst’s Trap
Putting the numbers up on the whiteboard at half time is a staple of football at any level. At grassroots level a dedicated statistician is a rare sighting, so the whiteboard usually carries the simplest, easiest-to-count statistics that indicate how hard the team is working – smothers, shepherds, chases, etc. With a little more statistical firepower at their disposal, AFL clubs can use more intricate and less directly observable metrics. Think post-clearance contested possession differential, D50-to-I50% (transition rate) and effective clearance %.
While there is significant contrast in how the stats whiteboard is used at grassroots and at AFL level, it serves a common purpose: to allow for a team’s performance to be measured according to a pre-defined set of metrics or characteristics. In this sense, the general idea of a Key Performance Indicator (KPI) is to provide a benchmark by which teams can give themselves a pass or fail mark that is separate to what you may see on the scoreboard.
I like to think of the KPI whiteboard as the most directly observable interaction between sports coaching and sports analytics. It allows statistical data, no matter how simple or complex, to provide direct evidence of in-game trends that lead to better and more-informed decision-making by a team’s coaching staff. In an analytical role within a football club one’s purpose is to aid decision-making through the use of data, and the KPI whiteboard is perhaps the most direct and effective way to communicate findings in-game.
Finding a set of well-defined and appropriate KPIs can be a crucially important task in preparation for a match of football, but they must be exactly that – well-defined and appropriate. A team which relies on fast and effective ball movement needs to use KPIs which reflect that (e.g. transition %) while a team which relies on winning the football at the coalface should use an entirely different metric (e.g. clearance differential).
A good footballing KPI in my eyes should meet the following criteria:
- Should be relevant to the team’s game style and speak directly to the parts of the game plan that are most important. For example, sides that thrive in the stoppages should have clearance/contested ball related metrics, while sides that win games through slick ball movement should include KPIs like D50-I50% (transition %).
- KPIs should be relatively simple to understand. They don’t necessarily need to be the most basic of statistics, but the audience for which they are intended (coaches, players, analysts) needs to understand what they are.
- They need to be capable of detecting changes in level of performance over time. That is, if they remain consistent through form slumps and hot patches, they cannot be used to effectively measure performance.
- Finally, across a set of multiple KPIs, each should measure something distinct from the others. Using multiple metrics that measure the same thing should generally be avoided.
Given the ridiculous abundance of football data that has become widely available in the past few years, for every well-defined KPI a team may use there are a dozen wildly inappropriate stats that appear to be solid indicators of performance, but in reality should never be used to define it.
This leads into one of the fundamental and most-discussed arguments in data analytics: correlation versus causation.
Correlation refers simply to the relationship between two variables. In this context, the variables we are referring to are two distinguishable football statistics or metrics which we can compare to one another to look for any form of relationship.
Causation or causality means that one given event can be said to definitely cause another given event to occur. Generally the existence of causality is much more difficult to prove than it is for correlation.
The most important thing to know and remember in all of this is that correlation does not imply causation. In essence, just because a certain degree of correlation may exist between two sets of data, it does not necessarily mean there is a definitive causal effect from one onto the other.
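To make the term concrete, here is a minimal sketch of how a correlation figure can be computed. It assumes the figures quoted in this post are Pearson correlation coefficients, which the post itself does not state. The function name `pearson_r` and the sample numbers are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the spread of each variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented example: a team statistic and the match margin over six games.
stat = [310, 355, 290, 400, 330, 370]
margin = [-12, 20, -30, 41, 3, 15]

r = pearson_r(stat, margin)
r_swapped = pearson_r(margin, stat)
print(round(r, 3), round(r_swapped, 3))
```

The calculation is symmetric: swapping the two inputs gives the same number. That is one way to see that a correlation on its own cannot say which variable, if either, is driving the other.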
Here’s a visual representation which I think does it pretty good justice, courtesy of my Twitter feed.
To demonstrate what I am talking about, I decided to dig through data from the 2020 season to see if I could find any instances of a team having a highly correlated relationship between a given statistic and the margin of their matches that would suggest there is a causal effect. The catch here is that I am exclusively looking for stats that would actually have little to no causal effect on match margins – as a means of proving that correlation does not imply causation. In this sense I am looking for bad KPIs which would have the potential to fool a less-than-wise football analyst.
For contrast, I’ll also discuss and dig up a few examples of good KPIs in the footballing context and highlight the difference in why they are deemed appropriate where others are not.
Disclaimer: I am aware that simply correlating a statistic with the match margin is a crude way to infer any level of causality, especially over the span of 2020’s shortened 17-match season, but since the general purpose of this post is to demonstrate what isn’t a good KPI, I thought I could probably get away with it.
Correlation? Yes. Causality? No. (Bad KPIs)
AFL Fantasy Score v Match Margin
To kick things off I’ll share a bit of a strange relationship I found which shone through in Collingwood’s 2020 data – an unbelievably strong correlation between Collingwood’s total combined AFL Fantasy Score and the final margin of their matches.
Aside from one slightly deviant match against the Hawks (the lone black point away from the dashed line), all 16 of Collingwood’s other games fit to a near-perfect linear trend line as depicted above.
For a side like the Pies, who play a high-possession brand of football that relies on maintaining control of the ball, one would expect some correlation between their margins and a metric like AFL Fantasy, which is weighted heavily towards kicks, marks and handballs. However, a correlation of 0.900 (compared to the much more sensible-sounding league-wide figure of 0.678) is far beyond what I would have expected.
So if there’s such a strong correlation between AFL Fantasy points and the margin of Collingwood’s matches, why is this a bad KPI? In short, an arbitrary figure like AFL Fantasy points (a linear combination of common countable football statistics) doesn’t provide any detailed and observable insight into how the game is being played. It may give a high-level view of which team has had more possession of the ball, but beyond that it can’t tell the coaching staff anything of substance. After all, would anyone know straight off the top of their head if 1400 AFL Fantasy points is a good return for a match? Nope.
A good KPI should not only have some statistically measurable impact on a desirable outcome (e.g. in this case correlation with the margin of the match), but should also provide insight into the way the game (or part of the game) is being played.
Still – it is pretty interesting to see that when the Pies scored more than 1300 AFL Fantasy points in a match during 2020 their win-loss record stood at 8-2, while in the matches where they didn’t hit the 1300 mark their record was a much dimmer 1-1-5. Should I get Bucks on the phone?
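A threshold split like this one is easy to reproduce. The sketch below is a minimal version, assuming “more than 1300” means strictly above the threshold and that records are written as wins-draws-losses. The function names and all sample values are invented and are not Collingwood’s actual 2020 results.

```python
def record(margins):
    """Return (wins, draws, losses) for a list of match margins."""
    wins = sum(1 for m in margins if m > 0)
    draws = sum(1 for m in margins if m == 0)
    losses = sum(1 for m in margins if m < 0)
    return wins, draws, losses


def split_record(stat_values, margins, threshold):
    """Records for matches strictly above the threshold and for the rest."""
    above = [m for s, m in zip(stat_values, margins) if s > threshold]
    rest = [m for s, m in zip(stat_values, margins) if s <= threshold]
    return record(above), record(rest)


# Invented values for illustration only.
fantasy_points = [1420, 1250, 1380, 1290, 1310, 1190, 1300]
margins = [24, -10, 12, 0, -3, -31, 5]

above, rest = split_record(fantasy_points, margins, threshold=1300)
print("above threshold (W, D, L):", above)
print("at or below threshold (W, D, L):", rest)
```

The split describes what happened in the sample. It does not show that chasing the threshold would change results.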
Long Down the Line Kicks vs Margin
Our next case study of not-so-useful KPIs is St Kilda’s correlation between the number of times they kicked long down the line during a match and said match’s margin.
An AFL-wide correlation of 0.128 would suggest that there is nearly no relationship between this statistic and the match margin. Isolating the data to St Kilda’s matches, however, results in a correlation of 0.731.
A win-loss record of 9-2 when sending it down the line at least 10 times during the match (1-5 otherwise) would suggest the Saints should employ this tactic at every possible chance – but this is the point where one needs to dig into why this wacky correlation might exist.
As mentioned earlier, a 17-game H&A season leaves us with only 17 data points when assessing full matches. To get an idea of the true correlation between two sets of data, 17 observations just isn’t enough. Breaking the data down into halves or quarters would provide us with more data points, but would also increase the amount of random variance in the data. We could also extend the timeframe so that it spans multiple seasons, but that may bring in unwanted extraneous variables as a result of shifts in team dynamics, game plans, personnel etc.
While there is some chance that kicking long down the line is a part of St Kilda’s game plan, the existence of this degree of correlation is more likely due to random chance and having insufficient data to dissect.
The important take-away from this example is that sample sizes matter – a lesson too often lost on the common football punter and amateur analyst alike.
Free Kicks Against v Margin
To finish up with the bad KPIs, I’ve thrown in a downright absurd one just for good measure. In this case, we’re looking at the number of free kicks Gold Coast gives away in a match, which surprisingly has a positive linear relationship with their match margin. Wait, what? This of course is completely counter-intuitive, as it suggests giving away more free kicks helps the Suns win games.
As with St Kilda’s long down the line example, the moderate-strength positive relationship (0.646) between the number of free kicks Gold Coast gave away in a match throughout 2020 and their match margins can likely be put down to too small a sample size, aided by a healthy dose of random variability.
Using the number of free kicks conceded as a KPI though would obviously be outrageous. Even a total peanut could tell you that giving away more free kicks won’t win you games of football and I’m sure it’s not a part of Stuey Dew’s master game plan either.
To summarise for both this example as well as the previous two – while a correlation may exist between two sets of data, it does not necessarily mean there is a cause-and-effect type relationship.
Correlation? Yes. Causality? Possibly. (Better KPIs)
Moving on now, as a means of contrasting against the previous examples of bad KPIs I thought I’d shine a light on a few examples of statistics which could serve as decent KPIs for certain teams. I say possible causation in the heading because as discussed previously, just because a correlation exists one cannot directly infer causality – however none of these following statistics fall into the same logical traps that the previous examples did (i.e. too arbitrary, irrelevant, counter-intuitive).
Small sample sizes still pose a legitimate threat to the validity of the findings here. However, some degree of confidence may be taken from the right KPI, even with only a small sample (a challenge that football clubs must constantly deal with, given there are at most 22 games in an H&A season).
Post-Clearance Disposals vs Margin
Here is a metric which I personally think makes a bit more sense to use as a KPI.
Port Adelaide ranked 4th for disposals and 1st for clearances throughout Season 2020 – their game plan is tied to winning the footy at the coalface and maintaining possession on the outside in order to control the flow of the game. Using a KPI that reflects this would be a sensible move, and post-clearance disposals is a pretty direct measure of how well a team is holding possession once the ball is clear of a stoppage.
You can also see the Power’s margins possess a strong relationship with their post-clearance disposal numbers (0.816 v 0.426 league-wide). It is a nice little result to see here as well that Port’s three worst returns during 2020 in terms of post-clearance disposals were their only three H&A losses to Geelong, Brisbane and St Kilda.
To me, what makes this a solid KPI is that it directly measures a part of Port Adelaide’s game strategy that is clearly of importance to them, and it does so in a way that is not arbitrary, difficult to interpret or irrelevant. As mentioned, Port are also a side that loves winning stoppages, so throwing in a KPI like clearance differential would complement post-clearance disposals very nicely.
Uncontested Marks from Team Differential vs Margin
Uncontested marks retaining possession is an excellent indicator of how well a team is spreading from the contest, finding space, working hard to create options and moving the ball with freedom. North Melbourne clearly struggled throughout 2020, and this statistic may shine some light onto one of the reasons why.
North’s average margin when winning the uncontested mark count (happened on 8 occasions) was a respectable +6.3, however when they lost the count (9 occasions) that figure fell to a dismal -44.1 points. While this statistic alone may not be enough to diagnose exactly what went wrong for the Roos in 2020, it certainly paints some kind of a picture as to where things need to improve.
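The comparison above is a conditional average, which can be sketched as follows. The function name and sample values are invented and are not North Melbourne’s actual 2020 figures. Treating a level count (differential of zero) as “not won” is an assumption, since the post does not say how ties were handled.

```python
from statistics import mean


def margin_by_count_result(differentials, margins):
    """Average margin when the count was won (> 0) versus not won (<= 0)."""
    won = [m for d, m in zip(differentials, margins) if d > 0]
    not_won = [m for d, m in zip(differentials, margins) if d <= 0]
    return mean(won), mean(not_won)


# Invented values for illustration only.
uncontested_mark_diff = [12, -8, 5, -20, 3, -15]
margins = [10, -40, -2, -55, 14, -31]

avg_won, avg_not_won = margin_by_count_result(uncontested_mark_diff, margins)
print(f"Average margin when winning the count: {avg_won:+.1f}")
print(f"Average margin otherwise: {avg_not_won:+.1f}")
```

With eight or nine matches on each side of the split, a single blowout can move either average a long way, so the gap is better read as a pointer than as a measurement.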
This is why I like uncontested marks retaining possession as a KPI for the 2020 Roos. It identifies an exact part of their game plan upon which scoreboard success appears (key word) to be somewhat dependent. A strong KPI such as this allows North Melbourne’s analytical and coaching staff to make more informed and assured decisions based on a metric which is clear in what it is trying to identify. As with Port Adelaide’s example around post-clearance disposal numbers, this KPI specifically targets one single aspect of North’s entire game plan, giving applicable insight into one facet of the very complex game that is football, which is a big plus for mine.
Post-Clearance Contested Possession Differential vs Margin
For this last one I’m covering two teams simultaneously because I feel the metric in question does a solid job as a KPI for both the Dockers and the Hawks.
Post-clearance contested possession differential (a mouthful, I know, so let’s call it PCCPD for short) is one of the more useful statistics for teams at AFL level in my opinion, as it does a very solid job of measuring something that is difficult to measure – how hard a team is working when they don’t have possession of the football (and more specifically, when the ball is in dispute).
Both Fremantle and Hawthorn showed decent degrees of correlation with PCCPD throughout 2020 (0.765 and 0.706 respectively), although I will again point out that only having 17 points of data for each team makes the waters a little bit murky in this space.
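One way to put a number on how murky those waters are is an approximate confidence interval for each correlation. The sketch below uses the Fisher z-transformation, a standard textbook approximation and not something used in the analysis behind this post. It assumes the quoted figures are Pearson correlations, that the data are roughly bivariate normal, and that matches are independent. The function name `fisher_ci` is invented.

```python
from math import atanh, sqrt, tanh


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation.

    Transforms r to the z scale, where the standard error is roughly
    1 / sqrt(n - 3), then transforms the interval bounds back.
    """
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


for team, r in [("Fremantle", 0.765), ("Hawthorn", 0.706)]:
    lo, hi = fisher_ci(r, n=17)
    print(f"{team}: r = {r}, approx. 95% CI = ({lo:.2f}, {hi:.2f})")
```

With only 17 matches the intervals come out wide, which fits the caution above. Such an interval also does nothing to correct for having picked these teams after searching through the data.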
Neither team was a prolific clearance side in 2020 (Fremantle 14th, Hawks 18th), which means they were generally more reliant on winning the ball back from the opposition post-clearance – a game style built on contested football after a clearance has already occurred. Through this lens you can see why PCCPD had a higher correlation with match margins for these two teams than across the competition, and therefore why it might stand to be a valuable KPI.
PCCPD fits well for the Dockers and the Hawks in this regard, and the proof is in the pudding when you consider that they respectively recorded 4-1 and 4-2 win-loss records when winning the post-clearance contested possession count – very solid for two teams that both finished outside the top 8.
A reminder – while I am considering these previous three examples as being more appropriate and relevant KPIs to be used by their respective teams, it still doesn’t necessarily mean there is causality between each metric and winning football games. We can see that each is correlated with the match margin, but at best that lets us suspect, not confirm, a causal link in each case.
Summary
Finding the right KPI to assess a team’s performance clearly isn’t always as straightforward a task as one might think. The smarter minds who are in the know within football clubs will generally have a fair idea of what to look for in advance based on a deeper understanding of their team’s game plan, but doing a little bit of statistical analysis to help identify decent metrics to use as KPIs is definitely a worthwhile exercise, particularly for someone like myself who thinks numerically more than anything else.
As a general rule, it is probably smarter to search for causality rather than correlation when identifying suitable KPIs, although this job is easier said than done. As seen in the first three examples, correlation may exist between variables which do not reflect a causal relationship, and may even result in counter-intuitive findings, as with Gold Coast’s backwards relationship between free kicks conceded and their match margins. Having said this, finding evidence of a relationship can still be a useful step to take, as it can help refine KPI selection so that the most appropriate metric is used to assess team performance.
Overall, it is important to remember that there are traps which only serve to fool amateur analysts into believing in non-existent relationships. Don’t choose KPIs that are:
- Too arbitrary, vague or difficult to interpret (e.g. Collingwood’s AFL Fantasy Score)
- Based on too small a sample size (e.g. 17 observations from 2020’s shortened season – although I somewhat turned a blind eye for this post)
- Unable to accurately reflect or measure success within a specific part of a team’s game style.
And most importantly, one final time: if you only take one thing away from this post, let it be that even in the most compelling of cases, correlation does not necessarily imply causation.