April 15, 2018

Learning about kicking in the AFL

Today I’m going to have a look at kicking statistics in one of the greatest games on Earth, which is of course Australian Rules Football.

library(tidyverse) # For everything
library(ggthemes) # For some prettier themes

Data

Big thanks to DFSAustralia for the script that helped me pull this from AFL Tables.

d <- read_csv("data/afl_stats_2018.csv")
glimpse(d)
## Observations: 1,188
## Variables: 22
## $ YR          <int> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 20...
## $ RD          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ Player.Name <chr> "Astbury, David", "Bolton, Shai", "Butler, Dan", "...
## $ TM          <chr> "RICH", "RICH", "RICH", "RICH", "RICH", "RICH", "R...
## $ OPP         <chr> "CARL", "CARL", "CARL", "CARL", "CARL", "CARL", "C...
## $ K           <int> 9, 3, 10, 13, 7, 9, 14, 5, 11, 6, 7, 7, 4, 7, 20, ...
## $ M           <int> 7, 2, 2, 5, 4, 0, 2, 3, 3, 2, 0, 4, 1, 1, 5, 5, 2,...
## $ H           <int> 7, 3, 3, 6, 8, 7, 10, 11, 7, 6, 6, 10, 2, 7, 12, 0...
## $ G           <int> 0, 0, 3, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,...
## $ B           <int> 0, 0, 0, 2, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 3, 1, 0,...
## $ HO          <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33...
## $ T           <int> 2, 0, 7, 2, 4, 7, 4, 4, 3, 4, 13, 2, 2, 2, 1, 1, 5...
## $ CLG         <int> 2, 3, 3, 2, 0, 7, 2, 1, 5, 2, 0, 4, 1, 1, 3, 5, 5,...
## $ FF          <int> 1, 0, 3, 5, 1, 0, 2, 0, 0, 0, 1, 0, 0, 1, 1, 0, 2,...
## $ FA          <int> 1, 3, 1, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 5,...
## $ TOG         <int> 100, 77, 76, 80, 76, 77, 81, 75, 79, 72, 69, 83, 8...
## $ Ground      <chr> "Melbourne Cricket Ground", "Melbourne Cricket Gro...
## $ MB          <int> 74, 15, 90, 108, 85, 48, 86, 62, 57, 48, 89, 54, 2...
## $ DS          <int> 68, 13, 88, 99, 79, 63, 86, 62, 69, 52, 87, 59, 27...
## $ Score       <int> 121, 121, 121, 121, 121, 121, 121, 121, 121, 121, ...
## $ Margin      <int> 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26...
## $ Total       <int> 216, 216, 216, 216, 216, 216, 216, 216, 216, 216, ...

For now, I am interested in kicks, which I am guessing are recorded in the “K” column.
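That guess is easy to sanity-check. Here is a minimal sketch on a few made-up rows shaped like `d`. It assumes `G` is goals, which the data does not confirm. Since a goal has to be kicked, no player should have more goals than kicks, and the team totals per round should look like plausible kick counts.

```r
library(dplyr)

# Toy rows in the same shape as d (values are made up for illustration).
toy <- tibble(
  TM = c("RICH", "RICH", "CARL", "CARL"),
  RD = c(1L, 1L, 1L, 1L),
  K  = c(9L, 13L, 10L, 7L),
  G  = c(0L, 3L, 1L, 0L)   # assumed to be goals
)

# Check 1: rows where a player has more goals than kicks (should be none).
rows_violating <- toy %>%
  filter(G > K)

# Check 2: total K per team per round, to eyeball against expectations.
team_kicks <- toy %>%
  group_by(TM, RD) %>%
  summarise(total_k = sum(K, na.rm = TRUE)) %>%
  ungroup()

print(rows_violating)
print(team_kicks)
```

Swapping `toy` for `d` runs the same checks on the real data.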

Exploratory Visuals

I’m going to run some plots to get a feel for the data. Because I feel like stirring the pot, in some of them I will compare Port Adelaide (Power) and the Adelaide Crows with the rest of the competition by giving each its own colour.

I’m selecting my custom colours from this site.

d$cust_col <- ifelse(d$TM == "PORT", "PORT",
                     ifelse(d$TM == "ADEL", "ADEL", "OTHERS"))
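The nested `ifelse()` works fine, but the same grouping can be written with `dplyr::case_when()`, which reads a little easier if more teams get added later. This is only an alternative sketch on a toy column. The named palette `team_cols` is my own addition: it pairs the colours used in the plots below with the groups in alphabetical order, which is how ggplot2 orders a character column by default.

```r
library(dplyr)

# Toy team codes (made up for illustration).
toy <- tibble(TM = c("PORT", "ADEL", "RICH", "GCFC"))

# Same logic as the nested ifelse(): two named teams, everyone else grouped.
toy <- toy %>%
  mutate(cust_col = case_when(
    TM == "PORT" ~ "PORT",
    TM == "ADEL" ~ "ADEL",
    TRUE         ~ "OTHERS"
  ))

# Hypothetical named palette, so each colour is tied to a group by name
# instead of by position.
team_cols <- c(ADEL = "#4682b4", OTHERS = "#8b8589", PORT = "#008080")

print(toy)
```

A named vector like this can be passed to `scale_color_manual(values = team_cols)`.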

Kicks per player

ggplot(data = d, aes(x = K)) +
  geom_histogram(binwidth = 1) +
  theme_tufte() +
  labs(title = "Kicks per player", x = "Kicks", y = "Frequency",
       subtitle = "10 a game is probably fair, 20+ is champion",
       caption = expression(paste(italic("Source: 2018 Round 1-3, afltables.com"))))

ggplot(data = d, aes(x = K)) +
  geom_histogram(binwidth = 1) +
  facet_grid(. ~ RD) +
  theme_tufte() +
  labs(title = "Kicks per player, by round", x = "Kicks", y = "Frequency",
       subtitle = "Maybe more kicks overall in round 2 and 3?",
       caption = expression(paste(italic("Source: 2018 Round 1-3, afltables.com"))))

ggplot(data = d, aes(x = factor(YR), y = K)) +
  geom_boxplot() +
  theme_tufte() +
  labs(title = "Kicks per player - distribution", x = "Year", y = "Kicks",
       subtitle = "Median is 10 (50% above and below)",
       caption = expression(paste(italic("Source: 2018 Round 1-3, afltables.com"))))
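The numbers quoted in the subtitles can be checked directly rather than read off the plot. A small sketch on a made-up vector of kick counts, where the 20+ cut-off is the "champion" threshold from the first histogram:

```r
# Made-up kick counts for illustration; use d$K for the real thing.
toy_k <- c(9, 3, 10, 13, 7, 9, 14, 5, 11, 20, 22)

# Median kicks per player-game.
median_k <- median(toy_k, na.rm = TRUE)

# Share of player-games with 20 or more kicks.
share_20_plus <- mean(toy_k >= 20, na.rm = TRUE)

print(median_k)
print(share_20_plus)
```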

ggplot(data = d, aes(x = factor(YR), y = K, col = cust_col)) +
  geom_boxplot() +
  scale_color_manual(values = c("#4682b4","#8b8589", "#008080")) +
  theme_tufte() +
  labs(title = "Kicks per player - distribution", x = "Year", y = "Kicks",
       subtitle = "Port and Crows have a higher median than the rest",
       caption = expression(paste(italic("Source: 2018 Round 1-3, afltables.com"))))

ggplot(data = d, aes(x = factor(RD), y = K)) +
  geom_boxplot() +
  facet_grid(. ~ RD) +
  theme_tufte() +
  labs(title = "Kicks per player - distribution", x = "Round", y = "Kicks",
       subtitle = "Round 3 has a higher share of high-kick games; round 2 has the most 20+ games",
       caption = expression(paste(italic("Source: 2018 Round 1-3, afltables.com"))))

ggplot(data = d, aes(x = factor(RD), y = K, col = cust_col)) +
  geom_boxplot() +
  facet_grid(. ~ cust_col) +
  scale_color_manual(values = c("#4682b4","#8b8589", "#008080")) +
  theme_tufte() +
  labs(title = "Kicks per player - distribution", x = "Round", y = "Kicks",
       subtitle = "Crows might be trending up, Port trending down (median)",
       caption = expression(paste(italic("Source: 2018 Round 1-3, afltables.com"))))
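Trends in the median are hard to judge by eye from three boxes, so a small table helps. This is a sketch on made-up rows with the same `cust_col`, `RD` and `K` columns; the numbers are not from the real data.

```r
library(dplyr)

# Toy player-games (values are made up for illustration).
toy <- tibble(
  cust_col = c("ADEL", "ADEL", "ADEL", "ADEL", "PORT", "PORT", "PORT", "PORT"),
  RD       = c(1, 1, 2, 2, 1, 1, 2, 2),
  K        = c(8, 10, 11, 13, 12, 14, 9, 11)
)

# Median kicks and number of player-games per group per round.
median_by_round <- toy %>%
  group_by(cust_col, RD) %>%
  summarise(median_K = median(K, na.rm = TRUE),
            n_players = n()) %>%
  ungroup()

print(median_by_round)
```

With only three rounds any "trend" is a handful of games, so the `n_players` column is there as a reminder of how little data sits behind each median.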

ggplot(data = d, aes(x = TM, y = K, col = cust_col)) +
  geom_point(alpha = 0.2) +
  geom_boxplot() +
  scale_color_manual(values = c("#4682b4","#8b8589", "#008080")) +
  theme_tufte() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Which teams are kicking the most?", x = "Team", y = "Kicks",
       subtitle = "Gold Coast Suns (GCFC) are also a high-kicking team",
       caption = expression(paste(italic("Source: 2018 Round 1-3, afltables.com"))))

Kicks and scoring

I realise many kicks are not shots at goal, but I wanted to see the relationship between kicking and scoring in general.

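# One row per team per round. Score is constant within a team's game,
# so grouping by it simply carries it through to the summary.
total_kicks <- d %>%
  group_by(TM, cust_col, RD, Score) %>%
  summarise(Total_Kicks = sum(K, na.rm = TRUE)) %>%
  ungroup()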

ggplot(data = total_kicks, aes(x = Total_Kicks, y = Score, group = TM, col = cust_col)) +
  geom_point(alpha = 0.5) +
  scale_color_manual(values = c("#4682b4", "#8b8589", "#008080")) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
  theme_tufte() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Number of Kicks and Score", x = "Total Kicks" , y = "Score",
       subtitle = "Adelaide teams seem to use kicking effectively",
       caption = expression(paste(italic("Source: 2018 Round 1-3, afltables.com"))))
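The plot fits a separate line for each team, which with three rounds means three points per line. As a rough complement, the overall relationship can be summarised with a correlation and a single linear fit across all team-games. A sketch on made-up totals, using the same column names as `total_kicks`:

```r
# Made-up team-game totals for illustration; use total_kicks for the real thing.
toy_games <- data.frame(
  Total_Kicks = c(180, 200, 210, 220, 240),
  Score       = c(60, 80, 85, 95, 110)
)

# Pearson correlation between team kicks and team score.
r <- cor(toy_games$Total_Kicks, toy_games$Score)

# One pooled linear fit: extra points of score per extra kick.
fit <- lm(Score ~ Total_Kicks, data = toy_games)
slope <- coef(fit)[["Total_Kicks"]]

print(r)
print(slope)
```

A positive slope here only says that teams kicking more also tend to score more; it says nothing about which way the cause runs.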

© Will Bidstrup 2018