datavolley/README.Rmd at master · rybaczewa/datavolley · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "tools/README-"
)
```

# datavolley

[![Travis-CI Build Status](https://travis-ci.org/raymondben/datavolley.svg?branch=master)](https://travis-ci.org/raymondben/datavolley)

An R package for reading DataVolley scouting files.

See also this [DataVolley file validator](https://apps.untan.gl/dvalidate/), which is built on the datavolley package.

## Installation

```{r eval=FALSE}
library(devtools)
install_github("raymondben/datavolley")
```

## Example

Read one of the example data files bundled with the package:
```{r}
library(datavolley)
x <- read_dv(dv_example_file(), insert_technical_timeouts=FALSE)
summary(x)
```

Number of serves by team:

```{r}
serve_idx <- find_serves(plays(x))
table(plays(x)$team[serve_idx])
```

Distribution of serve run lengths:

```{r}
serve_run_info <- find_runs(plays(x)[serve_idx,])
table(unique(serve_run_info[,c("run_id","run_length")])$run_length)
```

The court position associated with each action can be recorded in two ways. The most common is by zones (numbered 1-9).

Heatmap of attack rate by court zone:

```{r}
library(ggplot2)
library(dplyr)
## calculate attack frequency by zone, per team
attack_rate <- plays(x) %>% dplyr::filter(skill=="Attack") %>%
    group_by(team, start_zone) %>% dplyr::summarize(n_attacks=n()) %>%
    mutate(rate=n_attacks/sum(n_attacks)) %>% ungroup

## add x,y coordinates associated with the zones
attack_rate <- cbind(attack_rate, dv_xy(attack_rate$start_zone, end="lower"))

## for team 2, these need to be on the top half of the diagram
tm2i <- attack_rate$team==teams(x)[2]
attack_rate[tm2i, c("x", "y")] <- dv_flip_xy(attack_rate[tm2i, c("x", "y")])

ggplot(attack_rate, aes(x, y, fill=rate)) + geom_tile() + ggcourt(labels=teams(x)) +
    scale_fill_gradient2(name="Attack rate")
```

Or using arrows to show the starting and ending zones of attacks:

```{r}
## first tabulate attacks by starting and ending zone
attack_rate <- plays(x) %>% dplyr::filter(team==teams(x)[1] & skill=="Attack") %>%
  group_by(start_zone, end_zone) %>% tally() %>% ungroup

## convert counts to rates
attack_rate$rate <- attack_rate$n/sum(attack_rate$n)

## discard zones with zero attacks or missing location information
attack_rate <- attack_rate %>% dplyr::filter(rate>0 & !is.na(start_zone) & !is.na(end_zone))

## add starting x,y coordinates
attack_rate <- cbind(attack_rate, dv_xy(attack_rate$start_zone, end="lower", xynames=c("sx","sy")))

## and ending x,y coordinates
attack_rate <- cbind(attack_rate, dv_xy(attack_rate$end_zone, end="upper", xynames=c("ex","ey")))

## plot in reverse order so largest arrows are on the bottom
attack_rate <- attack_rate %>% dplyr::arrange(desc(rate))

p <- ggplot(attack_rate,aes(x,y,col=rate)) + ggcourt(labels=c(teams(x)[1],""))
for (n in 1:nrow(attack_rate))
    p <- p + geom_path(data=data.frame(x=c(attack_rate$sx[n], attack_rate$ex[n]),
                                       y=c(attack_rate$sy[n],attack_rate$ey[n]),
                                       rate=attack_rate$rate[n]),
        aes(size=rate), lineend="round", arrow=arrow(ends="last", type="closed"))
p + scale_fill_gradient(name="Attack rate") + guides(size="none")
```

The second source of position data is court coordinates. These are not included in all data files, because generally they must be manually entered by the scout and this can be a time consuming process. For the purposes of demonstration, here we generate fake coordinate data:

```{r warning=FALSE}
## take just the serves from the play-by-play data
xserves <- subset(plays(x), skill=="Serve")

## if the file had been scouted with coordinate included, we could plot them directly
## this file has no coordinates, so we'll fake some up for demo purposes
coords <- dv_fake_coordinates("serve", xserves$evaluation)
xserves[, c("start_coordinate", "start_coordinate_x", "start_coordinate_y",
            "end_coordinate", "end_coordinate_x", "end_coordinate_y")] <- coords

## now we can plot these
xserves$evaluation[!xserves$evaluation %in% c("Ace", "Error")] <- "Other"

ggplot(xserves, aes(start_coordinate_x, start_coordinate_y,
       xend=end_coordinate_x, yend=end_coordinate_y, colour=evaluation))+
    geom_segment() + geom_point() +
    scale_colour_manual(values=c(Ace="limegreen", Error="firebrick", Other="dodgerblue")) +
    ggcourt(labels=c("Serving team", "Receiving team"))
```

We could also use these coordinates to generate a heatmap-style plot of serve location:

```{r warning=FALSE}
ggplot(xserves, aes(start_coordinate_x, start_coordinate_y))+
    stat_density_2d(geom="raster", aes(fill=..density..), contour=FALSE)+
    scale_fill_distiller(palette=1, direction=1)+
    ggcourt("lower", labels="Serving team")
```

## Troubleshooting

If you see unexpected behaviour, try `read_dv(...,do_warn=TRUE)` to obtain more diagnostic information during the process of reading and parsing the DataVolley file. Also check the text encoding specified to `read_dv` (did you specify one??)


## More

For more information about DataVolley, see http://www.dataproject.com/IT/en/Volleyball.