Analysis of Directional Data

Introduction

Examples

Wish to analyze data in which response is a “direction”:

2d directional data are called circular data
3d directional data are called spherical data
not all “directional” data are directions in the usual sense
“directional” data may also arise in higher dimensions

Wind Directions

Recorded at Col de la Roa, Italian Alps
n = 310 (first 40 listed below)
Radians, clockwise from north
Source: Agostinelli (CSDA 2007); also R package circular

Data

6.23	1.03	0.15	0.72	2.20
0.46	0.63	1.45	0.37	1.95
0.08	0.15	0.33	0.09	0.09
6.23	0.05	6.14	6.28	6.17
6.24	6.02	6.14	6.25	0.01
5.38	5.30	5.63	0.77	1.34
6.14	0.22	6.23	2.33	3.61
0.49	6.12	0.01	0.00	0.46

Plot

Arrival Times at an ICU

24-hour clock times (format hrs.mins)
n = 254 (first 32 listed below)
Source: Cox & Lewis (1966); also Fisher (1993) and R package circular

Data

11.00	17.00	23.15	10.00
12.00	8.45	16.00	10.00
15.30	20.20	4.00	12.00
2.20	12.00	5.30	7.30
12.00	16.00	16.00	1.30
11.05	16.00	19.00	17.45
20.20	21.00	12.00	12.00
18.00	22.00	22.00	22.05

Plot

Primate Vertebrae

Orientation of left superior facet of last lumbar vertebra in humans, gorillas, and chimpanzees
Source: Keifer (2005 UF Anthropology MA Thesis)

Plot of Human Data

Butterfly Migrations

Direction of travel observed for 2649 migrating butterflies in Florida
Source: Thomas J Walker, University of Florida, Dept of Entomology and Nematology
Other variables:
- site: 23 locations in Florida
- observer: Thomas Walker (tw) or James J. Whitesell (jw)
- species: cloudless sulphur (cs), gulf fritillary (gf), long-tailed skipper (lt)
- distance to coast (km)
- date and time of observation
- percentage of sky free of clouds
- quality of sunlight: (b)right, (h)aze, (o)bstructed, (p)artly obstructed
- presence/absence and direction (N, NE, E, SE, S, SW, W, NW) of wind
- temperature

Why is the Analysis of Directional Data Different?

First three observations from the wind directions data: src_R{paste(round(wind[1:3], 2), collapse=”, “)}
The mean of these three numbers is src_R{round(mean(wind[1:3]), 2)} {{{results(2.47)}}}
What do you think?

Graphical Display of Directional Data

Graphical Display of Circular Data (in R)

Have already seen simple dot plots for circular data, e.g., for the wind data:

<<windConvert>>
<<windDataPlot>>

Graphical Display of Circular Data (in R) (ctd)

and for the ICU data:

<<icuDataPlot>>

and one more …

Graphical Display of Circular Data (in R) (ctd)

<<antsDataPlot>>

Circular Histograms

Circular histograms exist (see Fisher and Mardia and Jupp) but is there a ready-made function in R?

Rose Diagrams

Invented by Florence Nightingale (elected first female member of the Royals Statistical Society in 1859; honorary member of ASA)
Nightingale’s rose in R (see also this post and the R graph catalog)
Note that radii of segments are proportional to square root of the frequencies (counts), so that areas are proportional to frequencies. Is this the right thing to do?
Rose diagrams suffer from the same problems as histograms. The impression conveyed may depend strongly on:
- the binwidth of the cells
- the choice of starting point for the bins

Adding a Rose Diagram to the Plot of Wind Directions

rose.diag(windc, bins=16, col="darkgrey",
          cex=1.5, prop=1.35, add=TRUE)

Adding a Rose Diagram to the Plot of Wind Directions

Changing the Binwidth

Fewer/Wider Bins

Narrow Bins

Changing the Radii

I think that the default “radii proportional to counts” is generally best, but this is not always obvious. The scale certainly makes a big difference however.

rose.diag(windc, bins=16, col="darkgrey",
          radii.scale="linear",
          cex=1.5, prop=2.4, add=TRUE)

Changing the Radii

Kernel Density Estimates

lines(density.circular(windc, bw=40), lwd=2, lty=1)

Kernel Density Estimates

Spherical Data

Are there any canned routines for plotting spherical data in R?

Basic Summary Statistics

Mean Direction and Mean Resultant Length

First three observations from the wind directions data:

theta	x	y
6.23	-0.06	1.00
1.03	0.86	0.51
0.15	0.15	0.99

resultant (sum of direction vectors): (src_R{round(xsum, 3)}, src_R{round(ysum, 3)})
mean vector: \((\bar{x}, \bar{y}) = \) (src_R{round(xbar, 3)}, src_R{round(ybar, 3)})
resultant length (Euclidean norm of resultant): R = src_R{round(resultantLength, 3)}
mean resultant length: \(\bar{R} = \) src_R{round(meanResultantLength, 3)}
mean direction: \((\bar{x}, \bar{y})/\bar{R} = \) (src_R{round(meanDirection[1], 3)}, src_R{round(meanDirection[2], 3)})
\(˜{θ} = \) src_R{round(meanDirectionRadians, 3)}

Plot

Aside: Generating from the Uniform Distribution on the Sphere

Generating Random Points on the Sphere

Wish to generate a random “direction” in d-dimensions; i.e., an observation from the uniform distribution in the \(d-1\) sphere.
Usual way: let X ∼ N_d(0, I) and return U = X/||X||.
An alternative rejection sampler:
- Repeat until ||X|| <= 1
  - Let X be uniformly distributed on the cube [-1,1]^d
- Return U = X/||X||
What is the acceptance rate for the rejection sampler:
- Volume of the \(d - 1\) sphere is \(π^d/2/Γ(d/2 + 1)\)
- Volume of [-1,1]^d is 2^d
- Acceptance rate is \((π^1/2/2)^d/Γ(d/2 + 1)\)
- Curse of dimensionality

dimension	2	3	4	5	6	7	8	9	10
accept rate (%)	79	52	31	16	8	4	2	1	0

Code for Timing Results

runifSphere <- function(n, dimension, method=c("norm", "cube", "slownorm")) {
    method <- match.arg(method)
    if (method=="norm") {
        u <- matrix(rnorm(n*dimension), ncol=dimension)
        u <- sweep(u, 1, sqrt(apply(u*u, 1, sum)), "/")
    } else if (method=="slownorm") {
        u <- matrix(nrow=n, ncol=dimension)
        for (i in 1:n) {
            x <- rnorm(dimension)
            xnorm <- sqrt(sum(x^2))
            u[i,] <- x/xnorm
        }
    } else {
        u <- matrix(nrow=n, ncol=dimension)
        for (i in 1:n) {
            x <- runif(dimension, -1, 1)
            xnorm <- sqrt(sum(x^2))
            while (xnorm > 1) {
                x <- runif(dimension, -1, 1)
                xnorm <- sqrt(sum(x^2))
            }
            u[i,] <- x/xnorm
        }
    }
    u
}

Easy fix for Borel’s paradox in 3-d

Take longitude \(φ ∼ U(0,2π)\) independent of latitude \(θ = arcsin(2U-1)\), \(U ∼ U(0,1)\).

Rotationally Symmetric Distributions

Comparison of Projected Normal and Langevin Distributions

One way that we might compare the \(\nlangevin(μ, κ)\) and \(\npn(γ\mu, I)\) distributions by choosing κ and γ to give the same mean resultant lengths and comparing the densities of the cosine of the angle θ between \(U\) and \(μ\).

Of course matching mean resultant lengths is not necessarily the best way to compare these families of distributions.

\(d = 2\)

\(d = 3\)

\(d = 4\)

Regression

Gould’s Model

A.k.a., the barber pole model.

Gould’s Model: Likelihood

Calculate the (profile) log-likelihood for Gould (1969 Biometrics) model for simple (single predictor) regression with an intercept. For fixed “slope” β, this function “profiles out” (maximizes over) the “intercept” term and optionally the concentration parameter κ.

loglklhd.gould <- function(beta, theta, x, do.kappa=FALSE) {
    res <- sapply(beta,
                  function(b, th, x) {
                      sqrt(sum(cos(th - b*x))^2
                           + sum(sin(th - b*x))^2)
                  },
                  th=theta, x=x)
    if (do.kappa) {
        n <- length(theta)
        kappa <- sapply(res/n, imrlLvMF, dimen=2)
        res <- n*log(constLvMF(kappa, dimen=2)) + kappa*res
    }
    res
}

Gould’s Model with Equally Spaced X

<<gouldLatticeXData>>
<<gouldPlot>>

Gould’s Model with Equally-Spaced X: Kappa Not Profiled Out

Gould’s Model with Equally-Spaced X: Kappa Profiled Out

Gould’s Model with Random X: Data Generation

alpha <- 0
beta <- 1
kappa = 2.5
x <- rnorm(10)
mu <- as.circular((alpha + beta*x) %% (2*pi))
theta <- as.circular(mu + rvonmises(length(mu), mu=0, kappa=kappa))

Gould’s Model with Random X: Kappa Not Profiled Out

Gould’s Model with Random X: Kappa Profiled Out

Fisher-Lee Model: Likelihood

Calculate the (profile) log-likelihood for the Fisher-Lee (1992 Biometrics) model. For fixed “slope” β, this function “profiles out” (maximizes over) the “intercept” term and optionally the concentration parameter κ. Computing this with biggish matrix multiplies instead of using apply() or looping.

loglklhdFisherLee <- function(beta, theta, X, do.kappa=FALSE) {
    n <- length(theta)
    nbeta <- dim(beta)[2]
    if (dim(X)[1] != n) {
        stop("Number of rows of X must equal length of theta.")
    }
    if (dim(beta)[1] != dim(X)[2]) {
        stop("Number of rows of beta must equal number of columns of X")
    }
    dev <- theta - 2*atan(X %*% beta)
    res <- sqrt(apply(cos(dev), 2, sum)^2
                + apply(sin(dev), 2, sum)^2)
    if (do.kappa) {
        kappa <- sapply(res/n, imrlLvMF, dimen=2)
        res <- n*log(constLvMF(kappa, dimen=2)) + kappa*res
    }
    res
}

Fisher-Lee Model with Random X: Data Generation

Note that Fisher recommends centering the x values before fitting the model. Here, to be certain that the model whose likelihood we plot is equivalent to the data generating model, we will center the x values before generating the responses.

alpha <- 0
beta <- 1
kappa = 2.5
x <- rnorm(10)
x <- x - mean(x)
mu <- as.circular(alpha + 2*atan(beta*x))
theta <- as.circular(mu + rvonmises(length(mu), mu=0, kappa=kappa))

FisherLee’s Model with Random X: Kappa Not Profiled Out

Fisher-Lee Model with Random X: Kappa Profiled Out

Blue Periwinkles

periwinkles <- read.table(datafile("periwinkle.txt"), header=TRUE)

Plot of Periwinkle Data

Fisher-Lee Model Log-Likelihood for Periwinkle Data

Fisher-Lee Model with Two Predictors

SPML Model

Proportional coefficients yield identical directional means with different concentrations.

FilesExpand file tree

directionalData.org

Latest commit

History

directionalData.org

File metadata and controls

Analysis of Directional Data

Introduction

Examples

Wind Directions

Data

Plot

Arrival Times at an ICU

Data

Plot

Primate Vertebrae

Plot of Human Data

Butterfly Migrations

Why is the Analysis of Directional Data Different?

Graphical Display of Directional Data

Graphical Display of Circular Data (in R)

Graphical Display of Circular Data (in R) (ctd)

Graphical Display of Circular Data (in R) (ctd)

Graphical Display of Circular Data (in R) (ctd)

Circular Histograms

Rose Diagrams

Adding a Rose Diagram to the Plot of Wind Directions

Adding a Rose Diagram to the Plot of Wind Directions

Changing the Binwidth

Fewer/Wider Bins

Narrow Bins

Changing the Radii

Changing the Radii

Kernel Density Estimates

Kernel Density Estimates

Spherical Data

Basic Summary Statistics

Mean Direction and Mean Resultant Length

Plot

Aside: Generating from the Uniform Distribution on the Sphere

Generating Random Points on the Sphere

Code for Timing Results

Easy fix for Borel’s paradox in 3-d

Rotationally Symmetric Distributions

Comparison of Projected Normal and Langevin Distributions

\(d = 2\)

\(d = 3\)

\(d = 4\)

Regression

Gould’s Model

Gould’s Model: Likelihood

Gould’s Model with Equally Spaced X

Gould’s Model with Equally-Spaced X: Kappa Not Profiled Out

Gould’s Model with Equally-Spaced X: Kappa Profiled Out

Gould’s Model with Random X: Data Generation

Gould’s Model with Random X: Kappa Not Profiled Out

Gould’s Model with Random X: Kappa Profiled Out

Fisher-Lee Model: Likelihood

Fisher-Lee Model with Random X: Data Generation

FisherLee’s Model with Random X: Kappa Not Profiled Out

Fisher-Lee Model with Random X: Kappa Profiled Out

Blue Periwinkles

Plot of Periwinkle Data

Fisher-Lee Model Log-Likelihood for Periwinkle Data

Fisher-Lee Model Log-Likelihood for Periwinkle Data

Fisher-Lee Model with Two Predictors

SPML Model

SPML Coeficients: lines in 2-space

SPML Coef: mean directions as a function of x.

SPML Coef: mean resultant length as a function of x.