Analysis of Directional Data

Introduction

Examples

Wish to analyze data in which response is a “direction”:

  • 2d directional data are called circular data
  • 3d directional data are called spherical data
  • not all “directional” data are directions in the usual sense
  • “directional” data may also arise in higher dimensions

Wind Directions

  • Recorded at Col de la Roa, Italian Alps
  • n = 310 (first 40 listed below)
  • Radians, clockwise from north
  • Source: Agostinelli (CSDA 2007); also R package circular

Data

6.23  1.03  0.15  0.72  2.20
0.46  0.63  1.45  0.37  1.95
0.08  0.15  0.33  0.09  0.09
6.23  0.05  6.14  6.28  6.17
6.24  6.02  6.14  6.25  0.01
5.38  5.30  5.63  0.77  1.34
6.14  0.22  6.23  2.33  3.61
0.49  6.12  0.01  0.00  0.46

Plot

Plots/wind.png

Arrival Times at an ICU

  • 24-hour clock times (format hrs.mins)
  • n = 254 (first 32 listed below)
  • Source: Cox & Lewis (1966); also Fisher (1993) and R package circular

Data

11.00  17.00  23.15  10.00
12.00   8.45  16.00  10.00
15.30  20.20   4.00  12.00
 2.20  12.00   5.30   7.30
12.00  16.00  16.00   1.30
11.05  16.00  19.00  17.45
20.20  21.00  12.00  12.00
18.00  22.00  22.00  22.05
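
Because the times are recorded as hrs.mins rather than as decimal hours, they need to be converted before they can be treated as angles. A minimal sketch in base R, assuming that the digits after the decimal point are minutes (so 23.15 means 23:15); the function name hmToRadians is illustrative and not part of the original analysis:

```r
# Convert clock times in "hrs.mins" format to angles (radians) on a 24-hour circle.
hmToRadians <- function(hm) {
    hours <- floor(hm)
    mins <- round((hm - hours) * 100)   # round() guards against floating-point error
    2 * pi * (hours + mins / 60) / 24
}

icuFirst4 <- c(11.00, 17.00, 23.15, 10.00)
hmToRadians(icuFirst4)
```

With this convention midnight maps to 0 and noon to π.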

Plot

Plots/icu.png

Primate Vertebrae

  • Orientation of left superior facet of last lumbar vertebra in humans, gorillas, and chimpanzees
  • Source: Keifer (2005 UF Anthropology MA Thesis)

Pictures/Gray93.png

Plot of Human Data

Pictures/vertebraeOnSphere.png

Butterfly Migrations

  • Direction of travel observed for 2649 migrating butterflies in Florida
  • Source: Thomas J Walker, University of Florida, Dept of Entomology and Nematology
  • Other variables:
    • site: 23 locations in Florida
    • observer: Thomas Walker (tw) or James J. Whitesell (jw)
    • species: cloudless sulphur (cs), gulf fritillary (gf), long-tailed skipper (lt)
    • distance to coast (km)
    • date and time of observation
    • percentage of sky free of clouds
    • quality of sunlight: (b)right, (h)aze, (o)bstructed, (p)artly obstructed
    • presence/absence and direction (N, NE, E, SE, S, SW, W, NW) of wind
    • temperature

Why is the Analysis of Directional Data Different?

  • First three observations from the wind directions data: 6.23, 1.03, 0.15
  • The mean of these three numbers is 2.47
  • What do you think?

Plots/meanAngle.png
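
One way to see the problem is to compare the arithmetic mean with the direction of the vector sum of the observations. A small base-R sketch using the first three wind directions as listed in the data table (6.23, 1.03, 0.15, rounded to two decimals); the variable names are illustrative:

```r
theta <- c(6.23, 1.03, 0.15)     # radians, clockwise from north

# Treats the angles as ordinary numbers on the real line
arithmeticMean <- mean(theta)

# Direction of the sum of the corresponding unit vectors
meanDir <- atan2(sum(sin(theta)), sum(cos(theta))) %% (2 * pi)

round(c(arithmetic = arithmeticMean, vectorBased = meanDir), 2)
```

All three observations lie within about 60 degrees of north, yet the arithmetic mean (roughly 141 degrees clockwise from north) points to the southeast, while the vector-based value stays among the data.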

Graphical Display of Directional Data

Graphical Display of Circular Data (in R)

  • Have already seen simple dot plots for circular data, e.g., for the wind data:
<<windConvert>>
<<windDataPlot>>

Graphical Display of Circular Data (in R) (ctd)

  • and for the ICU data:
<<icuDataPlot>>
  • and one more …

Graphical Display of Circular Data (in R) (ctd)

Plots/ants.png

Graphical Display of Circular Data (in R) (ctd)

<<antsDataPlot>>

Circular Histograms

  • Circular histograms exist (see Fisher, and Mardia and Jupp), but is there a ready-made function in R?

Rose Diagrams

  • Invented by Florence Nightingale (elected first female member of the Royal Statistical Society in 1859; honorary member of ASA)
  • Nightingale’s rose in R (see also this post and the R graph catalog)
  • Note that radii of segments are proportional to square root of the frequencies (counts), so that areas are proportional to frequencies. Is this the right thing to do?
  • Rose diagrams suffer from the same problems as histograms. The impression conveyed may depend strongly on:
    • the binwidth of the cells
    • the choice of starting point for the bins

Adding a Rose Diagram to the Plot of Wind Directions

rose.diag(windc, bins=16, col="darkgrey",
          cex=1.5, prop=1.35, add=TRUE)

Adding a Rose Diagram to the Plot of Wind Directions

Plots/windRose.png

Changing the Binwidth

Fewer/Wider Bins

Plots/windRoseWide.png

Narrow Bins

Plots/windRoseNarrow.png

Changing the Radii

  • I think that the default, radii proportional to the square root of the counts (so that areas are proportional to counts), is generally best, but this is not always obvious. The scale certainly makes a big difference, however.
rose.diag(windc, bins=16, col="darkgrey",
          radii.scale="linear",
          cex=1.5, prop=2.4, add=TRUE)

Changing the Radii

Plots/windRoseLinear.png

Kernel Density Estimates

lines(density.circular(windc, bw=40), lwd=2, lty=1)

Kernel Density Estimates

Plots/windKdens.png
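
For reference, a kernel density estimate on the circle can be written in a few lines of base R. The sketch below uses a von Mises kernel whose concentration κ plays the role of the bandwidth; that this matches what density.circular does with bw=40 is an assumption, and vmKde is a hypothetical name.

```r
# Von Mises kernel density estimate evaluated at the angles in t.
vmKde <- function(t, theta, kappa) {
    # expon.scaled = TRUE returns exp(-kappa) * I0(kappa), which avoids overflow
    norm <- 2 * pi * besselI(kappa, 0, expon.scaled = TRUE)
    sapply(t, function(t0) mean(exp(kappa * (cos(t0 - theta) - 1))) / norm)
}

# First ten wind directions (radians) from the data table
theta <- c(6.23, 1.03, 0.15, 0.72, 2.20, 0.46, 0.63, 1.45, 0.37, 1.95)

grid <- seq(0, 2 * pi, length.out = 2001)[-1]   # 2000 equally spaced angles
fhat <- vmKde(grid, theta, kappa = 40)

# Riemann sum over the circle; should be very close to 1
sum(fhat) * (2 * pi / 2000)
```

Note that a larger κ means a narrower kernel, so here a large "bandwidth" gives less smoothing, the opposite of the usual convention on the line.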

Spherical Data

  • Are there any canned routines for plotting spherical data in R?

Basic Summary Statistics

Mean Direction and Mean Resultant Length

  • First three observations from the wind directions data:
theta      x      y
 6.23  -0.06   1.00
 1.03   0.86   0.51
 0.15   0.15   0.99
  • resultant (sum of direction vectors): (src_R{round(xsum, 3)}, src_R{round(ysum, 3)})
  • mean vector: \((\bar{x}, \bar{y}) = \) (src_R{round(xbar, 3)}, src_R{round(ybar, 3)})
  • resultant length (Euclidean norm of resultant): R = src_R{round(resultantLength, 3)}
  • mean resultant length: \(\bar{R} = \) src_R{round(meanResultantLength, 3)}
  • mean direction: \((\bar{x}, \bar{y})/\bar{R} = \) (src_R{round(meanDirection[1], 3)}, src_R{round(meanDirection[2], 3)})
  • \(\tilde{θ} = \) src_R{round(meanDirectionRadians, 3)}

Plot

Plots/meanDirection.png
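
These summary statistics can be reproduced approximately from the rounded data. A sketch in base R, assuming (as the table of direction vectors suggests) the convention x = sin θ, y = cos θ for angles measured clockwise from north. Because the inputs are rounded to two decimals, the results will differ slightly from values computed on the full-precision data.

```r
theta <- c(6.23, 1.03, 0.15)    # first three wind directions, rounded

x <- sin(theta)                 # east component of each direction vector
y <- cos(theta)                 # north component

resultant <- c(sum(x), sum(y))                      # sum of direction vectors
meanVector <- resultant / length(theta)             # (xbar, ybar)
resultantLength <- sqrt(sum(resultant^2))           # R
meanResultantLength <- resultantLength / length(theta)   # Rbar
meanDirection <- meanVector / meanResultantLength   # unit vector
# Angle of the mean direction, clockwise from north
meanDirectionRadians <- atan2(meanDirection[1], meanDirection[2]) %% (2 * pi)

round(c(R = resultantLength, Rbar = meanResultantLength,
        thetaTilde = meanDirectionRadians), 3)
```

A mean resultant length close to 1 indicates tightly clustered directions; a value near 0 indicates directions spread around the circle (or balanced in opposing clusters).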

Aside: Generating from the Uniform Distribution on the Sphere

Generating Random Points on the Sphere

  • Wish to generate a random “direction” in d dimensions, i.e., an observation from the uniform distribution on the \((d-1)\)-sphere (the unit sphere in \(\mathbb{R}^d\)).
  • Usual way: let X ∼ N_d(0, I) and return U = X/||X||.
  • An alternative rejection sampler:
    • Repeat until ||X|| <= 1
      • Let X be uniformly distributed on the cube [-1,1]^d
    • Return U = X/||X||
  • What is the acceptance rate for the rejection sampler?
    • Volume of the unit ball in d dimensions (the region enclosed by the \((d - 1)\)-sphere) is \(π^{d/2}/Γ(d/2 + 1)\)
    • Volume of [-1,1]^d is 2^d
    • Acceptance rate is \((π^{1/2}/2)^d/Γ(d/2 + 1)\)
    • Curse of dimensionality
dimension         2   3   4   5   6   7   8   9   10
accept rate (%)   79  52  31  16  8   4   2   1   0
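
The table follows directly from the formula for the acceptance rate. A short base-R check (the function name acceptRate is illustrative):

```r
# Ratio of the volume of the unit d-ball to the volume of the cube [-1, 1]^d
acceptRate <- function(d) (sqrt(pi) / 2)^d / gamma(d / 2 + 1)

d <- 2:10
rates <- round(100 * acceptRate(d))
rbind(dimension = d, acceptPercent = rates)
```

The rate falls faster than geometrically in d, so the rejection sampler is only competitive in very low dimensions.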

Code for Timing Results

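# Generate n points uniformly distributed on the unit sphere in R^dimension.
#   "norm":     normalize standard normal vectors (vectorized)
#   "slownorm": same idea, but one point at a time in a loop
#   "cube":     rejection sampling from the cube [-1,1]^dimension
runifSphere <- function(n, dimension, method=c("norm", "cube", "slownorm")) {
    method <- match.arg(method)
    if (method=="norm") {
        u <- matrix(rnorm(n*dimension), ncol=dimension)
        # divide each row by its Euclidean norm
        u <- sweep(u, 1, sqrt(apply(u*u, 1, sum)), "/")
    } else if (method=="slownorm") {
        u <- matrix(nrow=n, ncol=dimension)
        for (i in seq_len(n)) {
            x <- rnorm(dimension)
            xnorm <- sqrt(sum(x^2))
            u[i,] <- x/xnorm
        }
    } else {
        u <- matrix(nrow=n, ncol=dimension)
        for (i in seq_len(n)) {
            x <- runif(dimension, -1, 1)
            xnorm <- sqrt(sum(x^2))
            # redraw until the point falls inside the unit ball
            while (xnorm > 1) {
                x <- runif(dimension, -1, 1)
                xnorm <- sqrt(sum(x^2))
            }
            # project the accepted point onto the sphere
            u[i,] <- x/xnorm
        }
    }
    u
}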

Easy fix for Borel’s paradox in 3-d

Take longitude \(φ ∼ U(0,2π)\) independent of latitude \(θ = arcsin(2U-1)\), \(U ∼ U(0,1)\).
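
A sketch of this construction in base R (the function name runifSphere3 is illustrative). The point of the arcsine transform is that sin θ, the height of the point above the equatorial plane, is then uniform on (-1, 1); drawing the latitude itself uniformly would over-sample the regions near the poles.

```r
# Uniform points on the unit sphere in 3-d via longitude and latitude.
runifSphere3 <- function(n) {
    phi <- runif(n, 0, 2 * pi)          # longitude
    theta <- asin(2 * runif(n) - 1)     # latitude in [-pi/2, pi/2]
    cbind(x = cos(theta) * cos(phi),
          y = cos(theta) * sin(phi),
          z = sin(theta))
}

set.seed(1)
u <- runifSphere3(5000)
colMeans(u)        # each component should be close to 0
colMeans(u^2)      # each should be close to 1/3
```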

Rotationally Symmetric Distributions

Comparison of Projected Normal and Langevin Distributions

One way that we might compare the \(\nlangevin(μ, κ)\) and \(\npn(γμ, I)\) distributions is to choose κ and γ to give the same mean resultant length and then compare the densities of the cosine of the angle θ between \(U\) and \(μ\).

Of course matching mean resultant lengths is not necessarily the best way to compare these families of distributions.
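
For d = 2 the matching step can be sketched in base R. The mean resultant length of the Langevin (von Mises) distribution is I1(κ)/I0(κ). For the projected normal the sketch uses the closed form \(\sqrt{πζ/2}\, e^{-ζ}\{I_0(ζ) + I_1(ζ)\}\) with \(ζ = γ^2/4\) and \(\|μ\| = 1\); this expression is not given in the text and is an assumption here, as are all function names.

```r
# Mean resultant length of the Langevin (von Mises) distribution, d = 2
mrlLangevin2 <- function(kappa) {
    besselI(kappa, 1, expon.scaled = TRUE) / besselI(kappa, 0, expon.scaled = TRUE)
}

# Mean resultant length of the projected normal N_2(gamma * mu, I), |mu| = 1
# (assumed closed form; expon.scaled = TRUE supplies the exp(-zeta) factor)
mrlProjNorm2 <- function(gamma) {
    zeta <- gamma^2 / 4
    sqrt(pi * zeta / 2) *
        (besselI(zeta, 0, expon.scaled = TRUE) +
         besselI(zeta, 1, expon.scaled = TRUE))
}

# Solve numerically for the gamma whose mean resultant length matches kappa's
matchGamma <- function(kappa) {
    target <- mrlLangevin2(kappa)
    uniroot(function(g) mrlProjNorm2(g) - target,
            interval = c(1e-8, 50), tol = 1e-10)$root
}

kappa <- 2
gamma <- matchGamma(kappa)
c(kappa = kappa, gamma = gamma, rbar = mrlLangevin2(kappa))
```

The search interval (1e-8, 50) covers mean resultant lengths from essentially 0 up to very nearly 1; more concentrated cases would need a wider upper limit.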

\(d = 2\)

Plots/PNvLvMF2.png

\(d = 3\)

Plots/PNvLvMF3.png

\(d = 4\)

Plots/PNvLvMF4.png

Regression

Gould’s Model

A.k.a., the barber pole model.

Gould’s Model: Likelihood

Calculate the (profile) log-likelihood for Gould (1969 Biometrics) model for simple (single predictor) regression with an intercept. For fixed “slope” β, this function “profiles out” (maximizes over) the “intercept” term and optionally the concentration parameter κ.

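# beta:  vector of candidate slopes
# theta: observed angles (radians)
# x:     predictor values, same length as theta
loglklhd.gould <- function(beta, theta, x, do.kappa=FALSE) {
    # for each candidate slope, the resultant length of the
    # residual angles theta - beta*x
    res <- sapply(beta,
                  function(b, th, x) {
                      sqrt(sum(cos(th - b*x))^2
                           + sum(sin(th - b*x))^2)
                  },
                  th=theta, x=x)
    if (do.kappa) {
        n <- length(theta)
        # kappa is obtained from the mean resultant length res/n
        # via the helper imrlLvMF (defined elsewhere)
        kappa <- sapply(res/n, imrlLvMF, dimen=2)
        res <- n*log(constLvMF(kappa, dimen=2)) + kappa*res
    }
    res
}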
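
When do.kappa=FALSE, the criterion is simply the resultant length of the residual angles θ - βx. The self-contained sketch below recomputes it for noise-free data on an integer lattice; the function name, data, and slope values are all illustrative assumptions. It shows why equally spaced predictors are troublesome for this model: slopes that differ by a multiple of 2π divided by the spacing give exactly the same criterion value.

```r
# Resultant length of the residual angles for each candidate slope
gouldResultant <- function(beta, theta, x) {
    sapply(beta, function(b) {
        dev <- theta - b * x
        sqrt(sum(cos(dev))^2 + sum(sin(dev))^2)
    })
}

x <- 0:9                                  # equally spaced predictor (spacing 1)
theta <- (0.5 + 1 * x) %% (2 * pi)        # noise-free responses, true slope 1

betaGrid <- c(1 - 2 * pi, 1, 1 + 2 * pi, 2)
res <- gouldResultant(betaGrid, theta, x)
round(res, 3)
```

The first three slopes all attain the maximum possible value n = 10, so the data cannot distinguish between them, whereas the slope 2 gives a much smaller resultant length.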

Gould’s Model with Equally Spaced X

<<gouldLatticeXData>>
<<gouldPlot>>

Gould’s Model with Equally-Spaced X: Kappa Not Profiled Out

Plots/gouldLatticeX1.png

Gould’s Model with Equally-Spaced X: Kappa Profiled Out

Plots/gouldLatticeX2.png

Gould’s Model with Random X: Data Generation

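library(circular)   # provides as.circular() and rvonmises()

alpha <- 0
beta <- 1
kappa <- 2.5
x <- rnorm(10)
# mean direction is linear in x, wrapped onto [0, 2*pi)
mu <- as.circular((alpha + beta*x) %% (2*pi))
# add von Mises noise centered at zero
theta <- as.circular(mu + rvonmises(length(mu), mu=0, kappa=kappa))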

Gould’s Model with Random X: Kappa Not Profiled Out

Plots/gouldRandomX1.png

Gould’s Model with Random X: Kappa Profiled Out

Plots/gouldRandomX2.png

Fisher-Lee Model: Likelihood

Calculate the (profile) log-likelihood for the Fisher-Lee (1992 Biometrics) model. For fixed “slope” β, this function “profiles out” (maximizes over) the “intercept” term and optionally the concentration parameter κ. The linear predictors for all candidate values of β are computed with a single, fairly large matrix multiply rather than by looping over the candidates.

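# beta:  matrix of candidate coefficients, one candidate per column
# theta: observed angles (radians)
# X:     design matrix, one row per observation
loglklhdFisherLee <- function(beta, theta, X, do.kappa=FALSE) {
    n <- length(theta)
    if (dim(X)[1] != n) {
        stop("Number of rows of X must equal length of theta.")
    }
    if (dim(beta)[1] != dim(X)[2]) {
        stop("Number of rows of beta must equal number of columns of X")
    }
    # residual angles: column j holds theta - 2*atan(X %*% beta[,j])
    dev <- theta - 2*atan(X %*% beta)
    # resultant length of the residual angles for each candidate
    res <- sqrt(apply(cos(dev), 2, sum)^2
                + apply(sin(dev), 2, sum)^2)
    if (do.kappa) {
        # kappa is obtained from the mean resultant length res/n
        # via the helper imrlLvMF (defined elsewhere)
        kappa <- sapply(res/n, imrlLvMF, dimen=2)
        res <- n*log(constLvMF(kappa, dimen=2)) + kappa*res
    }
    res
}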
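
The matrix-multiply trick can be seen in isolation in the following self-contained sketch, which evaluates the resultant-length criterion (the do.kappa=FALSE case) over a grid of slopes for noise-free data. The data, the grid, and the variable names are illustrative assumptions.

```r
x <- c(-1.5, -0.8, -0.2, 0.1, 0.6, 1.8)   # centered predictor values
theta <- 2 * atan(1 * x)                  # noise-free responses, true slope 1

X <- matrix(x, ncol = 1)                          # n x 1 design matrix
betaGrid <- matrix(seq(-3, 3, by = 0.25), nrow = 1)   # 1 x m candidate slopes

# X %*% betaGrid is n x m; theta is recycled down each column
dev <- theta - 2 * atan(X %*% betaGrid)
res <- sqrt(colSums(cos(dev))^2 + colSums(sin(dev))^2)

betaGrid[which.max(res)]    # slope maximizing the criterion
max(res)                    # equals n when the fit is exact
```

Unlike the linear link of Gould's model, the link 2·atan(·) maps the real line onto (-π, π), so the fitted mean direction cannot wrap around the circle more than once.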

Fisher-Lee Model with Random X: Data Generation

Note that Fisher recommends centering the x values before fitting the model. Here, to be certain that the model whose likelihood we plot is equivalent to the data generating model, we will center the x values before generating the responses.

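library(circular)   # provides as.circular() and rvonmises()

alpha <- 0
beta <- 1
kappa <- 2.5
x <- rnorm(10)
# center the predictor before generating the responses
x <- x - mean(x)
# Fisher-Lee link: the mean direction lies in (-pi, pi)
mu <- as.circular(alpha + 2*atan(beta*x))
# add von Mises noise centered at zero
theta <- as.circular(mu + rvonmises(length(mu), mu=0, kappa=kappa))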

Fisher-Lee Model with Random X: Kappa Not Profiled Out

Plots/fisherLeeRandomX1.png

Fisher-Lee Model with Random X: Kappa Profiled Out

Plots/fisherLeeRandomX2.png

Blue Periwinkles

periwinkles <- read.table(datafile("periwinkle.txt"), header=TRUE)

Plot of Periwinkle Data

Fisher-Lee Model Log-Likelihood for Periwinkle Data

Fisher-Lee Model Log-Likelihood for Periwinkle Data

Fisher-Lee Model with Two Predictors

Pictures/messy2.png

SPML Model

Proportional coefficients yield identical directional means with different concentrations.
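
This can be checked by simulation. The sketch below assumes that the SPML model takes the response to be a projected normal direction, U = Y/||Y|| with Y ∼ N_2(μ(x), I) and μ(x) linear in x; that reading of the model, the coefficient values, and the function names are all assumptions made for illustration.

```r
# Draw n directions from the projected normal N(mu, I)
rprojnorm <- function(n, mu) {
    y <- matrix(rnorm(n * length(mu)), ncol = length(mu))
    y <- sweep(y, 2, mu, "+")        # shift each row by mu
    y / sqrt(rowSums(y^2))           # project onto the unit circle
}

# Sample mean direction (unit vector) and mean resultant length
dirSummary <- function(u) {
    m <- colMeans(u)
    rbar <- sqrt(sum(m^2))
    list(direction = m / rbar, rbar = rbar)
}

B <- cbind(c(1.0, 0.5), c(-0.5, 1.0))   # hypothetical: mu(x) = B %*% c(1, x)
xNew <- 0.8
mu <- as.vector(B %*% c(1, xNew))

set.seed(1)
s1 <- dirSummary(rprojnorm(20000, mu))                                 # coefficients B
s3 <- dirSummary(rprojnorm(20000, as.vector((3 * B) %*% c(1, xNew))))  # coefficients 3B

rbind(B = c(s1$direction, rbar = s1$rbar),
      threeB = c(s3$direction, rbar = s3$rbar))
```

Up to simulation error the two mean directions agree, while the mean resultant length is noticeably larger under the scaled coefficients.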

SPML Coefficients: lines in 2-space

SPML Coefficients: mean directions as a function of x

SPML Coefficients: mean resultant length as a function of x