Skip to content

Latest commit

 

History

History
545 lines (418 loc) · 15.1 KB

File metadata and controls

545 lines (418 loc) · 15.1 KB

Introduction

Examples

Wish to analyze data in which response is a "direction":

  • 2d directional data are called circular data
  • 3d directional data are called spherical data
  • not all "directional" data are directions in the usual sense
  • "directional" data may also arise in higher dimensions

Wind Directions

  • Recorded at Col de la Roa, Italian Alps
  • n = 310 (first 40 listed below)
  • Radians, clockwise from north
  • Source: Agostinelli (CSDA 2007); also R package circular

Data

ascii(matrix(wind[1:40], ncol=5, byrow=TRUE), digits=2,
      include.rownames=FALSE, include.colnames=FALSE)

Plot

require("circular")
par(mar=c(0,0,0,0)+0.1, oma=c(0,0,0,0)+0.1)
plot(windc, cex=1.5, axes=FALSE,
     bin=360, stack=TRUE, sep=0.035, shrink=1.3)
axis.circular(at=circular(seq(0, (7/4)*pi, pi/4),
                  template="geographics"),
              labels=c("N","NE","E","SE","S","SW","W","NW"),
              cex=1.4)
ticks.circular(circular(seq(0, (15/8)*pi, pi/8)),
               zero=pi/2, rotation="clock",
               tcl=0.075)

Arrival Times at an ICU

  • 24-hour clock times (format hrs.mins)
  • n = 254 (first 32 listed below)
  • Source: Cox & Lewis (1966); also Fisher (1993) and R package circular

Data

ascii(matrix(fisherB1[1:32], ncol=4, byrow=TRUE), digits=2,
      include.rownames=FALSE, include.colnames=FALSE)

Plot

## Note that pch=17 does not work properly here.
par(mar=c(0,0,0,0)+0.1, oma=c(0,0,0,0)+0.1)
plot(fisherB1c, cex=1.5, axes=TRUE,
     bin=360, stack=TRUE, sep=0.035, shrink=1.3)

Primate Vertebrae

  • Orientation of left superior facet of last lumbar vertebra in humans, gorillas, and chimpanzees
  • Source: Keifer (2005 UF Anthropology MA Thesis)

as superior articulate process.

Plot of Human Data

chimpanzees (red), 16 gorillas (green) and 19 humans (blue).

Butterfly Migrations

  • Direction of travel observed for 2649 migrating butterflies in Florida
  • Source: Thomas J Walker, University of Florida, Dept of Entomology and Nematology
  • Other variables:
    • site: 23 locations in Florida
    • observer: Thomas Walker (tw) or James J. Whitesell (jw)
    • species: cloudless sulphur (cs), gulf fritillary (gf), long-tailed skipper (lt)
    • distance to coast (km)
    • date and time of observation
    • percentage of sky free of clouds
    • quality of sunlight: (b)right, (h)aze, (o)bstructed, (p)artly obstructed
    • presence/absence and direction (N, NE, E, SE, S, SW, W, NW) of wind
    • temperature

Why is the Analysis of Directional Data Different?

  • First three observations from the wind directions data: paste(round(wind[1:3], 2), collapse=", "){.r .rundoc-block rundoc-language="R"}
  • The mean of these three numbers is round(mean(wind[1:3]), 2){.r .rundoc-block rundoc-language="R"} {{{results(2.47)}}}
  • What do you think?
par(mar=c(0,0,0,0)+0.1, oma=c(0,0,0,0)+0.1)
plot(windc[1:3], cex=2, lwd=1.5, axes=TRUE, ticks=TRUE, tcl=0.05)
points(circular(mean(wind[1:3]), units="radians", template="geographics"),
       pch=8, cex=4) 

Graphical Display of Directional Data

Graphical Display of Circular Data (in R)

  • Have already seen simple dot plots for circular data, e.g., for the wind data:
<<windConvert>>
<<windDataPlot>>

Graphical Display of Circular Data (in R) (ctd)

  • and for the ICU data:
<<icuDataPlot>>
  • and one more ...

Graphical Display of Circular Data (in R) (ctd)

par(mar=c(0,0,0,0)+0.1, oma=c(0,0,0,0)+0.1)
plot(fisherB10c$set1, units="degrees", zero=pi/2,
     rotation="clock", pch=16, cex=1.5)
ticks.circular(circular(seq(0, (11/6)*pi, pi/6)),
               zero=pi/2, rotation="clock", tcl=0.075)
points(fisherB10c$set2, zero=pi/2,
       rotation="clock", pch=16, col="darkgrey",
       next.points=-0.1, cex=1.5)
points(fisherB10c$set3, zero=pi/2,
       rotation="clock", pch=1,
       next.points=0.1, cex=1.5)

three different experimental conditions:

Graphical Display of Circular Data (in R) (ctd)

<<antsDataPlot>>

Circular Histograms

  • Circular histograms exist (see Fisher and Mardia and Jupp) but is there a ready-made function in R?

Rose Diagrams

  • Invented by Florence Nightingale (elected first female member of the Royals Statistical Society in 1859; honorary member of ASA)
  • Nightingale's rose in R (see also this post and the R graph catalog)
  • Note that radii of segments are proportional to square root of the frequencies (counts), so that areas are proportional to frequencies. Is this the right thing to do?
  • Rose diagrams suffer from the same problems as histograms. The impression conveyed may depend strongly on:
    • the binwidth of the cells
    • the choice of starting point for the bins

Adding a Rose Diagram to the Plot of Wind Directions

rose.diag(windc, bins=16, col="darkgrey",
          cex=1.5, prop=1.35, add=TRUE)

Adding a Rose Diagram to the Plot of Wind Directions

<<windDataPlot>>
<<windRosePart>>

(segment radii are proportional to square roots of counts).

Changing the Binwidth

Fewer/Wider Bins

<<windDataPlot>>
<<windRoseWideBinsPart>>

Narrow Bins

<<windDataPlot>>
<<windRoseNarrowBinsPart>>

Changing the Radii

  • I think that the default "radii proportional to counts" is generally best, but this is not always obvious. The scale certainly makes a big difference however.
rose.diag(windc, bins=16, col="darkgrey",
          radii.scale="linear",
          cex=1.5, prop=2.4, add=TRUE)

Changing the Radii

<<windDataPlot>>
<<windRoseLinearPart>>

(segment radii proportional to counts).

Kernel Density Estimates

lines(density.circular(windc, bw=40), lwd=2, lty=1)

Kernel Density Estimates

par(mar=c(0,0,0,0)+0.1, oma=c(0,0,0,0)+0.1)
plot(windc, cex=1.5, axes=FALSE,
     bin=360, stack=TRUE, sep=0.035, shrink=1.7)
axis.circular(at=circular(seq(0, (7/4)*pi, pi/4),
                  template="geographics"),
              labels=c("N","NE","E","SE","S","SW","W","NW"),
              cex=1.4)
ticks.circular(circular(seq(0, (15/8)*pi, pi/8)),
               ## zero=pi/2, rotation="clock",
               tcl=0.075)
<<windRosePart>>
<<windKdensPart>>

and kernel density estimate.

Spherical Data

  • Are there any canned routines for plotting spherical data in R?

Basic Summary Statistics

Mean Direction and Mean Resultant Length

  • First three observations from the wind directions data:
theta <- wind[1:3]
x <- sin(theta)
y <- cos(theta)
ascii(cbind(theta, x, y), digits=2,
      include.rownames=FALSE, include.colnames=TRUE)
  • resultant (sum of direction vectors): (round(xsum, 3){.r .rundoc-block rundoc-language="R"}, round(ysum, 3){.r .rundoc-block rundoc-language="R"})

  • mean vector: $(\bar{x}, \bar{y}) = $ (round(xbar, 3){.r .rundoc-block rundoc-language="R"}, round(ybar, 3){.r .rundoc-block rundoc-language="R"})

  • resultant length (Euclidean norm of resultant): R = round(resultantLength, 3){.r .rundoc-block rundoc-language="R"}

  • mean resultant length: $\bar{R} = $ round(meanResultantLength, 3){.r .rundoc-block rundoc-language="R"}

  • mean direction: $(\bar{x}, \bar{y})/\bar{R} = $ (round(meanDirection[1], 3){.r .rundoc-block rundoc-language="R"}, round(meanDirection[2], 3){.r .rundoc-block rundoc-language="R"})

  • $\tilde{\theta} = $ round(meanDirectionRadians, 3){.r .rundoc-block rundoc-language="R"}

Plot

par(mar=c(0,0,0,0)+0.1, oma=c(0,0,0,0)+0.1)
plot(windc[1:3], cex=2, lwd=1.5, axes=TRUE, ticks=TRUE, tcl=0.05)
points(circular(meanDirectionRadians, units="radians", template="geographics"),
       pch=8, cex=4) 

and their sample mean direction.

Aside: Generating from the Uniform Distribution on the Sphere

Generating Random Points on the Sphere

  • Wish to generate a random "direction" in d-dimensions; i.e., an observation from the uniform distribution in the $d-1$ sphere.
  • Usual way: let X  ∼ Nd(0, I) and return U = X/||X||.
  • An alternative rejection sampler:
    • Repeat until ||X|| <= 1
      • Let X be uniformly distributed on the cube [-1,1]^d^
    • Return U = X/||X||
  • What is the acceptance rate for the rejection sampler:
    • Volume of the $d - 1$ sphere is $\pi^{d/2}/\Gamma(d/2 + 1)$
    • Volume of [-1,1]^d^ is 2^d^
    • Acceptance rate is $(\pi^{1/2}/2)^d/\Gamma(d/2 + 1)$
    • Curse of dimensionality
accRate <- function(d) ((sqrt(pi)/2)^d)/gamma(d/2 + 1)
d <- 2:10
## ar <- matrix(accRate(d), nrow=1,
##              dimnames=list("accept rate", "d"=d))
ar <- rbind("dimension"=d, "accept rate (%)"= 100*accRate(d))
ascii(ar, digits=0, include.rownames=TRUE, include.colnames=FALSE)

Code for Timing Results

runifSphere <- function(n, dimension, method=c("norm", "cube", "slownorm")) {
    method <- match.arg(method)
    if (method=="norm") {
        u <- matrix(rnorm(n*dimension), ncol=dimension)
        u <- sweep(u, 1, sqrt(apply(u*u, 1, sum)), "/")
    } else if (method=="slownorm") {
        u <- matrix(nrow=n, ncol=dimension)
        for (i in 1:n) {
            x <- rnorm(dimension)
            xnorm <- sqrt(sum(x^2))
            u[i,] <- x/xnorm
        }
    } else {
        u <- matrix(nrow=n, ncol=dimension)
        for (i in 1:n) {
            x <- runif(dimension, -1, 1)
            xnorm <- sqrt(sum(x^2))
            while (xnorm > 1) {
                x <- runif(dimension, -1, 1)
                xnorm <- sqrt(sum(x^2))
            }
            u[i,] <- x/xnorm
        }
    }
    u
}

Easy fix for Borel's paradox in 3-d

Take longitude $\phi \sim U(0,2\pi)$ independent of latitude $\theta = \arcsin(2U-1)$, $U \sim U(0,1)$.

Rotationally Symmetric Distributions

Comparison of Projected Normal and Langevin Distributions

One way that we might compare the $\nlangevin(\mu, \kappa)$ and $\npn(\gamma\mu, I)$ distributions by choosing κand γto give the same mean resultant lengths and comparing the densities of the cosine of the angle θbetween $U$ and $\mu$.

Of course matching mean resultant lengths is not necessarily the best way to compare these families of distributions.

$d = 2$

par(mar=c(2,2,0,0)+0.1, oma=c(0,0,0,0)+0.1)
plotPNvLvMF(2)

$d = 3$

par(mar=c(2,2,0,0)+0.1, oma=c(0,0,0,0)+0.1)
plotPNvLvMF(3)

$d = 4$

par(mar=c(2,2,0,0)+0.1, oma=c(0,0,0,0)+0.1)
plotPNvLvMF(4)

Regression

Gould's Model

A.k.a., the barber pole model.

Gould's Model: Likelihood

Calculate the (profile) log-likelihood for Gould (1969 Biometrics) model for simple (single predictor) regression with an intercept. For fixed "slope" β, this function "profiles out" (maximizes over) the "intercept" term and optionally the concentration parameter κ.

loglklhd.gould <- function(beta, theta, x, do.kappa=FALSE) {
    res <- sapply(beta,
                  function(b, th, x) {
                      sqrt(sum(cos(th - b*x))^2
                           + sum(sin(th - b*x))^2)
                  },
                  th=theta, x=x)
    if (do.kappa) {
        n <- length(theta)
        kappa <- sapply(res/n, imrlLvMF, dimen=2)
        res <- n*log(constLvMF(kappa, dimen=2)) + kappa*res
    }
    res
}

Gould's Model with Equally Spaced X

<<gouldLatticeXData>>
<<gouldPlot>>
<<gouldPlot>>

Gould's Model with Equally-Spaced X: Kappa Not Profiled Out

κnot profiled out.

Gould's Model with Equally-Spaced X: Kappa Profiled Out

κprofiled out.

Gould's Model with Random X: Data Generation

alpha <- 0
beta <- 1
kappa = 2.5
x <- rnorm(10)
mu <- as.circular((alpha + beta*x) %% (2*pi))
theta <- as.circular(mu + rvonmises(length(mu), mu=0, kappa=kappa))
<<gouldPlot>>
<<gouldPlot>>

Gould's Model with Random X: Kappa Not Profiled Out

κnot profiled out.

Gould's Model with Random X: Kappa Profiled Out

κprofiled out.