Wish to analyze data in which response is a “direction”:
- 2d directional data are called circular data
- 3d directional data are called spherical data
- not all “directional” data are directions in the usual sense
- “directional” data may also arise in higher dimensions
- Recorded at Col de la Roa, Italian Alps
- n = 310 (first 40 listed below)
- Radians, clockwise from north
- Source: Agostinelli (CSDA 2007); also R package
circular
| 6.23 | 1.03 | 0.15 | 0.72 | 2.20 |
| 0.46 | 0.63 | 1.45 | 0.37 | 1.95 |
| 0.08 | 0.15 | 0.33 | 0.09 | 0.09 |
| 6.23 | 0.05 | 6.14 | 6.28 | 6.17 |
| 6.24 | 6.02 | 6.14 | 6.25 | 0.01 |
| 5.38 | 5.30 | 5.63 | 0.77 | 1.34 |
| 6.14 | 0.22 | 6.23 | 2.33 | 3.61 |
| 0.49 | 6.12 | 0.01 | 0.00 | 0.46 |
- 24-hour clock times (format
hrs.mins) - n = 254 (first 32 listed below)
- Source: Cox & Lewis (1966); also Fisher (1993) and R package
circular
| 11.00 | 17.00 | 23.15 | 10.00 |
| 12.00 | 8.45 | 16.00 | 10.00 |
| 15.30 | 20.20 | 4.00 | 12.00 |
| 2.20 | 12.00 | 5.30 | 7.30 |
| 12.00 | 16.00 | 16.00 | 1.30 |
| 11.05 | 16.00 | 19.00 | 17.45 |
| 20.20 | 21.00 | 12.00 | 12.00 |
| 18.00 | 22.00 | 22.00 | 22.05 |
- Orientation of left superior facet of last lumbar vertebra in humans, gorillas, and chimpanzees
- Source: Keifer (2005 UF Anthropology MA Thesis)
- Direction of travel observed for 2649 migrating butterflies in Florida
- Source: Thomas J Walker, University of Florida, Dept of Entomology and Nematology
- Other variables:
- site: 23 locations in Florida
- observer: Thomas Walker (tw) or James J. Whitesell (jw)
- species: cloudless sulphur (cs), gulf fritillary (gf), long-tailed skipper (lt)
- distance to coast (km)
- date and time of observation
- percentage of sky free of clouds
- quality of sunlight: (b)right, (h)aze, (o)bstructed, (p)artly obstructed
- presence/absence and direction (N, NE, E, SE, S, SW, W, NW) of wind
- temperature
- First three observations from the wind directions data: src_R{paste(round(wind[1:3], 2), collapse=”, “)}
- The mean of these three numbers is
src_R{round(mean(wind[1:3]), 2)} {{{results(
2.47)}}} - What do you think?
- Have already seen simple dot plots for circular data, e.g., for the wind data:
<<windConvert>>
<<windDataPlot>>- and for the ICU data:
<<icuDataPlot>>- and one more …
<<antsDataPlot>>- Circular histograms exist (see Fisher and Mardia and Jupp) but is there a ready-made function in R?
- Invented by Florence Nightingale (elected first female member of the Royals Statistical Society in 1859; honorary member of ASA)
- Nightingale’s rose in R (see also this post and the R graph catalog)
- Note that radii of segments are proportional to square root of the frequencies (counts), so that areas are proportional to frequencies. Is this the right thing to do?
- Rose diagrams suffer from the same problems as histograms. The
impression conveyed may depend strongly on:
- the binwidth of the cells
- the choice of starting point for the bins
rose.diag(windc, bins=16, col="darkgrey",
cex=1.5, prop=1.35, add=TRUE)- I think that the default “radii proportional to counts” is generally best, but this is not always obvious. The scale certainly makes a big difference however.
rose.diag(windc, bins=16, col="darkgrey",
radii.scale="linear",
cex=1.5, prop=2.4, add=TRUE)lines(density.circular(windc, bw=40), lwd=2, lty=1)- Are there any canned routines for plotting spherical data in R?
- First three observations from the wind directions data:
| theta | x | y |
| 6.23 | -0.06 | 1.00 |
| 1.03 | 0.86 | 0.51 |
| 0.15 | 0.15 | 0.99 |
- resultant (sum of direction vectors): (src_R{round(xsum, 3)}, src_R{round(ysum, 3)})
- mean vector: \((\bar{x}, \bar{y}) = \) (src_R{round(xbar, 3)}, src_R{round(ybar, 3)})
- resultant length (Euclidean norm of resultant): R = src_R{round(resultantLength, 3)}
- mean resultant length: \(\bar{R} = \) src_R{round(meanResultantLength, 3)}
- mean direction: \((\bar{x}, \bar{y})/\bar{R} = \) (src_R{round(meanDirection[1], 3)}, src_R{round(meanDirection[2], 3)})
- \(˜{θ} = \) src_R{round(meanDirectionRadians, 3)}
- Wish to generate a random “direction” in d-dimensions; i.e., an observation from the uniform distribution in the \(d-1\) sphere.
- Usual way: let X ∼ N_d(0, I) and return U = X/||X||.
- An alternative rejection sampler:
- Repeat until ||X|| <= 1
- Let X be uniformly distributed on the cube [-1,1]^d
- Return U = X/||X||
- Repeat until ||X|| <= 1
- What is the acceptance rate for the rejection sampler:
- Volume of the \(d - 1\) sphere is \(πd/2/Γ(d/2 + 1)\)
- Volume of [-1,1]^d is 2^d
- Acceptance rate is \((π1/2/2)^d/Γ(d/2 + 1)\)
- Curse of dimensionality
| dimension | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| accept rate (%) | 79 | 52 | 31 | 16 | 8 | 4 | 2 | 1 | 0 |
runifSphere <- function(n, dimension, method=c("norm", "cube", "slownorm")) {
method <- match.arg(method)
if (method=="norm") {
u <- matrix(rnorm(n*dimension), ncol=dimension)
u <- sweep(u, 1, sqrt(apply(u*u, 1, sum)), "/")
} else if (method=="slownorm") {
u <- matrix(nrow=n, ncol=dimension)
for (i in 1:n) {
x <- rnorm(dimension)
xnorm <- sqrt(sum(x^2))
u[i,] <- x/xnorm
}
} else {
u <- matrix(nrow=n, ncol=dimension)
for (i in 1:n) {
x <- runif(dimension, -1, 1)
xnorm <- sqrt(sum(x^2))
while (xnorm > 1) {
x <- runif(dimension, -1, 1)
xnorm <- sqrt(sum(x^2))
}
u[i,] <- x/xnorm
}
}
u
}Take longitude \(φ ∼ U(0,2π)\) independent of latitude \(θ = arcsin(2U-1)\), \(U ∼ U(0,1)\).
One way that we might compare the \(\nlangevin(μ, κ)\) and \(\npn(γ\mu, I)\) distributions by choosing κ and γ to give the same mean resultant lengths and comparing the densities of the cosine of the angle θ between \(U\) and \(μ\).
Of course matching mean resultant lengths is not necessarily the best way to compare these families of distributions.
A.k.a., the barber pole model.
Calculate the (profile) log-likelihood for Gould (1969 Biometrics) model for simple (single predictor) regression with an intercept. For fixed “slope” β, this function “profiles out” (maximizes over) the “intercept” term and optionally the concentration parameter κ.
loglklhd.gould <- function(beta, theta, x, do.kappa=FALSE) {
res <- sapply(beta,
function(b, th, x) {
sqrt(sum(cos(th - b*x))^2
+ sum(sin(th - b*x))^2)
},
th=theta, x=x)
if (do.kappa) {
n <- length(theta)
kappa <- sapply(res/n, imrlLvMF, dimen=2)
res <- n*log(constLvMF(kappa, dimen=2)) + kappa*res
}
res
}<<gouldLatticeXData>>
<<gouldPlot>>alpha <- 0
beta <- 1
kappa = 2.5
x <- rnorm(10)
mu <- as.circular((alpha + beta*x) %% (2*pi))
theta <- as.circular(mu + rvonmises(length(mu), mu=0, kappa=kappa))Calculate the (profile) log-likelihood for the Fisher-Lee (1992
Biometrics) model. For fixed “slope” β, this function “profiles
out” (maximizes over) the “intercept” term and optionally the
concentration parameter κ. Computing this with biggish matrix
multiplies instead of using apply() or looping.
loglklhdFisherLee <- function(beta, theta, X, do.kappa=FALSE) {
n <- length(theta)
nbeta <- dim(beta)[2]
if (dim(X)[1] != n) {
stop("Number of rows of X must equal length of theta.")
}
if (dim(beta)[1] != dim(X)[2]) {
stop("Number of rows of beta must equal number of columns of X")
}
dev <- theta - 2*atan(X %*% beta)
res <- sqrt(apply(cos(dev), 2, sum)^2
+ apply(sin(dev), 2, sum)^2)
if (do.kappa) {
kappa <- sapply(res/n, imrlLvMF, dimen=2)
res <- n*log(constLvMF(kappa, dimen=2)) + kappa*res
}
res
}Note that Fisher recommends centering the x values before fitting the model. Here, to be certain that the model whose likelihood we plot is equivalent to the data generating model, we will center the x values before generating the responses.
alpha <- 0
beta <- 1
kappa = 2.5
x <- rnorm(10)
x <- x - mean(x)
mu <- as.circular(alpha + 2*atan(beta*x))
theta <- as.circular(mu + rvonmises(length(mu), mu=0, kappa=kappa))periwinkles <- read.table(datafile("periwinkle.txt"), header=TRUE)Proportional coefficients yield identical directional means with different concentrations.





















