---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# MaddisonData

<!-- badges: start -->
[![`R-CMD-check`](https://github.com/sbgraves237/MaddisonData/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/sbgraves237/MaddisonData/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

Make it easier for humans to access data from the Maddison Project in R.
Later releases may include vignettes and similar material documenting analyses
of these data with the Kalman filtering and smoothing (state space) techniques
in the `KFAS` package.

Objectives: Make it relatively easy in R to do the following:

1. Find the countries with the highest `gdppc` for each year for which data are available using `MaddisonLeaders()`. Allow excluding countries whose high `gdppc` rests on something narrow, such as a single commodity like oil. [DONE]

2. Support modeling a time series as `c('level', 'growthRate')`.

2.1. Using the `KFAS` package with functions `growthModel` and `growthUpdateFn` plus plotting with `ggplotPath` and `ggplotPath2` as documented in vignette `KalmanSmoothing`. [ALMOST DONE, but `KFAS::fitSSM` seems not to pass optional arguments to `optim` and strips dimnames from components of an `SSModel`. This means that with an irregular time series, e.g., data missing for some years, the gaps are ignored, and estimation assumes the data are contiguous. This especially affects the state transition matrix `T` and the transition variance `Q`. This should be a minor problem for series with only a few missing values and will be ignored until convenient to fix.]
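
One possible interim workaround, sketched here as an assumption rather than as something the package does, is to pad an irregular series onto a complete yearly grid with `NA` for the missing years, because `KFAS` treats `NA` observations as missing. The helper `padYears` is hypothetical and uses only base R:

```r
# Hypothetical helper (not part of MaddisonData or KFAS):
# expand an irregular yearly series to every year in its range,
# filling the gaps with NA so that they are explicit.
padYears <- function(year, y) {
  stopifnot(length(year) == length(y),
            !anyDuplicated(year),
            all(year == round(year)))
  allYears <- seq(min(year), max(year))
  out <- rep(NA_real_, length(allYears))
  names(out) <- allYears
  out[match(year, allYears)] <- y
  out
}

# Made-up values observed in 1820, 1822 and 1825 only
padded <- padYears(c(1820, 1822, 1825), c(1, 2, 3))
padded  # named 1820..1825, with NA at 1821, 1823 and 1824
```

Padding is only practical when the gaps are short. For a series with gaps of decades or centuries it adds many missing observations, so adjusting `T` and `Q` for the actual time step may still be preferable.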

2.2. Ask John Nash for his preferred substitute for `optim`. Also ask him about supporting an optional `fixPar` vector = NA for `par` to fix and the numeric values for others. This could support easy testing of submodels.
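
To make the `fixPar` idea concrete, here is a minimal sketch of a wrapper around `stats::optim`. It assumes one particular reading of the proposal: `NA` in `fixPar` marks a parameter to be estimated, and a number pins that parameter at the given value. The name `optimFixed` and this convention are illustrative assumptions, not an existing API.

```r
# Hypothetical wrapper: optimize only the entries of `par` whose
# `fixPar` entry is NA; all other entries are held at their fixPar value.
optimFixed <- function(par, fn, fixPar = rep(NA_real_, length(par)), ...) {
  stopifnot(length(fixPar) == length(par))
  free <- is.na(fixPar)
  if (!any(free)) stop("all parameters are fixed; nothing to optimize")
  # Rebuild the full parameter vector from the free parameters.
  expand <- function(freePar) {
    full <- par
    full[!free] <- fixPar[!free]
    full[free] <- freePar
    full
  }
  # Extra arguments in `...` go to optim (e.g. method) or on to fn.
  fit <- optim(par[free], function(freePar, ...) fn(expand(freePar), ...), ...)
  fit$par <- expand(fit$par)  # report the full vector, fixed values included
  fit
}

# Toy objective with its unconstrained minimum at c(1, 2, 3)
sumSq <- function(p) sum((p - c(1, 2, 3))^2)
# Submodel: hold the second parameter at 5, estimate the other two
fit <- optimFixed(c(0, 0, 0), sumSq, fixPar = c(NA, 5, NA), method = "BFGS")
fit$par
```

Comparing `fit$value` between the full model (`fixPar` all `NA`) and a submodel is then a one-line change, which is the kind of easy submodel testing the item has in mind.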

2.3. Ask Jouni Helske and the other `KFAS` and `bssm` contributors about supporting
(a) dimnames and
(b) optional `Time` and `dT` components of an `SSModel` plus
(c) a function `asTime` = `Time` component of a model else `as.POSIXct(names(y))` else `as.Date(names(y))` else `as.numeric(names(y))` else [`ordered(names(y))` with `dT = 1:length(y)`] with `dT_ = diff(Time)` and `dT = c(dT_[1], dT_)` and
(d) for `bssm` assuming Student's t migrations of `growthRate`.
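
The fallback chain proposed in (c) can be sketched in base R as follows. This is only an illustration of the idea: the function signature, the handling of unnamed `y`, and the use of order of appearance for the ordered factor are assumptions, and nothing here exists in `KFAS` or `bssm`.

```r
# Hypothetical sketch of the proposed `asTime`; not part of KFAS or bssm.
asTime <- function(y, Time = NULL) {
  # 1. An explicit Time component takes precedence.
  if (!is.null(Time)) return(Time)
  nms <- names(y)
  # Assumption: with no names at all, fall back to the observation index.
  if (is.null(nms)) return(seq_along(y))
  # 2. Try each converter in turn; accept the first that parses every name.
  converters <- list(as.POSIXct, as.Date, as.numeric)
  for (convert in converters) {
    out <- tryCatch(suppressWarnings(convert(nms)), error = function(e) NULL)
    if (!is.null(out) && !anyNA(out)) return(out)
  }
  # 3. Last resort: an ordered factor in order of appearance.
  ordered(nms, levels = unique(nms))
}

# Made-up series observed in 1820, 1850 and 1870
y <- c("1820" = 1.2, "1850" = 1.9, "1870" = 2.5)
Time <- asTime(y)      # numeric years, because the names are not dates
dT_ <- diff(Time)      # time steps between consecutive observations
dT <- c(dT_[1], dT_)   # repeat the first step so dT has one entry per observation
```

With year names such as `"1820"`, the date converters fail and the numeric conversion is used, so `dT` records the actual gaps between observations rather than assuming unit steps.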

```{r MadDat1600}
library(MaddisonData)
MadDat1600 <- subset(MaddisonData, year>1600)
Leaders1600 <- MaddisonLeaders(c('ARE', 'KWT', 'QAT'), data=MadDat1600)
summary(Leaders1600)
```

3. Plot the data available on `gdppc` and/or `pop` for a selection of countries,
e.g., world leaders.

```{r ggplotPath}
str(GBR_USA <- subset(MaddisonData::MaddisonData, ISO %in% c('GBR', 'USA')))
GBR_USA1 <- MaddisonData::ggplotPath('year', 'gdppc', 'ISO', GBR_USA, 1000)

GBR_USA1+ggplot2::coord_cartesian(xlim=c(1500, 1850)) # for only 1500-1850
GBR_USA1+ggplot2::coord_cartesian(xlim=c(1600, 1700), ylim=c(7, 17))

# label the lines
ISOll <- data.frame(x=c(1500, 1800), y=c(2.5, 1.7), label=c('GBR', 'USA'),
              srt=c(0, 30), col=c('red', 'green'), size=c(2, 9))
GBR_USA2 <- ggplotPath('year', 'gdppc', 'ISO', GBR_USA, 1000,
                    labels=ISOll, fontsize = 20)

# horizontal and vertical reference lines, manual legend only
Hlines <- c(1, 3, 10, 30)
Vlines <- c(1849, 1929, 1933, 1939, 1945)
(GBR_USA3 <- ggplotPath('year', 'gdppc', 'ISO', GBR_USA, 1000,
       ylab='GDP per capita (2011 PPP K$)',
       legend.position = NULL, hlines=Hlines, vlines=Vlines, labels=ISOll))
```

LATER:

4. Build state space / Kalman models for `gdppc` and `pop` for each country
in the Maddison project data.

5. Use Kalman smoothing to interpolate and extrapolate (forward but not
backwards) `gdppc` and `pop` for each country for all years that appear
anywhere in the Maddison project data.

6. Identify the world leader in `gdppc` for each year, refining item 1 above
with `KFAS` interpolation.

7. Identify the world technology leader for each year by evaluating the
`gdppc` leader for each year and replacing any whose leadership was narrow,
like members of OPEC, with a country with a broad-based economy, like the US.
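
The leader-selection logic behind items 6 and 7 can be illustrated with a base R sketch on made-up numbers. The function `leaderByYear` and the toy data are hypothetical; this is not how `MaddisonLeaders()` is implemented, only an illustration of picking the top `gdppc` per year after excluding selected countries.

```r
# Hypothetical illustration: top gdppc per year after excluding some ISO codes.
leaderByYear <- function(data, exclude = character(0)) {
  keep <- data[!(data$ISO %in% exclude) & !is.na(data$gdppc), ]
  byYear <- split(keep, keep$year)
  top <- lapply(byYear, function(d) {
    d[which.max(d$gdppc), c("year", "ISO", "gdppc")]
  })
  out <- do.call(rbind, top)
  rownames(out) <- NULL
  out
}

# Made-up values, for illustration only (not Maddison Project data)
toy <- data.frame(
  year  = c(2000, 2000, 2000, 2001, 2001, 2001),
  ISO   = c("QAT", "USA", "GBR", "QAT", "USA", "GBR"),
  gdppc = c(90, 45, 35, 95, NA, 36),
  stringsAsFactors = FALSE
)

leaderByYear(toy)                   # QAT leads in both years
leaderByYear(toy, exclude = "QAT")  # USA in 2000; GBR in 2001, where USA is NA
```

The second call shows why interpolation matters for item 6: a missing value for one country in a given year hands the lead to another country, which smoothing across the gap could correct.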

## Installation

You can install the development version of MaddisonData from
[GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("sbgraves237/MaddisonData")
```

## Example

[Coming soon.]
<!--This is a basic example which shows you how to solve a common problem:-->

```{r example}
library(MaddisonData)
## basic example code
```