Metric availability is inconsistent: not every geography-subsector pair is observed in every year. The number of observations per year is:
1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
2772 2863 2877 2894 2930 2964 2967 2927 2888 2942 2821 2922 2918 2968 2882 2875 2892 2934 2969 3703 2967 2939 2939 2943
2014 2015 2016 2017 2018 2019
2983 2995 2419 1860  587   52
We should have 4,428 observations each year.
length( unique( dat.allyears$geo ))
[1] 369
length( unique( dat.allyears$subsector ))
[1] 12
369*12
[1] 4428
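To see which geography-subsector pairs are missing in a given year, one option is to build the full grid of expected combinations and join the observed pairs onto it. The sketch below uses base R on a toy data frame; the `year` column name and the toy values are assumptions, not taken from dat.allyears.

```r
# Toy stand-in for dat.allyears; the `year` column name is an assumption.
dat.toy <- data.frame(
  year      = c(2018, 2018, 2018, 2019, 2019),
  geo       = c("A", "A", "B", "A", "B"),
  subsector = c("x", "y", "x", "x", "y"),
  stringsAsFactors = FALSE
)

# Every year-geo-subsector combination we expect to observe.
full.grid <- expand.grid(
  year      = sort( unique( dat.toy$year ) ),
  geo       = sort( unique( dat.toy$geo ) ),
  subsector = sort( unique( dat.toy$subsector ) ),
  stringsAsFactors = FALSE
)

# Left join the observed pairs onto the grid; unmatched rows are missing pairs.
observed <- unique( dat.toy[ , c("year", "geo", "subsector") ] )
observed$present <- TRUE
check <- merge( full.grid, observed, all.x = TRUE )
missing.pairs <- check[ is.na( check$present ), c("year", "geo", "subsector") ]

# Number of missing pairs in each year.
n.missing <- table( missing.pairs$year )
```

Applied to the real data, the per-year counts in `n.missing` plus the observed counts should add up to 4,428 if each pair appears at most once per year.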
The sample code for HHI includes a line to ensure geo and subsector are factors. If coverage varies by year, though, the levels inferred from each year's data will differ, so we should define a global GEO.LEVELS with all metro areas, and a SUBSECTOR.LEVELS as well. We can add these to utils.R so they are available to everyone.
dat.hhi <-
  df %>%
  dplyr::mutate( geo = factor(geo),
                 subsector = factor(subsector),
                 resource = as.numeric(resource),
                 resource = bottomcode(resource) ) %>%
  dplyr::group_by( geo, subsector ) %>%
  dplyr::summarize( hhi = sum( ( resource / sum(resource) )^2 ),
                    n = dplyr::n(),
                    {{resource.name}} := sum(resource) )
To use a consistent set of levels, we would add a levels argument to factor():
geo=factor( geo, levels=get_geo_levels() ),
subsector=factor( subsector, levels=get_subsector_levels() ),
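A minimal sketch of how this could fit together, under several assumptions: the helper definitions and level vectors below are hypothetical toy values (the real ones would list all 369 geographies and 12 subsectors), and bottomcode() is left out because its definition is not shown here. Note that fixed factor levels alone do not make dplyr return rows for pairs with no data; that also needs `.drop = FALSE` in group_by(). An empty group sums to zero, so the sketch sets its HHI to NA rather than reporting 0.

```r
library(dplyr)

# Hypothetical helpers for utils.R; toy levels, not the real ones.
GEO.LEVELS       <- c("A", "B", "C")
SUBSECTOR.LEVELS <- c("x", "y")
get_geo_levels       <- function() GEO.LEVELS
get_subsector_levels <- function() SUBSECTOR.LEVELS

# Toy data for one year: geo "C" and the pair (B, y) are absent.
df <- data.frame(
  geo       = c("A", "A", "A", "B"),
  subsector = c("x", "x", "y", "x"),
  resource  = c(10, 30, 5, 7)
)

dat.hhi <-
  df %>%
  mutate( geo       = factor( geo, levels = get_geo_levels() ),
          subsector = factor( subsector, levels = get_subsector_levels() ) ) %>%
  # .drop = FALSE keeps geo-subsector pairs that have no rows
  group_by( geo, subsector, .drop = FALSE ) %>%
  summarize( hhi = sum( ( resource / sum(resource) )^2 ),
             n   = n(),
             .groups = "drop" ) %>%
  # an empty group sums to 0, so mark its HHI as missing instead
  mutate( hhi = ifelse( n == 0, NA_real_, hhi ) )
```

With these toy inputs, dat.hhi has one row for each of the 3 x 2 level combinations, and the pairs with no data carry n = 0 and a missing hhi. Whether empty pairs should be NA or dropped later is a design choice for the project.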