summary statistics (mean, max, min, etc.) #640

@pdurbin

As mentioned during the 2024-04-10 task force call, there is interest in providing summary statistics (or, more broadly, descriptive statistics) in Croissant format. I'm focusing on summary statistics (mean, max, min, median, mode, standard deviation, etc.) because they are already well defined in the Data Documentation Initiative (DDI) format.

For example, for a dataset with a variable called "stars" that indicates the number of stars on GitHub, the summary statistics in DDI can be represented like this:

<var ID="v30256083" name="stars" intrvl="discrete">
  <location fileid="f6867331"/>
  <labl level="variable">stars</labl>
  <sumStat type="medn">4.0</sumStat>
  <sumStat type="mean">38.71014492753635</sumStat>
  <sumStat type="mode">.</sumStat>
  <sumStat type="vald">138.0</sumStat>
  <sumStat type="max">732.0</sumStat>
  <sumStat type="invd">0.0</sumStat>
  <sumStat type="min">0.0</sumStat>
  <sumStat type="stdev">110.13079171235681</sumStat>
  <varFormat type="numeric"/>
</var>
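
To make the shape of this data concrete, here is a minimal Python sketch that reads the `sumStat` elements from the fragment above into a plain dictionary, which is roughly the information that would need a home in Croissant. The function name `parse_sum_stats` and the output layout are hypothetical, and treating "." as a missing value is my assumption. The fragment has no XML namespace; full DDI codebook files typically declare one, in which case the element lookups would need namespace-qualified names.

```python
import xml.etree.ElementTree as ET

# The DDI fragment from above, copied verbatim (no XML namespace declared).
DDI_VAR = """<var ID="v30256083" name="stars" intrvl="discrete">
  <location fileid="f6867331"/>
  <labl level="variable">stars</labl>
  <sumStat type="medn">4.0</sumStat>
  <sumStat type="mean">38.71014492753635</sumStat>
  <sumStat type="mode">.</sumStat>
  <sumStat type="vald">138.0</sumStat>
  <sumStat type="max">732.0</sumStat>
  <sumStat type="invd">0.0</sumStat>
  <sumStat type="min">0.0</sumStat>
  <sumStat type="stdev">110.13079171235681</sumStat>
  <varFormat type="numeric"/>
</var>"""


def parse_sum_stats(var_xml: str) -> dict:
    """Hypothetical helper: collect the sumStat values of one DDI <var>."""
    var = ET.fromstring(var_xml)
    stats = {}
    for stat in var.findall("sumStat"):
        text = (stat.text or "").strip()
        try:
            stats[stat.get("type")] = float(text)
        except ValueError:
            # Assumption: a non-numeric value such as "." means "not available".
            stats[stat.get("type")] = None
    return {"name": var.get("name"), "stats": stats}


summary = parse_sum_stats(DDI_VAR)
print(summary["name"], summary["stats"])
```

Each statistic is keyed by its DDI `type` attribute (`medn`, `mean`, `mode`, `vald`, `max`, `invd`, `min`, `stdev`), so nothing is renamed or reinterpreted along the way.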

The question is, where can I put summary statistics in Croissant?


Update: This issue seems related (mentions statistics):
