We discussed including the code used to create the data examples in the repository. What this code does may depend on the original 'source' of the example (a real dataset, a test dataset from cf-python, or one created from scratch).
I think it would be nice to have the Python code to create each example from scratch. We could present this alongside each dataset as a way to replicate the creation of the example. I don't think it would take too long to write each one if we already have the CDL. LLMs will make this a lot quicker than it would have been a few years ago.
And maybe in the future we could provide code in other languages used by scientists, e.g. R, Fortran, MATLAB (if we want to support expensive software!), etc.
Thoughts?