canonical-data.json needs standardisation

Hello,

I maintain [the Factor track](https://github.com/exercism/xfactor), and I'd like to [automate generation of unit tests for exercises](https://github.com/catb0t/exercism.factor/blob/master/exercism/autogen-exercises/autogen-exercises.factor) in my language.

Looking at `exercises/leap/canonical-data.json`, it would seem to be quite simple. However, many of the `canonical-data.json` files don't use the standard set of keys found in `leap`'s JSON, and this makes them difficult to automate around.

There are, as far as I can tell, two solutions to the problems introduced by the inconsistencies.
- Rather than hardcoding the `description`, `input` and `expected` keys, use a regex / fuzzy find to group keys into description, input and output. This has two disadvantages: not only must my code be flimsy, but so must everyone else's, and all of it is liable to break at the whim of anyone who edits the data.
- Standardise on a fixed, predictable set of keys and what their values represent. This makes the jobs of track maintainers easier, simplifies interacting code, and future-proofs the API and the code.
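
To illustrate why the first option is fragile, here is a minimal Python sketch of fuzzy key grouping. The regex patterns and the function name `classify_keys` are my own guesses for illustration, not anything the repository defines; any key the patterns fail to anticipate ends up in `unknown`.

```python
import re

# Hypothetical patterns: guesses at how keys might be named across exercises.
ROLE_PATTERNS = {
    "description": re.compile(r"desc|name|title", re.I),
    "input": re.compile(r"input|arg|value", re.I),
    "output": re.compile(r"expect|output|result", re.I),
}


def classify_keys(case):
    """Group the keys of one test case by guessed role."""
    roles = {"description": [], "input": [], "output": [], "unknown": []}
    for key in case:
        for role, pattern in ROLE_PATTERNS.items():
            if pattern.search(key):
                roles[role].append(key)
                break
        else:
            # No pattern matched, so the generator cannot use this key.
            roles["unknown"].append(key)
    return roles
```

A case keyed like `leap`'s (`description`, `input`, `expected`) classifies cleanly, but a case that names its input `year`, for example, would land in `unknown` until someone adds another pattern.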

I think standardisation would be greatly beneficial, and if we make the API more accessible, perhaps more tracks will automate generation / regeneration of tests, which would be a positive outcome.

But before I open a pull request with structural changes to hundreds of lines of data, I'd like some feedback.

First, does anyone object to changing the names of the keys? They're rather haphazard (nearly as if they had been written for humans to read ): ) and some exercises are missing `canonical-data.json` altogether, so I have difficulty believing there are programs reading this stuff.

Second, what keys should be used? I'm thinking something like:
- For exercises with one input translating to one output, `description`, `input` and `output`.
- For exercises with multiple inputs / multiple outputs, `description`, `input_N`, `output_N`. 
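
As a rough sketch of what a consumer of the proposed scheme could look like (in Python; the function name `split_case` and the strict rejection of unknown keys are my assumptions, not part of any existing tooling):

```python
import re

# Matches the proposed numbered keys, e.g. input_1 or output_2.
NUMBERED = re.compile(r"^(input|output)_(\d+)$")


def split_case(case):
    """Return (description, inputs, outputs) for one test case."""
    numbered = {"input": {}, "output": {}}
    single = {}
    for key, value in case.items():
        if key == "description":
            continue
        match = NUMBERED.match(key)
        if match:
            numbered[match.group(1)][int(match.group(2))] = value
        elif key in ("input", "output"):
            single[key] = value
        else:
            # With a fixed key set, anything else is an error, not a guess.
            raise KeyError(f"unexpected key: {key!r}")

    def collect(kind):
        if kind in single:
            return [single[kind]]
        # Order numbered values by N, regardless of their order in the file.
        return [numbered[kind][n] for n in sorted(numbered[kind])]

    return case["description"], collect("input"), collect("output")
```

With fixed keys the reader needs no pattern guessing at all: it either understands a case completely or fails loudly.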

Note that it would be disadvantageous to use an array for multiple inputs / outputs where an array is not part of the exercise, because it would be hard or impossible to tell the difference between multiple inputs and an actual array. We could have a key like `input_multi` that holds an array of inputs, I suppose?
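
A small Python sketch of how a separate `input_multi` key would remove that ambiguity (the helper name `call_args` is hypothetical):

```python
def call_args(case):
    """Return the positional arguments a generated test should pass."""
    if "input_multi" in case:
        # Several separate arguments, stored together as an array.
        return list(case["input_multi"])
    # Exactly one argument, even when that argument is itself an array.
    return [case["input"]]
```

Here `{"input": [1, 2]}` means one array argument, while `{"input_multi": [1, 2]}` means two separate arguments, so the same JSON array is never read two ways.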

Thoughts?

