canonical-data.json needs standardisation #376

@catb0t

Hello,

I maintain the Factor track, and I'd like to automate generation of unit tests for exercises in my language.

Judging by exercises/leap/canonical-data.json, this looks quite simple. However, many of the canonical-data.json files don't use the set of keys found in leap's file, which makes them difficult to automate against.

There are, as far as I can tell, two solutions to the problems introduced by the inconsistencies.

  • Rather than hardcoding the description, input and expected keys, use a regex / fuzzy find to group keys into description, input and output. This has two main disadvantages: not only must my code be flimsy, but so must everyone else's, and all of it is liable to break at anyone's whim.
  • Standardise on a fixed, predictable set of keys and what their values represent. This makes the jobs of track maintainers easier, simplifies interacting code, and future-proofs the API and the code.
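
To make the contrast between the two options concrete, here is a minimal Python sketch. The regex patterns and the function names `guess_roles` and `read_fixed` are hypothetical, invented purely for illustration; they are not taken from any existing track generator, and the sample case is made up rather than copied from a real canonical-data.json.

```python
import re

# Hypothetical patterns: guesses at how ad hoc key names might be grouped.
ROLE_PATTERNS = {
    "description": re.compile(r"desc|name|title", re.IGNORECASE),
    "input": re.compile(r"input|arg|param", re.IGNORECASE),
    "output": re.compile(r"expect|output|result", re.IGNORECASE),
}


def guess_roles(case):
    """Fuzzy approach: group the keys of one test case by guessed role."""
    roles = {role: [] for role in ROLE_PATTERNS}
    roles["unknown"] = []
    for key in case:
        for role, pattern in ROLE_PATTERNS.items():
            if pattern.search(key):
                roles[role].append(key)
                break
        else:
            # No pattern matched, so the generator cannot tell what this key is.
            roles["unknown"].append(key)
    return roles


def read_fixed(case):
    """Standardised approach: every case is assumed to carry the same keys."""
    return case["description"], case["input"], case["expected"]


# A made-up case whose input key is named after the exercise's parameter.
sample = {"description": "leap year", "year": 1996, "expected": True}
roles = guess_roles(sample)
```

With the made-up sample, the fuzzy version files `year` under `unknown`, so every new key name means another pattern to maintain, whereas `read_fixed` needs no guessing as long as the data follows the agreed keys.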

I think standardisation would be greatly beneficial, and if we make an API more accessible, perhaps more tracks will automate generation / regeneration of tests, which would be positive.

But before I open a pull request with structural changes to hundreds of lines of data, I'd like some feedback.

First, does anyone object to changing the names of the keys? They're rather haphazard (nearly as if it had been written for humans to read :) ), and some exercises are missing canonical-data.json altogether, so I have difficulty believing there are programs reading this stuff.

Second, what keys should be used? I'm thinking something like:

  • For exercises with one input translating to one output, description, input and output.
  • For exercises with multiple inputs / multiple outputs, description, input_N, output_N.

Note that it would be disadvantageous to use an array for multiple inputs / outputs where an array is not part of the exercise because it would be hard or impossible to tell the difference between multiple inputs and an actual array. We could have keys like input_multi which is an array of inputs, I suppose?
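
As a rough sketch of how a generator might consume the proposed keys, the Python below collects description, input / output and input_N / output_N from a single case. The function name `collect`, the choice to start numbering at 1, and the sample case are all assumptions made for illustration; none of this is an agreed format.

```python
import re

# Matches the proposed numbered keys, e.g. input_1 or output_2.
NUMBERED = re.compile(r"^(input|output)_(\d+)$")


def collect(case):
    """Return (description, inputs, outputs) for a case using the proposed keys."""
    inputs, outputs = {}, {}
    for key, value in case.items():
        if key == "input":
            inputs[1] = value
        elif key == "output":
            outputs[1] = value
        else:
            match = NUMBERED.match(key)
            if match:
                target = inputs if match.group(1) == "input" else outputs
                target[int(match.group(2))] = value

    def ordered(numbered):
        # Sort by the numeric suffix so input_10 comes after input_2.
        return [numbered[i] for i in sorted(numbered)]

    return case["description"], ordered(inputs), ordered(outputs)


# Made-up case: the first input is a genuine array, the second a plain number.
sample = {"description": "sum", "input_1": [1, 2], "input_2": 3, "output_1": 6}
result = collect(sample)
```

Because each argument has its own key, the array in `input_1` stays a single argument and is never mistaken for a list of separate inputs, which is the ambiguity described above.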

Thoughts?
