balance dependency pinning against flexibility for conda-forge builds of fremor #162

@ilaflott

dependency pinning has always been a funny thing to me because it seems that people take one of two flawed approaches. to be up front, i'm a type-2 (the "too flexible" kind described below).

too strict: "glass menagerie" of exact-pin builds

for each built and published package, include a requirements.txt holding the output of e.g. conda list from the built+activated environment. this lists every package in the build environment with its exact version and build.

the upside: the need and motivation are easy to explain to a beginner, and it intuitively makes sense. it feels like a warm blanket of security...

...until the very strong downside of almost-absent flexibility for builds reveals it's actually a straitjacket. want to update something? time to interact with dependencies-of-dependencies that one definitely has never heard of nor cares about. a user is only able to install such a build if their own environment is compatible with every. dang. thing. in. that. list. literal 0 flexibility.

the next step to compensate in this extreme is to create a large number of unique builds with exactly pinned dependencies. it won't work! now we have N different builds for v X.Y.Z, each with its own set of M restrictions to match exactly, and each build breaks when they are not matched. hence, "glass menagerie".
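
to make the "every. dang. thing." point concrete, here is a minimal python sketch of what an exact-pin check amounts to. it assumes the name=version=build line format that conda list --export prints; the package versions and build strings are made-up placeholders, and parse_pin / mismatches are hypothetical helper names, not anything from fremor or conda.

```python
# Hypothetical exact pins in "name=version=build" form.
# The versions and build strings below are placeholders for illustration only.
EXACT_PINS = [
    "cmor=3.15.0=py312h0000000_0",
    "netcdf4=1.7.2=py312h0000000_0",
    "numpy=2.1.3=py312h0000000_0",
]

# Hypothetical user environment: identical except for the numpy build string.
USER_ENV = {
    "cmor": ("3.15.0", "py312h0000000_0"),
    "netcdf4": ("1.7.2", "py312h0000000_0"),
    "numpy": ("2.1.3", "py312h1111111_0"),
}


def parse_pin(line):
    """Split one 'name=version=build' line into (name, (version, build))."""
    name, version, build = line.split("=")
    return name, (version, build)


def mismatches(pins, installed):
    """Return the names of pins that the installed environment does not match exactly."""
    bad = []
    for line in pins:
        name, wanted = parse_pin(line)
        if installed.get(name) != wanted:
            bad.append(name)
    return bad


print(mismatches(EXACT_PINS, USER_ENV))  # ['numpy']
```

one differing build string is enough to fail the whole list, even though the numpy version itself is identical. multiply that by M pins and N builds and the menagerie gets fragile fast.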

too flexible: "surprise!" noarch builds

in this approach we correctly assume the environment resolver exists for good reasons, but overly rely on it across all settings: developer, user, and package channel. no versions in the reqs are capped, so we become aware of every upstream update as soon as CI runs.

the upside is flexibility, plus the "heads-up" structurally provided by having to resolve the environment every time: if it doesn't resolve, something changed! that's a very strong and reliable signal to "dig-in" for developers. users benefit too, since they experience easy installations across a wide variety of potential environments.

the downside is when something changes upstream, and neither the CI nor a developer working with the unpinned builds/code has found the problem yet. without a ceiling on certain major packages, things may break without warning on the user end, and the user will not have much to go on to figure out what went wrong.

i.e., the versioned/packaged build breaks, saying, "surprise!" and the confused user is left there saying, "this worked yesterday...". for versioned packages that we strive to bring to a high level of professional quality, this will not remain acceptable over time, even if we are getting away with it right now.

just right: "goldilocks" noarch with ranged dependency pins

a balanced approach in the middle would keep a single, universal noarch build on the conda-forge channel, but set safe upper AND lower bounds on the most important/fragile dependencies the package relies on, letting the others "float" un-pinned in the build for the sake of both users' and developers' sanity. the environment.yaml should probably remain a super-set of the last versioned release, i.e. un-bound for the CI, where something can safely tank and let us know that's the case.

an example to illustrate the point: fremor is very dependent on netcdf4 and cmor, so we could make a cmor>=3.15.0,<4.0.0 + netcdf4>=1.7,<2.0 build, leaving all other requirements the same. we could then scan over all minor-version combos of the two for numpy=2.* and then again for numpy=1.*.
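
a rough python sketch of both halves of that idea: checking an environment against the two ranges above, and enumerating the minor-version combos to scan. the ranges come from the example; everything else is an assumption for illustration. the helper names (as_tuple, in_range, satisfies, build_matrix) are hypothetical, the lists of minor versions are placeholders rather than a survey of real releases, and the version comparison only handles plain dotted numeric versions (no rc/dev suffixes), which a real resolver treats far more carefully.

```python
from itertools import product

# Ranges from the example above: lower bound inclusive, upper bound exclusive.
RANGES = {
    "cmor": ("3.15.0", "4.0.0"),
    "netcdf4": ("1.7", "2.0"),
}


def as_tuple(version, width=3):
    """Turn '1.7' into (1, 7, 0) so versions compare numerically, not as strings."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version, lower, upper):
    """True if lower <= version < upper."""
    return as_tuple(lower) <= as_tuple(version) < as_tuple(upper)


def satisfies(installed):
    """True if every bounded package is installed and inside its range.

    Packages not listed in RANGES are allowed to float and are ignored here.
    """
    return all(
        name in installed and in_range(installed[name], lower, upper)
        for name, (lower, upper) in RANGES.items()
    )


# Placeholder minor versions to scan; the real lists would come from the channel.
CMOR_MINORS = ["3.15", "3.16"]
NETCDF4_MINORS = ["1.7"]
NUMPY_MAJORS = ["2", "1"]  # numpy=2.* first, then again for numpy=1.*


def build_matrix():
    """One spec dict per (numpy major, cmor minor, netcdf4 minor) combination."""
    return [
        {"cmor": f"{c}.*", "netcdf4": f"{n}.*", "numpy": f"{m}.*"}
        for m, c, n in product(NUMPY_MAJORS, CMOR_MINORS, NETCDF4_MINORS)
    ]


print(satisfies({"cmor": "3.16.1", "netcdf4": "1.7.2", "numpy": "2.1.3"}))  # True
print(satisfies({"cmor": "4.0.0", "netcdf4": "1.7.2"}))  # False
print(len(build_matrix()))  # 4
```

each dict from build_matrix() could seed one CI job, so a failing combo points directly at the range boundary that needs tightening.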

upside: we get the best of both of the above two extremes.

downside: it takes considered and targeted effort to figure out which reqs should be bounded, and how wide/narrow those safe version ranges should be.
