gdal CLI: surface update intent in JSON usage by brownag · Pull Request #14322 · OSGeo/gdal

brownag · 2026-04-10T16:11:09Z

What does this PR do?

Extends the GDAL algorithm JSON usage schema with "open_for_update" metadata, which describes when datasets are opened in update mode. The schema, as suggested by @rouault in #14290, has three items: "by_default" (a boolean flag), "if_any_of" (update is conditional on specific arguments), and "unless_any_of" (update is suppressed by specific arguments). Together these items cover tools whose default update behavior differs and can be modified by additional arguments.
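As a rough illustration of how a binding might read these three items, here is a minimal Python sketch. The dictionary layout, the function name `opens_for_update`, and the way the items combine (suppressors only matter when update is the default, triggers only when it is not) are assumptions for illustration, not the schema as committed. The `"update"`/`"add"` pair mirrors the vector rasterize declaration in this PR; the `"external"` suppressor is purely hypothetical.

```python
from typing import Any, Iterable, Mapping


def opens_for_update(meta: Mapping[str, Any], used_args: Iterable[str]) -> bool:
    """Return True if a call using `used_args` would open the dataset for update.

    `meta` is an assumed "open_for_update" object with optional keys
    "by_default", "if_any_of" and "unless_any_of".
    """
    used = set(used_args)
    if meta.get("by_default", False):
        # Update is the default; any listed suppressing argument turns it off.
        return not (used & set(meta.get("unless_any_of", [])))
    # Update is off by default; any listed triggering argument turns it on.
    return bool(used & set(meta.get("if_any_of", [])))


# Metadata shaped like the "vector rasterize" declaration discussed in this PR.
rasterize_like = {"by_default": False, "if_any_of": ["update", "add"]}

# Hypothetical algorithm that updates in place unless a suppressing arg is given.
edit_like = {"by_default": True, "unless_any_of": ["external"]}
```

With this reading, `opens_for_update(rasterize_like, ["add"])` is true, while a call that passes none of the listed arguments is treated as not updating in place.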

Adds builder methods (SetOpenForUpdateIfAnyOf()/UnlessAnyOf()) for the algorithm declaration API. Updates GetUsageAsJSON() to evaluate each algorithm and serialize necessary usage metadata for update intent.
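To make the declaration-then-serialize flow concrete, here is a small Python analogue of the pattern. It is not the C++ API: `DatasetArgDecl`, its fields, and the rule that "open_for_update" is emitted only when something was declared are all hypothetical stand-ins for the builder methods and the serialization step described above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class DatasetArgDecl:
    """Hypothetical stand-in for a dataset argument declaration."""

    name: str
    update_by_default: bool = False
    if_any_of: List[str] = field(default_factory=list)
    unless_any_of: List[str] = field(default_factory=list)

    def set_open_for_update_if_any_of(self, arg_names: Iterable[str]) -> "DatasetArgDecl":
        self.if_any_of = list(arg_names)
        return self  # returning self allows chained declarations

    def set_open_for_update_unless_any_of(self, arg_names: Iterable[str]) -> "DatasetArgDecl":
        self.unless_any_of = list(arg_names)
        return self

    def usage(self) -> Dict[str, Any]:
        """Serialize the declaration; emit update metadata only when declared."""
        out: Dict[str, Any] = {"name": self.name}
        if self.update_by_default or self.if_any_of or self.unless_any_of:
            meta: Dict[str, Any] = {"by_default": self.update_by_default}
            if self.if_any_of:
                meta["if_any_of"] = self.if_any_of
            if self.unless_any_of:
                meta["unless_any_of"] = self.unless_any_of
            out["open_for_update"] = meta
        return out


# Chained declaration, mirroring the style of the C++ builder calls.
decl = DatasetArgDecl("output").set_open_for_update_if_any_of(["update", "add"])
print(json.dumps(decl.usage(), indent=2))
```

An argument with no declared conditions serializes without the extra key, which keeps the usage output unchanged for algorithms that never update in place.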

These new conditions apply to several CLI tools; a couple were specifically discussed in the original issue, and others were identified while drafting this PR. They include: raster/vector update (previously missing GDAL_OF_UPDATE), raster edit, raster overview (add/delete/refresh), raster clean collar, vector concat (along with all vector algorithms built on the pipeline framework), and vector rasterize.

New tests validate the serialization of "by_default" and the conditional paths.

EDIT: I want to get feedback on this draft of the idea (i.e. is the proposed JSON schema right, are we capturing everything we need to, are there more algorithms with edge cases I am missing).

What are related issues/pull requests?

#14290

AI tool usage

AI (Gemini, Claude) supported my development of this PR. See our policy about AI tool use. Use of AI tools must be indicated.

Tasklist

Make sure code is correctly formatted (cf pre-commit configuration)
Add test case(s)
Add documentation
Updated Python API documentation (swig/include/python/docs/)
Review
Adjust for comments
All CI builds and checks have passed

Environment

OS: Ubuntu 24.04 LTS
Compiler: GCC 13.3.0

rouault · 2026-04-11T13:24:40Z

+    /** Declare that dataset is opened for update if any of these arguments are used.
+     */
+    GDALAlgorithmArgDecl &
+    SetOpenForUpdateIfAnyOf(const std::vector<std::string> &argNames)


I'm not a big fan of adding those new methods in the API and requiring algorithms to use them, which can be easy to forget when new algorithms are written. I feel like they shouldn't be necessary since GDALAlgorithm::ProcessDatasetArg() knows if it must open in update mode or not. Perhaps part of the logic of ProcessDatasetArg() should be moved to an auxiliary method that can be used by both ProcessDatasetArg() and GetUsageAsJSON().
I might be wrong, but I would like to see some investigation along those lines done first.
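One possible reading of this suggestion, sketched in Python rather than C++: keep a single rule object and let both the runtime path and the usage-emission path consult it. `UpdateRule`, `open_mode`, and `usage_metadata` are hypothetical names, and the real logic in ProcessDatasetArg() may depend on runtime state that a static rule like this cannot capture.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class UpdateRule:
    """Single source of truth for when a dataset is opened for update."""

    by_default: bool = False
    if_any_of: FrozenSet[str] = frozenset()
    unless_any_of: FrozenSet[str] = frozenset()

    def applies(self, used_args: Iterable[str]) -> bool:
        used = frozenset(used_args)
        if self.by_default:
            return not (used & self.unless_any_of)
        return bool(used & self.if_any_of)


def open_mode(rule: UpdateRule, used_args: Iterable[str]) -> str:
    """Runtime path: decide how to open the dataset for this invocation."""
    return "update" if rule.applies(used_args) else "read-only"


def usage_metadata(rule: UpdateRule) -> Dict[str, Any]:
    """Documentation path: describe the same rule without running anything."""
    return {
        "by_default": rule.by_default,
        "if_any_of": sorted(rule.if_any_of),
        "unless_any_of": sorted(rule.unless_any_of),
    }


rule = UpdateRule(if_any_of=frozenset({"update", "add"}))
```

Because both functions read the same `rule`, the emitted metadata cannot drift from the runtime decision, which is the property the comment above is asking for.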

brownag · 2026-04-13 (edited)

Thanks @rouault, I definitely agree that it would be best to not have to repeat this logic both in process arg and in the algorithm declaration.

It was a conscious decision on my part to implement this first draft of the PR without trying to abstract out or modify the existing runtime logic. I recognize the issues with expanding the API surface and additional cognitive overhead for developers, but I would argue that adding these types of conditional update features should not be very common. GDAL_OF_UPDATE signals intent very well, but quite a few of these other tools have complex logic. There is value in being explicit in the algorithm declaration.

All that said, I've been tinkering with some alternative options. So far I have not come up with something I am happy with that seems simpler, but I suspect I can improve on what I have. I will need a bit more time. If you can confirm the first 3 commits on this PR are adequate, that will help me know I can safely move forward with an alternate implementation that respects the same schema.

rouault · 2026-04-12T22:44:41Z

/me giving up on all AI assisted contributions

brownag · 2026-04-13T01:18:56Z

> /me giving up on all AI assisted contributions

If you would rather just close this PR then that is fine by me. I spent a few hours of my Saturday considering ways to refactor in response to your prior comment. I have had a busy weekend though, and did not want to respond before I had fully considered your suggestion. Please feel free to close. I disclosed AI use as is required, but I don't think it is a fair characterization to say I did not give any thought to this work, or that I primarily relied on AI to construct this PR.

ctoney · 2026-04-13T04:40:54Z

Is overwrite handled for datasets? In this PR, "vector rasterize" gets

.SetOpenForUpdateIfAnyOf({"update", "add"});

But if --overwrite is used, then the output file is potentially being mutated conditional on whether it already exists (i.e., an existing file would be deleted and then a new file with the same name opened for update).

brownag · 2026-04-13T05:36:38Z

> Is overwrite handled for datasets? In this PR, "vector rasterize" gets
> .SetOpenForUpdateIfAnyOf({"update", "add"});
> But if --overwrite is used, then the output file is potentially being mutated conditional on whether it already exists (i.e., an existing file would be deleted and then a new file with the same name opened for update).

--overwrite is not handled because it doesn't need the dataset to be opened in update mode; as you say, it re-creates a new raster file. If I understand the logic in ProcessDatasetArg() properly, with --overwrite set the handling for in-place update will not trigger. Also, if an arg that opens for update is used together with --overwrite, overwrite takes precedence.

But I agree that overwrite is potentially mutating the output. On some level this is semantics of what an "update" is. There are cases where whole algorithms are GDAL_OF_UPDATE, and some cases where GDAL_OF_UPDATE is triggered at runtime, but overwrite is neither of those.
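The precedence described above can be sketched as a small decision function. This is a Python illustration of the stated understanding, not the code in ProcessDatasetArg(); `output_handling` and its return labels are hypothetical, and the default triggers are the `"update"`/`"add"` pair from the vector rasterize declaration.

```python
from typing import Iterable


def output_handling(
    used_args: Iterable[str],
    update_triggers: Iterable[str] = ("update", "add"),
) -> str:
    """Classify how the output dataset is treated for a given set of arguments."""
    used = set(used_args)
    if "overwrite" in used:
        # Overwrite wins: the output is re-created, never opened in place.
        return "recreate"
    if used & set(update_triggers):
        # An update-triggering argument opens the existing dataset for update.
        return "update-in-place"
    # Neither overwrite nor an update trigger: a new output is created.
    return "create"
```

Under this sketch, `output_handling(["overwrite", "update"])` yields `"recreate"`, which is why overwrite does not appear in the proposed "if_any_of" list.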

ctoney · 2026-04-13T06:32:35Z

> On some level this is semantics of what an "update" is.

I was thinking of it in terms of the additional context given in #14290

> In-place modifications break idempotency. Adding this property will help bindings know when a file is being mutated to maintain reproducibility.

I would consider --overwrite as a case of the file being mutated in that context. Otherwise, overwriting a layer in an existing, single-layer GPKG would be "update", but --overwrite the file with a new single-layer would not. Overwriting a single-band raster file with a new single band would not be considered update, when that is basically the raster analog of overwriting a vector layer which is "update". Aren't there cases where the same output could be produced with a choice of algorithm arguments, one considered "update" and the other not, but the choice of which way to do it is more or less arbitrary?

> There are cases where whole algorithms are GDAL_OF_UPDATE, and some cases where GDAL_OF_UPDATE is triggered at runtime, but overwrite is neither of those.

I don't understand "overwrite is neither of those"? A file that is overwritten is still opened with update access (at least a file with the same name is, the previous file no longer existing), and it's triggered at runtime.

brownag · 2026-04-13T08:59:11Z

My goal with #14290 was to identify in the JSON usage which algorithms open datasets for in-place update by default, and also which algorithm arguments modify the default behavior declared for the algorithm. This is metadata that bindings can use to reason about determinism/reproducibility without them having to maintain custom lists and rules.

At JSON usage emission time, we can only track the declarations, not the runtime conditions.

> I would consider --overwrite as a case of the file being mutated in that context.

I agree in the general sense of the file system, but I don't think that overwrite should be included in the proposed "open_for_update" "if_any_of" metadata.

I think the distinction is not about whether the overwritten file gets changed; clearly it does change (at least the timestamp). What makes an overwrite operation different from an update is that overwrite does not depend on what was there before.
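A toy model of that distinction, using Python lists in place of files: `overwrite`, `append_update`, and `run_twice` are illustrative helpers only and do not touch GDAL. Running each operation twice with the same input shows which one depends on prior contents.

```python
from typing import Callable, List, Tuple

Op = Callable[[List[str], List[str]], List[str]]


def overwrite(existing: List[str], new: List[str]) -> List[str]:
    """Result ignores `existing` entirely, like re-creating a file."""
    return list(new)


def append_update(existing: List[str], new: List[str]) -> List[str]:
    """Result depends on `existing`, like adding content in place."""
    return list(existing) + list(new)


def run_twice(op: Op, start: List[str], new: List[str]) -> Tuple[List[str], List[str]]:
    """Apply `op` once, then again on its own result, with the same input."""
    once = op(start, new)
    twice = op(once, new)
    return once, twice
```

Repeating `overwrite` leaves the result unchanged regardless of the starting contents, whereas repeating `append_update` keeps growing the result. That is the sense in which in-place updates break idempotency.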

> Otherwise, overwriting a layer in an existing, single-layer GPKG would be "update", but --overwrite the file with a new single-layer would not. Overwriting a single-band raster file with a new single band would not be considered update, when that is basically the raster analog of overwriting a vector layer which is "update".

--overwrite-layer opens a dataset with GDAL_OF_UPDATE, and that is semantically different from --overwrite.

> Aren't there cases where the same output could be produced with a choice of algorithm arguments, one considered "update" and the other not, but the choice of which way to do it is more or less arbitrary?

Sure, but when rendering the JSON usage we can't know how a user is going to run the algorithm. We can know which algorithms or arguments will trigger opening a dataset for update. Separately, we can also know that overwrite will always create a new dataset.

Perhaps we do not need the JSON usage flags to map to the code paths that attach or disable GDAL_OF_UPDATE... but that is what I proposed and attempted to do here.

ctoney · 2026-04-14T04:38:01Z

Thanks. That makes sense to me.

> Perhaps we do not need the JSON usage flags to map to the code paths that attach or disable GDAL_OF_UPDATE... but that is what I proposed and attempted to do here.

I wasn't suggesting these flags aren't needed. I asked about the implications of overwrite wrt knowing whether a file is mutated. The PR (in its current form at least) adds incremental complexity and another item for algorithm developers to track and implement consistently. The benefit for determinism/reproducibility should be clear. You articulated well what it adds without having runtime information on whether an existing file is actually modified by overwrite. I see value in that but don't have a strong opinion on cost/benefit. Maintainer perspective obviously carries a lot of weight on it, and there are a small number of algorithm developers.

brownag · 2026-04-14T15:25:28Z

> I see value in that but don't have a strong opinion on cost/benefit. Maintainer perspective obviously carries a lot of weight on it, and there are a small number of algorithm developers.

Agreed.

I think what @rouault suggested about abstracting out the logic from ProcessDatasetArg() and using it in GetUsageAsJSON() is the most parsimonious approach. However, ProcessDatasetArg() has a lot of logic related to this that gets pretty complex. The scope of things that could break increases quite a bit if I start messing around with the core argument processing routine for what I originally conceived of as improved machine-readable documentation. When I proposed it I don't think I had a full appreciation for how much of this logic was determined at runtime as opposed to just part of the algorithm definition...

As I have looked around I have noticed inconsistencies in algorithm definitions, so I realize that the additional cognitive overhead of adding SetOpenForUpdateIfAnyOf/UnlessAnyOf means that, as implemented, it is likely to be omitted unintentionally as the codebase develops. That said, as implemented in this PR, omission is not a huge "risk" because only the JSON usage output is affected, not the actual runtime behavior.

I probably should have couched this PR a bit better: I want to get feedback on this relatively low-impact implementation of the idea (i.e. is the proposed JSON schema right, are we capturing everything we need to, are there more algorithms with edge cases I am missing). Your point about overwrite is well taken, so thank you for weighing in.

rouault · 2026-04-14T16:11:57Z

> but I don't think it is a fair characterization to say I did not give any thought to this work, or that I primarily relied on AI to construct this PR.

I didn't say or imply that. It's more that I've had an indigestion of AI contributions over various projects recently, which makes me paranoid in general. The submitter knows how much personal thought they have put in. On my side, I'm in the dark. AI also increases the volume of contributions: maintainers do not scale up.

brownag added 8 commits April 10, 2026 08:31

gdalalg: add missing GDAL_OF_UPDATE flag to raster/vector update

62972f5

test: add open_for_update assertions for raster/vector algorithms

8d26523

schema: add open_for_update field to gdal_algorithm.schema.json

c278c10

gdalalgorithm_cpp: add SetOpenForUpdateIfAnyOf/UnlessAnyOf builder methods

c61b81a

gdalalgorithm: serialize open_for_update metadata in GetUsageAsJSON

f1e6214

gdalalg_pipeline: set open_for_update conditions in AddVectorOutputArgs

2e7aa41

gdalalg: apply SetOpenForUpdateIfAnyOf for conditional update operations

d62fb8c

gdalalg: apply SetOpenForUpdateUnlessAnyOf for conditional update operations

f302e24

rouault reviewed Apr 11, 2026


rouault added the "AI assisted" label (⚠️ AI assisted coding involved. Review with extreme scepticism.) Apr 12, 2026

brownag marked this pull request as draft April 14, 2026 15:36

brownag wants to merge 8 commits into OSGeo:master from brownag:of-update-json1
