Fix ObjectStore.get() and upload() swallowing HTTP errors other than …#959

Merged
gmazoyer merged 2 commits into infrahub-develop from
fix/issue-958-object-store-swallows-404
Apr 22, 2026

Conversation

@PhillSimonds
Contributor

Why

ObjectStore.get() and ObjectStore.upload() (both async and sync) silently swallow httpx.HTTPStatusError for any non-2xx status code other than 401/403. The except block was written to convert 401/403 to AuthenticationError but missed the trailing raise that would re-surface other HTTP errors — so for 404, 500, etc. the exception is caught, nothing happens inside the if, and execution falls through to return resp.text / return resp.json(). The caller receives the error response body as if it were the requested content.
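
The fall-through described above can be reproduced in a few lines. This is a minimal sketch, not the SDK's actual code: `FakeStatusError`, `AuthError`, `FakeResponse`, `get_before`, and `get_after` are hypothetical stand-ins for `httpx.HTTPStatusError`, `AuthenticationError`, the HTTP response, and the pre-fix and post-fix shape of `get()`, so the example runs without any dependencies.

```python
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status_code: int, text: str) -> None:
        self.status_code = status_code
        self.text = text

    def raise_for_status(self) -> None:
        # Raise for any 4xx/5xx status, like an HTTP client would.
        if self.status_code >= 400:
            raise FakeStatusError(self)


class FakeStatusError(Exception):
    """Hypothetical stand-in for httpx.HTTPStatusError."""

    def __init__(self, response: FakeResponse) -> None:
        super().__init__(f"HTTP {response.status_code}")
        self.response = response


class AuthError(Exception):
    """Hypothetical stand-in for the SDK's AuthenticationError."""


def get_before(resp: FakeResponse) -> str:
    """Pre-fix shape: non-auth errors are caught and then dropped."""
    try:
        resp.raise_for_status()
    except FakeStatusError as exc:
        if exc.response.status_code in (401, 403):
            raise AuthError("authentication failed") from exc
        # No trailing raise: a 404 or 500 leaves the except block silently.
    return resp.text  # Reached even for error responses.


def get_after(resp: FakeResponse) -> str:
    """Post-fix shape: anything that is not 401/403 is re-raised."""
    try:
        resp.raise_for_status()
    except FakeStatusError as exc:
        if exc.response.status_code in (401, 403):
            raise AuthError("authentication failed") from exc
        raise  # Re-surface 404, 500, and every other HTTP error.
    return resp.text
```

With a 404 response, `get_before` returns the error body as if it were content, while `get_after` raises; both behave identically for 2xx, 401, and 403.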

The concrete downstream impact surfaced during INFP-504 testing: the new artifact_content Jinja2 filter calls client.object_store.get(identifier=storage_id). When passed a non-existent storage_id, the backend correctly returned HTTP 404 with a GraphQL-shaped error body, but the SDK silently returned that body as a string, which became the artifact's content. The artifact was marked Ready with a corrupt payload (checksum matched the MD5 of the 404 error JSON). A typo'd storage_id in a transform would deploy a broken config to a device with no visible failure upstream.

Goal: make get() and upload() raise on non-2xx non-auth responses, matching the pattern _get_file() already uses.

Non-goals: not adding new typed exceptions, not changing the 401/403 → AuthenticationError behaviour, not touching any other call sites.

Closes #958

What changed

  • Behavioural change users observe: a previously silent failure now raises httpx.HTTPStatusError as expected. Callers that have been relying on the silent-return behaviour (storing error bodies as content) will now correctly fail fast. No legitimate success path changes.
  • Implementation: added a trailing raise at the end of the except httpx.HTTPStatusError block in all four methods (ObjectStore.get, ObjectStore.upload, ObjectStoreSync.get, ObjectStoreSync.upload). The pre-existing _get_file() / sync _get_file already had this pattern — the fix brings the other four into line.
  • Tests added: test_object_store_get_raises_on_404, test_object_store_upload_raises_on_500, test_object_store_get_raises_authentication_error (parametrised over 401/403 to lock in the unchanged behaviour). All parametrised across async and sync clients.
  • What stayed the same: 401/403 still converts to AuthenticationError, 2xx still returns content, ServerNotReachableError still propagates, sync/async parity unchanged, public API unchanged.

How to review

  • infrahub_sdk/object_store.py: four blocks, same one-line change (raise at the end of except httpx.HTTPStatusError). git diff shows six matches of the pattern "raise AuthenticationError(...) from exc\n raise" — four new, two pre-existing in _get_file and sync _get_file that I left untouched.
  • tests/unit/sdk/test_object_store.py: three new parametrised test cases. The 401/403 test locks in the existing auth-conversion behaviour; the 404 and 500 tests would have FAILED on the pre-fix code (silent return of error body).
  • changelog/958.fixed.md: towncrier fragment.

How to test

Unit tests + linters (fast)

uv sync --all-groups --all-extras
uv run ruff check .
uv run ruff format --check .
uv run ty check .
uv run mypy --show-error-codes infrahub_sdk/
uv run pytest tests/unit/sdk/test_object_store.py

All three new tests should pass. To prove they catch the bug, revert one of the four raise statements in object_store.py and re-run: the corresponding test must fail.

End-to-end reproduction (against a live Infrahub 1.9 stack)

Start an Infrahub 1.9 instance (invoke demo.start). Export your token:

export TOKEN=06438eb2-8019-4776-878c-0941b1f1d1ec
export INFRAHUB=http://localhost:8000

Direct SDK reproduction:

import asyncio
from infrahub_sdk import Config, InfrahubClient

async def main() -> None:
    client = InfrahubClient(config=Config(address="http://localhost:8000", api_token="<token>"))
    # Before this PR: returns the 404 JSON error body as a string.
    # After this PR: raises httpx.HTTPStatusError with status 404.
    content = await client.object_store.get(identifier="ffffffff-ffff-ffff-ffff-ffffffffffff")
    print(repr(content))

asyncio.run(main())

End-to-end verification via the artifact_content filter. This is optional if the direct SDK repro above is sufficient, but it exercises the full downstream code path including the worker.

1. Load a minimal schema — a TestingPerson node that can be the target of an artifact:

# /tmp/testing-person.yml
---
version: "1.0"
nodes:
  - name: Person
    namespace: Testing
    include_in_menu: true
    label: Person
    default_filter: name__value
    human_friendly_id: ["name__value"]
    inherit_from: ["CoreArtifactTarget"]
    attributes:
      - name: name
        kind: Text
        unique: true

infrahubctl schema load /tmp/testing-person.yml --wait 30

2. Create a person and add it to a group targeted by the artifact definition:

# Create the person, capturing its id
PERSON_ID=$(curl -s -X POST $INFRAHUB/graphql -H "X-INFRAHUB-KEY: $TOKEN" -H "Content-Type: application/json" \
  -d '{"query": "mutation { TestingPersonCreate(data: {name: {value: \"John Doe\"}}) { object { id } } }"}' \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["data"]["TestingPersonCreate"]["object"]["id"])')
echo "PERSON_ID=$PERSON_ID"

# Create a group with this person as member
curl -s -X POST $INFRAHUB/graphql -H "X-INFRAHUB-KEY: $TOKEN" -H "Content-Type: application/json" \
  -d "{\"query\": \"mutation { CoreStandardGroupCreate(data: {name: {value: \\\"people\\\"}, members: [{id: \\\"$PERSON_ID\\\"}]}) { object { id } } }\"}"
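
The nested backslash escaping in the second curl call is easy to get wrong. As an alternative, the request body can be built in Python, where `json.dumps` takes care of the quoting. A minimal sketch: `build_group_mutation` is a hypothetical helper name, and the mutation text simply mirrors the curl command above.

```python
import json


def build_group_mutation(person_id: str, group_name: str = "people") -> str:
    """Return the JSON request body for the CoreStandardGroupCreate mutation."""
    # json.dumps on a plain string produces a double-quoted literal, which is
    # also a valid GraphQL string literal for simple values such as names and ids.
    query = (
        "mutation { CoreStandardGroupCreate(data: {"
        f"name: {{value: {json.dumps(group_name)}}}, "
        f"members: [{{id: {json.dumps(person_id)}}}]"
        "}) { object { id } } }"
    )
    # The outer json.dumps escapes the inner quotes for the HTTP body.
    return json.dumps({"query": query})


if __name__ == "__main__":
    # Print a body that can be sent as the POST payload to $INFRAHUB/graphql.
    print(build_group_mutation("<person-id>"))
```

The printed string can be used as the POST body in place of the hand-escaped `-d` argument.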

3. Prepare a small Git repo that uses artifact_content with a storage_id that doesn't exist. Inside a worker container (or any directory Infrahub can reach as a git remote):

mkdir -p /remote/bad-uuid-test/templates
cd /remote/bad-uuid-test

cat > templates/bad_uuid.j2 <<'EOF'
{{ "ffffffff-ffff-ffff-ffff-ffffffffffff" | artifact_content }}
EOF

cat > templates/bad_uuid.gql <<'EOF'
query BadUuid($name: String!) {
  TestingPerson(name__value: $name) { edges { node { name { value } } } }
}
EOF

cat > .infrahub.yml <<'EOF'
---
queries:
  - name: bad_uuid_q
    file_path: templates/bad_uuid.gql
jinja2_transforms:
  - name: bad_uuid_t
    query: bad_uuid_q
    template_path: templates/bad_uuid.j2
artifact_definitions:
  - name: Bad UUID
    artifact_name: bad-uuid
    parameters: { name: name__value }
    content_type: text/plain
    targets: people
    transformation: bad_uuid_t
EOF

git init -q --initial-branch=main && git add -A && git -c user.email=test@test -c user.name=test commit -q -m init

4. Register the repo with Infrahub:

infrahubctl repository add bad-uuid-test file:///remote/bad-uuid-test

Wait ~15 seconds for the artifact pipeline to run.

5. Check the resulting artifact:

curl -s -X POST $INFRAHUB/graphql -H "X-INFRAHUB-KEY: $TOKEN" -H "Content-Type: application/json" \
  -d '{"query":"{ CoreArtifact(name__value: \"bad-uuid\") { edges { node { status { value } storage_id { value } checksum { value } } } } }"}' \
  | python3 -m json.tool

Then fetch the stored content:

# Replace <sid> with the storage_id.value from the previous query
curl -s "$INFRAHUB/api/storage/object/<sid>" -H "X-INFRAHUB-KEY: $TOKEN"

Before this PR:

  • Artifact status.value == Ready.

  • checksum.value == 7f6284048c7bcc3d6acc1697467f382e — the MD5 of the GraphQL error body.

  • Fetched content is the 404 error JSON verbatim:

    {"data":null,"errors":[{"message":"Unable to find the node ffffffff-ffff-ffff-ffff-ffffffffffff / StorageObject in the database.","extensions":{"code":404}}]}
  • No errors in worker logs — the failure is completely silent. A consumer of this artifact would receive the error JSON as its config content with no warning.
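
The checksum observation can be checked by hashing the fetched content and comparing it with `checksum.value`. A minimal sketch using only `hashlib`: `md5_hex` and `matches_checksum` are hypothetical helper names, and the comparison assumes the checksum is the MD5 hex digest of the exact stored bytes, so a difference in whitespace or a trailing newline changes the result.

```python
import hashlib


def md5_hex(content: str) -> str:
    """Return the MD5 hex digest of the UTF-8 encoded content."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def matches_checksum(content: str, expected: str) -> bool:
    """Compare the content's digest with a reported checksum, ignoring case."""
    return md5_hex(content) == expected.lower()
```

Passing the body returned by the storage endpoint and the artifact's `checksum.value` to `matches_checksum` shows whether the stored artifact really is the error payload.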

After this PR:

  • Artifact status.value == Error.

  • storage_id.value is None.

  • Worker logs (from infp504-task-worker-* or equivalent) contain:

    JinjaFilterError: Filter 'artifact_content': failed to retrieve content for storage_id: ffffffff-ffff-ffff-ffff-ffffffffffff
      — Client error '404 Not Found' for url 'http://server:8000/api/storage/object/ffffffff-ffff-ffff-ffff-ffffffffffff'
    

The fix-behaviour contract is: anything that was going to produce corrupt content now produces a visible failure instead.

Impact & rollout

  • Backward compatibility: behaviour-breaking for any code path that was relying on silent-error-return. In the Infrahub codebase I could not find any legitimate caller that relies on this; the previous behaviour is a bug by every reasonable definition. External SDK users whose code expected get() to always return a string (even for 404) will now need to catch httpx.HTTPStatusError. Release notes should call this out.
  • Performance: none.
  • Config/env changes: none.
  • Deployment notes: safe to deploy. Recommended to pair with a 1.9.x patch release for the Infrahub submodule bump, given this is the fix for a data-quality bug in INFP-504.
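
For external callers affected by the compatibility note above, a caller-side pattern might look like the following. This is a sketch under assumptions: `StatusError` is a hypothetical stand-in for `httpx.HTTPStatusError` (which exposes the status as `exc.response.status_code`), and `read_or_none` and `fetch` are illustrative names, not SDK API. Whether a 404 should map to "missing" at all is a decision for each caller.

```python
from types import SimpleNamespace
from typing import Callable, Optional


class StatusError(Exception):
    """Hypothetical stand-in for httpx.HTTPStatusError."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        # Mirror the attribute path used on the real exception.
        self.response = SimpleNamespace(status_code=status_code)


def read_or_none(fetch: Callable[[str], str], identifier: str) -> Optional[str]:
    """Return the object content, or None when the object does not exist."""
    try:
        return fetch(identifier)
    except StatusError as exc:
        if exc.response.status_code == 404:
            # Treat a missing object as an explicit, visible "no content".
            return None
        # Any other HTTP error is unexpected here and should stay loud.
        raise
```

In real code, `fetch` would wrap the call to `client.object_store.get(identifier=...)` and the `except` clause would name `httpx.HTTPStatusError`.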

Checklist

  • Tests added/updated (three new parametrised test cases: 404 raise, 500 raise, 401/403 auth conversion unchanged)
  • Changelog entry added
  • External docs updated — none needed; the raise-on-error contract is what callers already assumed.
  • Internal .md docs updated — none needed.

…401/403

ObjectStore.get() and ObjectStore.upload() (async + sync) caught
httpx.HTTPStatusError in an except block that only converted 401/403
to AuthenticationError. For any other status code (404, 500, etc.)
the exception was silently dropped and execution fell through to the
return statement — `return resp.text` for get() and `return resp.json()`
for upload(). Callers received the error body as if it were valid
content.

The downstream impact surfaced during INFP-504 testing: the new
`artifact_content` Jinja2 filter calls `object_store.get()`; when the
filter was passed a non-existent storage_id, the backend correctly
returned HTTP 404 with a GraphQL-shaped error body, but the SDK
silently returned that body as a string, which became the artifact's
content. The artifact was marked Ready with a corrupt payload.

Fix: add a trailing `raise` at the end of the
`except httpx.HTTPStatusError` block in all four places, matching the
pattern already used in `_get_file()` (lines 99 and 185). Non-401/403
HTTP errors now propagate to callers.

Added unit tests covering:
- get() raises httpx.HTTPStatusError on 404
- upload() raises httpx.HTTPStatusError on 500
- get() still converts 401/403 to AuthenticationError (unchanged)
All three are parametrised for async and sync clients.

Fixes #958
PhillSimonds requested a review from a team as a code owner April 22, 2026 00:53
@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@                 Coverage Diff                  @@
##           infrahub-develop     #959      +/-   ##
====================================================
+ Coverage             81.11%   81.38%   +0.27%     
====================================================
  Files                   134      134              
  Lines                 11314    11324      +10     
  Branches               1693     1693              
====================================================
+ Hits                   9177     9216      +39     
+ Misses                 1594     1566      -28     
+ Partials                543      542       -1     
| Flag | Coverage Δ |
|---|---|
| integration-tests | 41.81% <0.00%> (-0.03%) ⬇️ |
| python-3.10 | 54.32% <100.00%> (+0.29%) ⬆️ |
| python-3.11 | 54.30% <100.00%> (+0.27%) ⬆️ |
| python-3.12 | 54.32% <100.00%> (+0.31%) ⬆️ |
| python-3.13 | 54.32% <100.00%> (+0.29%) ⬆️ |
| python-3.14 | 55.91% <100.00%> (+0.29%) ⬆️ |
| python-filler-3.12 | 22.77% <0.00%> (-0.02%) ⬇️ |

| Files with missing lines | Coverage Δ |
|---|---|
| infrahub_sdk/object_store.py | 71.24% <100.00%> (+12.85%) ⬆️ |

... and 4 files with indirect coverage changes

Comment thread changelog/958.fixed.md Outdated
@@ -0,0 +1 @@
Fixed `ObjectStore.get()` and `ObjectStore.upload()` (async + sync) silently swallowing `httpx.HTTPStatusError` for HTTP status codes other than 401/403. The except block only converted 401/403 to `AuthenticationError` and fell through for all other non-2xx responses, causing `get()` to return the error body as string content and `upload()` to return the error payload as if it were an upload response. Downstream effect on INFP-504 artifact composition: `artifact_content` filter silently stored GraphQL-shaped 404 bodies as artifact content for missing storage IDs — a typo'd storage_id would deploy a broken config with no visible failure. Added a trailing `raise` to re-raise non-401/403 HTTP errors, matching the pattern already used in `_get_file()`. Added unit tests covering 404 on `get()`, 500 on `upload()`, and unchanged 401/403 → AuthenticationError conversion.
Contributor

I'm not sure how useful this changelog entry would be to end users. It goes into the internals quite a bit, which is not useful IMO, and it also mentions a Jira task that is not public and therefore should not be in here.

Comment thread tests/unit/sdk/test_object_store.py Outdated

@pytest.mark.parametrize("client_type", client_types)
async def test_object_store_get_raises_on_404(client_type: str, clients: BothClients, httpx_mock: HTTPXMock) -> None:
"""get() must raise on 404 — otherwise the response body is silently returned as 'content'."""
Contributor

Self-explanatory test

Comment thread tests/unit/sdk/test_object_store.py Outdated
async def test_object_store_get_raises_authentication_error(
    client_type: str, status_code: int, clients: BothClients, httpx_mock: HTTPXMock
) -> None:
    """get() must still convert 401/403 responses to AuthenticationError (unchanged behaviour)."""
Contributor

Self-explanatory test

Comment thread tests/unit/sdk/test_object_store.py Outdated

@pytest.mark.parametrize("client_type", client_types)
async def test_object_store_upload_raises_on_500(client_type: str, clients: BothClients, httpx_mock: HTTPXMock) -> None:
"""upload() must raise on server errors — otherwise resp.json() returns the error payload as if it were a successful upload."""
Contributor

Self-explanatory test

Simplify changelog entry to be user-facing and remove verbose test
docstrings on self-explanatory tests.
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 22, 2026

Deploying infrahub-sdk-python with Cloudflare Pages

Latest commit: c265b84
Status: ✅  Deploy successful!
Preview URL: https://1dd28359.infrahub-sdk-python.pages.dev
Branch Preview URL: https://fix-issue-958-object-store-s.infrahub-sdk-python.pages.dev

gmazoyer force-pushed the fix/issue-958-object-store-swallows-404 branch from 0590e20 to c265b84 April 22, 2026 09:08
gmazoyer merged commit 2069066 into infrahub-develop Apr 22, 2026
20 checks passed
gmazoyer deleted the fix/issue-958-object-store-swallows-404 branch April 22, 2026 09:42
Development

Successfully merging this pull request may close these issues.

bug: ObjectStore.get() and upload() swallow HTTP errors other than 401/403

2 participants