Skip to content

refactor: restrict OpenAPI $ref resolution to in-document pointers#11226

Closed
etairl wants to merge 2 commits into
deepset-ai:mainfrom
etairl:harden-openapi-ref-resolution
Closed

refactor: restrict OpenAPI $ref resolution to in-document pointers#11226
etairl wants to merge 2 commits into
deepset-ai:mainfrom
etairl:harden-openapi-ref-resolution

Conversation

@etairl

@etairl etairl commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

Summary

  • OpenAPIServiceToFunctions._parse_openapi_spec was calling jsonref.replace_refs with the library default loader, which dispatches arbitrary $ref URIs to the filesystem (file://) and the network (http(s)://).
  • Pass an explicit loader that refuses any non-in-document $ref and set proxies=False so references must resolve eagerly during parsing.
  • In-document JSON-pointer references (those starting with #) continue to work unchanged.

This is a defensive hardening change for OpenAPI specs whose contents are not fully trusted (e.g. fetched from third-party catalogs, generated by an LLM, or uploaded by end users). Specs that depend on external $ref resolution will now raise a RuntimeError instead of silently performing filesystem reads or outbound HTTP requests.

Test plan

  • CI runs OpenAPIServiceToFunctions unit tests with the new loader.
  • Manual: a spec containing an in-document $ref (e.g. #/components/schemas/Foo) still resolves correctly.
  • Manual: a spec containing $ref: file:///etc/passwd or $ref: http://... raises and does not perform the I/O.

🤖 Generated with Claude Code

OpenAPIServiceToFunctions._parse_openapi_spec previously called
jsonref.replace_refs with the library default loader, which dispatches
arbitrary $ref URIs to the filesystem and the network. Pass an explicit
loader that rejects any non-in-document reference and disable proxies so
references must be resolved eagerly. In-document JSON-pointer refs
(those starting with "#") are unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@etairl etairl requested a review from a team as a code owner April 30, 2026 10:20
@etairl etairl requested review from anakin87 and removed request for a team April 30, 2026 10:20
@vercel

vercel Bot commented Apr 30, 2026

Copy link
Copy Markdown

@etairl is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@anakin87 anakin87 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, @etairl!

Could you better explain why this is needed?
Please also read our security policy before. If you have identified a vulnerability, please send us the details confidentially via email.

I also noticed that you opened multiple PRs. I'd suggest focusing on one at a time.

@etairl

etairl commented Apr 30, 2026

Copy link
Copy Markdown
Contributor Author

Hello, @etairl!

Could you better explain why this is needed? Please also read our security policy before. If you have identified a vulnerability, please send us the details confidentially via email.

I also noticed that you opened multiple PRs. I'd suggest focusing on one at a time.

Hi @anakin87,

Sorry about the spam (they've been sitting in my todos for a few days, created the PRs today in one batch). So on one hand this probably falls under out of scope category ("attacker-controlled input to Haystack is considered out of scope"). The reason it still should be a PR is that users expect "parse this spec" to just parse - not to read files or hit the network, so the default should be safer. This is security hardening rather than vulnerability patching.

@anakin87

Copy link
Copy Markdown
Member

Thank you! I'll give this a better look later...

Release-note linter rejects single backticks for inline code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  haystack/components/converters
  openapi_functions.py 241, 268
Project Total  

This report was generated by python-coverage-comment-action

@CLAassistant

CLAassistant commented Apr 30, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@etairl

etairl commented Apr 30, 2026

Copy link
Copy Markdown
Contributor Author

Thank you! I'll give this a better look later...

Thanks. Let me know of any required changes / questions.

@etairl

etairl commented May 3, 2026

Copy link
Copy Markdown
Contributor Author

Thank you! I'll give this a better look later...

@anakin87 Any update? Really looking to get the 3 bugs fixed as they were found as part of our dependency chain scanning.

@anakin87 anakin87 self-assigned this May 4, 2026
@anakin87

anakin87 commented May 4, 2026

Copy link
Copy Markdown
Member

I looked better into this...

From our policy:

Haystack is a framework intended to run inside a trusted execution environment. It assumes that the application built with it has already validated and sanitized user-supplied input... Validation and sanitization of input [...] are the responsibility of the application, not Haystack.

In addition, $ref in OpenAPI can legitimately point to local files and remote URLs. Accepting this PR as-is would break those valid use cases.

Thanks for raising the concern, but for these reasons I'd suggest closing the PR.

@etairl

etairl commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

I looked better into this...

From our policy:

Haystack is a framework intended to run inside a trusted execution environment. It assumes that the application built with it has already validated and sanitized user-supplied input... Validation and sanitization of input [...] are the responsibility of the application, not Haystack.

In addition, $ref in OpenAPI can legitimately point to local files and remote URLs. Accepting this PR as-is would break those valid use cases.

Thanks for raising the concern, but for these reasons I'd suggest closing the PR.

Thanks for pointing it out. I'll move input validation upstream to better handle all edge cases then. I'll close the other 2 PRs as well. I did stumble upon a non-security bug, I'll create a PR soon.

@etairl etairl closed this May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants