GH-5870: add basic RDF 1.2 support to FedX #5871
Introduce TripleRefStatementPattern as a new FedX algebra node that combines a StatementPattern with a TripleRef, representing SPARQL/RDF 1.2 reification patterns (rdf:reifies and embedded triple subjects). Wire GenericInfoOptimizer to fold Join(TripleRef, StatementPattern) into TripleRefStatementPattern via OptimizerUtil.flattenTripleRefJoins, and register the resulting node in the statements list via meetOther. Extend SourceSelection to handle FedXStatementPattern instances directly (avoiding an unnecessary StatementSourcePattern wrapper), and update QueryStringUtil to serialize the triple term syntax <<( ... )>> for TripleRefStatementPattern. Add unit tests for GenericInfoOptimizer covering three AST shapes, and integration tests in RDF12Tests for single-source reification queries. Note: TripleRefStatementPattern.evaluate() is not yet implemented; federated (multi-source) RDF 1.2 query handling remains incomplete.
Add TripleRefJoinGroup as a new algebra node that wraps a TripleRefStatementPattern together with co-located StatementPatterns sharing the same unbound subject variable, exclusive to a single source. Add TripleRefJoinOptimizer, which visits NJoin nodes depth-first and folds qualifying TripleRefStatementPattern + StatementPattern pairs into TripleRefJoinGroup instances. The grouping relies on the assumption that a shared unbound subject variable implies blank-node locality to a single dataset (no source-exclusivity check is performed). Wire the optimizer into FederationEvaluationStrategy after optimizeExclusiveExpressions (conditional invocation is TODO). Add getStatementSource() to TripleRefStatementPattern and extend RDF12Tests with a multi-source join test covering the new optimizer path.
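The grouping rule described above can be illustrated in isolation. The following is a minimal sketch and not the PR's implementation: `Pattern`, `TripleRefPattern`, `Group`, and `fold` are hypothetical stand-ins for the FedX algebra nodes and the optimizer step, and source assignment is ignored entirely, mirroring the note that no source-exclusivity check is performed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-ins for the FedX algebra nodes; none of these are the real RDF4J classes. */
final class TripleRefGroupingSketch {

    /** A plain pattern "?subjectVar predicate object"; subjectBound marks a subject that already has a value. */
    record Pattern(String subjectVar, boolean subjectBound, String predicate, String object) {}

    /** A triple-term pattern whose reifier is the unbound variable reifierVar. */
    record TripleRefPattern(String reifierVar, String tripleTerm) {}

    /** Result of the fold: the triple-term pattern plus every pattern grouped with it. */
    record Group(TripleRefPattern tripleRef, List<Pattern> members) {}

    /**
     * Moves every pattern whose unbound subject variable equals the reifier variable
     * into the group; all other patterns are appended to 'remaining'.
     */
    static Group fold(TripleRefPattern tripleRef, List<Pattern> joinArgs, List<Pattern> remaining) {
        List<Pattern> members = new ArrayList<>();
        for (Pattern p : joinArgs) {
            if (!p.subjectBound() && p.subjectVar().equals(tripleRef.reifierVar())) {
                members.add(p);
            } else {
                remaining.add(p);
            }
        }
        return new Group(tripleRef, List.copyOf(members));
    }
}
```

In this sketch a pattern on `?r` joins the group of a triple-term pattern whose reifier is `?r`, while a pattern on any other subject variable stays in the enclosing join.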
Introduce enableTripleRefSupport config flag (default false) to gate RDF 1.2 triple term handling; both GenericInfoOptimizer (flattenTripleRefJoins) and TripleRefJoinOptimizer only activate when the flag is set, and GenericInfoOptimizer now tracks hasTripleRefExpr to skip the latter pass when no triple ref patterns are present. Generalize TripleRefJoinGroup and TripleRefStatementPattern from a single StatementSource to List<StatementSource>, removing the earlier single-source assumption and enabling multi-source reification queries. Wire evaluation of TripleRefJoinGroup into FederationEvaluationStrategy: build a SELECT query via QueryStringUtil.selectQueryStringTripleRefJoinGroup and dispatch it to the assigned sources. Add test data files for two endpoints and extend RDF12Tests with four integration tests covering single- and multi-source reification joins in both the rdf:reifies and embedded-triple-subject syntactic forms.
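The gating described above reduces to two conditions. The following is a rough sketch under stated assumptions: `passesToRun` and its string labels are hypothetical, and in the PR the `hasTripleRefExpr` value is tracked by GenericInfoOptimizer rather than passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.List;

final class TripleRefGatingSketch {

    /** Returns the labels of the triple-ref passes that would run, in order. */
    static List<String> passesToRun(boolean enableTripleRefSupport, boolean hasTripleRefExpr) {
        List<String> passes = new ArrayList<>();
        if (!enableTripleRefSupport) {
            // Flag off (the default): no RDF 1.2 handling at all.
            return passes;
        }
        passes.add("flattenTripleRefJoins");
        if (hasTripleRefExpr) {
            // Only worth running when the first pass actually saw a triple ref pattern.
            passes.add("TripleRefJoinOptimizer");
        }
        return passes;
    }
}
```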
Force-pushed from 49bb8c5 to eaefee1.
Pull request overview
Adds initial RDF 1.2 “triple term” (reification / embedded triple) support to the FedX federation engine by introducing new algebra nodes and optimizations, plus query rendering and evaluation support, gated behind a configuration flag.
Changes:
- Introduce TripleRefStatementPattern/TripleRefJoinGroup and related optimizers to recognize and group triple-term patterns for evaluation.
- Extend FedX query serialization and evaluation strategy to generate and execute SPARQL queries containing RDF 1.2 triple term syntax.
- Add RDF 1.2 federation integration tests + optimizer unit tests and test data.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/federation/src/test/resources/tests/rdf1_2/data01endpoint2.ttl | Adds endpoint 2 RDF 1.2 reification test data |
| tools/federation/src/test/resources/tests/rdf1_2/data01endpoint1.ttl | Adds endpoint 1 RDF 1.2 reification test data |
| tools/federation/src/test/java/org/eclipse/rdf4j/federated/RDF12Tests.java | New integration tests for RDF 1.2 triple term queries in FedX |
| tools/federation/src/test/java/org/eclipse/rdf4j/federated/optimizer/GenericInfoOptimizerTest.java | New unit tests asserting optimizer rewrites into triple-term-aware nodes |
| tools/federation/src/test/java/org/eclipse/rdf4j/federated/FedXBaseTest.java | Normalizes anonymous variable names in query-plan assertions |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/util/QueryStringUtil.java | Adds query string generation for triple-term join groups and triple-term serialization |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/optimizer/TripleRefJoinOptimizer.java | New optimizer to group triple-term patterns with related statement patterns |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/optimizer/SourceSelection.java | Source selection updates to work directly with FedXStatementPattern instances |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/optimizer/OptimizerUtil.java | Adds join-flattening rewrite for triple-term join shapes |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/optimizer/GenericInfoOptimizer.java | Hooks triple-term rewrite into generic optimization and tracks presence of triple-term expressions |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/FedXConfig.java | Adds enableTripleRefSupport feature flag |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/evaluation/FederationEvaluationStrategy.java | Adds optimization + evaluation support for TripleRefJoinGroup |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/algebra/TripleRefStatementPattern.java | New algebra node combining statement pattern + TripleRef |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/algebra/TripleRefJoinGroup.java | New algebra node representing grouped triple-term join evaluation |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/algebra/StatementSourcePattern.java | Removes redundant addStatementSource (moved to base class) |
| tools/federation/src/main/java/org/eclipse/rdf4j/federated/algebra/FedXStatementPattern.java | Adds addStatementSource helper for reuse across statement pattern implementations |

    public static TupleExpr flattenTripleRefJoins(Join join, QueryInfo queryInfo) {
        // recursion
        if (join.getLeftArg()instanceof Join nestedJoin) {
            join.setLeftArg(flattenTripleRefJoins(nestedJoin, queryInfo));
        }
        if (join.getRightArg()instanceof Join nestedJoin) {
            join.setRightArg(flattenTripleRefJoins(nestedJoin, queryInfo));
        }

        // handle own node
        if (join.getLeftArg()instanceof TripleRef tr && join.getRightArg()instanceof StatementPattern stmt) {

            TupleExpr newExpr = new TripleRefStatementPattern(stmt, tr, queryInfo);
            return newExpr;
        }
        if (join.getRightArg()instanceof TripleRef tr && join.getLeftArg()instanceof StatementPattern stmt) {

            TupleExpr newExpr = new TripleRefStatementPattern(stmt, tr, queryInfo);
            return newExpr;
        }

        // leave actual join untouched
        return join;
    }

Actually this is generated by the spotless formatter for whatever reason and I need to retain it for passing the build ...

    res.append("SELECT ");

    for (String var : varNames) {
        res.append(" ?").append(var);
    }

Not relevant in this case: varNames is expected to have one free var. If we later find cases where this does not hold, we need to resolve the join differently.
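The expectation on varNames could be made explicit with a guard. The following is a minimal sketch and not the PR's QueryStringUtil code: `tripleTerm` and `selectQuery` are hypothetical helpers, and the patterns are assumed to be already serialized strings.

```java
import java.util.List;

final class TripleTermQuerySketch {

    /** Renders a triple term in SPARQL 1.2 syntax, e.g. <<( ?s ?p ?o )>>. */
    static String tripleTerm(String s, String p, String o) {
        return "<<( " + s + " " + p + " " + o + " )>>";
    }

    /** Builds "SELECT ?v1 ... WHERE { p1 . p2 . }" from free variable names and serialized patterns. */
    static String selectQuery(List<String> varNames, List<String> patterns) {
        if (varNames.isEmpty()) {
            // Assumption of this sketch: fail fast instead of emitting "SELECT WHERE { ... }".
            throw new IllegalArgumentException("expected at least one free variable");
        }
        StringBuilder res = new StringBuilder("SELECT");
        for (String var : varNames) {
            res.append(" ?").append(var);
        }
        res.append(" WHERE { ");
        for (String pattern : patterns) {
            res.append(pattern).append(" . ");
        }
        return res.append("}").toString();
    }
}
```

Called with the single variable `r` and one rdf:reifies pattern over `tripleTerm("?s", "?p", "?o")`, the sketch yields a query whose projection is `?r` and whose body contains the `<<( ?s ?p ?o )>>` term.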
Force-pushed from 8524764 to e4c38f5.

        @Override
        public CloseableIteration<BindingSet> evaluate(BindingSet bindings) throws QueryEvaluationException {
            // TODO Auto-generated method stub
            return null;
        }

    }

Force-pushed from e4c38f5 to 15ad100.
Force-pushed from 15ad100 to a8b2761.

            assuredBindingNames.add(getContextVar().getName());
        }

        // Note: ; the statement's object var is an internal join id

GitHub issue resolved: #5870
Commit 1: GH-5870: add initial RDF 1.2 support to FedX query optimizer (WIP)
Introduce TripleRefStatementPattern as a new FedX algebra node that
combines a StatementPattern with a TripleRef, representing SPARQL/RDF
1.2 reification patterns (rdf:reifies and embedded triple subjects).
Wire GenericInfoOptimizer to fold Join(TripleRef, StatementPattern) into
TripleRefStatementPattern via OptimizerUtil.flattenTripleRefJoins, and
register the resulting node in the statements list via meetOther.
Extend SourceSelection to handle FedXStatementPattern instances directly
(avoiding an unnecessary StatementSourcePattern wrapper), and update
QueryStringUtil to serialize the triple term syntax <<( ... )>> for
TripleRefStatementPattern.
Add unit tests for GenericInfoOptimizer covering three AST shapes, and
integration tests in RDF12Tests for single-source reification queries.
Note: TripleRefStatementPattern.evaluate() is not yet implemented;
federated (multi-source) RDF 1.2 query handling remains incomplete.
Commit 3: GH-5870: add end-to-end evaluation of TripleRefJoinGroup for RDF 1.2
Introduce enableTripleRefSupport config flag (default false) to gate RDF
1.2 triple term handling; both GenericInfoOptimizer
(flattenTripleRefJoins) and TripleRefJoinOptimizer only activate when
the flag is set, and GenericInfoOptimizer now tracks hasTripleRefExpr to
skip the latter pass when no triple ref patterns are present.
Generalize TripleRefJoinGroup and TripleRefStatementPattern from a
single StatementSource to List<StatementSource>, removing the earlier
single-source assumption and enabling multi-source reification queries.
Wire evaluation of TripleRefJoinGroup into FederationEvaluationStrategy:
build a SELECT query via
QueryStringUtil.selectQueryStringTripleRefJoinGroup and dispatch it to
the assigned sources.
Add test data files for two endpoints and extend RDF12Tests with four
integration tests covering single- and multi-source reification joins in
both the rdf:reifies and embedded-triple-subject syntactic forms.
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resources to format from the command line)