Add modeling of Protocol test details to manifest#312

Merged
kasei merged 10 commits into main from gtw-protocol-test-improvements
Apr 9, 2026
@kasei
Contributor

@kasei kasei commented Mar 27, 2026

This adds modeling for all the existing Protocol tests based heavily on Gregg's work in #79 (which I think has diverged enough from the current manifest as a result of #306 that it's not worth trying to reconcile).

I am reasonably confident that this new data is a faithful encoding, as it was produced automatically by parsing the HTTP requests in the previous rdfs:comment strings and modeling the resulting data. I used some regex matching to figure out the input for the expected results modeling.

There are a few differences from Gregg's original model:

  • I do not provide an extra layer of modeling for headers (for example, "application/sparql-query; charset=UTF-16" is not also broken down into the header name and parameter elements)
  • I do not use ht:statusCodeValue with literal values like "4XX" (more on this below), instead using values such as ht:StatusCode2xx with a new mf:expectedStatus property
  • For expected results, I have moved away from the tests enumerating the acceptable media types (which seems fraught and likely to lag new, valid formats) and instead simply declare the type of results expected (boolean, tabular, or RDF; more on this below)

I introduce four new manifest terms to model the expected results:

  • mf:expectedStatus - Expected HTTP status code (pointing to values like ht:StatusCode2xx); this differs from Gregg's model which used _:response ht:statusCodeValue "2XX" which did not conform to the semantics of ht:statusCodeValue, and was my biggest issue with Fix protocol manifest #79
  • mf:expectedBoolean - Expected results for ASK queries.
  • mf:expectedFormat - Expected serialization format of the results; the range here is one of the literals: "boolean", "tabular", or "RDF" (I'm open to changing this to a controlled set of IRIs if desired and with suggestions on where such IRIs might live)
  • mf:expectation - A textual description of the expected results for the single test update_base_uri where modeling the expectation would have been prohibitive. (Changing the test to check the expectation as a FILTER in the query and then using ASK might be a better approach here.)
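
As a rough illustration of how a runner could evaluate mf:expectedStatus, here is a minimal Python sketch. It is not taken from the test runner mentioned in this thread: the function name is hypothetical, the terms are handled as prefixed-name strings rather than full IRIs, and only ht:StatusCode2xx is named above, so the other classes are assumed to follow the same naming pattern.

```python
import re


def status_matches(expected_class: str, status: int) -> bool:
    """Check an HTTP status code against a class term such as 'ht:StatusCode2xx'.

    Assumption: every status class follows the 'ht:StatusCodeNxx' naming pattern.
    """
    match = re.fullmatch(r"ht:StatusCode([1-5])xx", expected_class)
    if match is None:
        raise ValueError(f"unrecognized status class: {expected_class}")
    # Compare only the hundreds digit, so any 2xx code satisfies StatusCode2xx.
    return status // 100 == int(match.group(1))
```

With this shape, a test expecting a 4xx class accepts a 400 response and rejects a 500, without the manifest having to name one exact code.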

I also (ab)use the SPARQL Update ut:graphData modeling to indicate the named graphs that should be loaded into the Dataset before the test is run:

ut:graphData [ ut:graph <data1.nt> ; rdfs:label "http://kasei.us/2009/09/sparql/data/data1.rdf" ] ;
ut:graphData [ ut:graph <data2.nt> ; rdfs:label "http://kasei.us/2009/09/sparql/data/data2.rdf" ] ;

For now those ut:graphData properties are hanging off of the test itself, which feels a bit strange, but it didn't feel any better to hang them off of the mf:action which in this manifest is a ht:Request. This is another area where I'm open to suggestions.

Finally, I made a few substantive (but I expect uncontroversial) changes to a few tests. For the following tests, I removed the requirement to return results in SPARQL XML format (removing the Accept header in the request, and the expected Content-Type of the result)

  • update_dataset_default_graphs
  • update_base_uri
  • update_dataset_default_graph
  • update_dataset_named_graphs
  • update_dataset_full

@kasei
Contributor Author

kasei commented Mar 27, 2026

I wrote a ~200 line perl test runner based on this new manifest data. The dependency chain is very large, so may not be super useful for people without a normal perl setup. YMMV.

@kasei
Contributor Author

kasei commented Mar 27, 2026

I've also left the old text-based rdfs:comment values that describe the request-response expectations to make it easier to review the new modeling. In the future we might want to remove those.

@afs – I tried using this code against Fuseki and got 3 failures. One is expected (update_base_uri for which the modeling doesn't provide details on how to evaluate whether it passed), but I couldn't immediately figure out whether the other two (bad_update_dataset_conflict and query_multiple_dataset) were an issue with my setup/use or something deeper.

Contributor

@Tpt Tpt left a comment

Thank you for this work! I have not reviewed carefully the content yet. Just found a detail

Comment thread sparql/sparql11/protocol/manifest.ttl Outdated
@afs
Contributor

afs commented Mar 29, 2026

@afs – I tried using this code against Fuseki and got 3 failures. One is expected (update_base_uri for which the modeling doesn't provide details on how to evaluate whether it passed), but I couldn't immediately figure out whether the other two (bad_update_dataset_conflict and query_multiple_dataset) were an issue with my setup/use or something deeper.

What errors do you get?
Do you have the Fuseki log? (If run with "-v", all HTTP headers are printed as well.) And what does the Fuseki config look like?
Is /sparql/ the whole path (the dataset name)? In that case all operations are on that URL.

Did you start the server with fuseki --mem /sparql?

Trying to work out what is actually sent, so I may have misunderstood ...


Looking at query_multiple_dataset --

ht:absolutePath - does that mean it is sent to a URL with ?named-graph-uri= and also content header of application/sparql-query (+body?)

It's two operations superimposed. I think that it will dispatch to query (that happens to come before GSP - a legal choice would also be GSP and then conneg error on the content type).

Query checks the query string and ?named-graph-uri= is illegal. Do you see Malformed request: unrecognized parameters: default-graph-uri= error response?

ASK FROM <http://kasei.us/2009/09/sparql/data/data3.rdf> { GRAPH ?g1 { <http://kasei.us/2009/09/sparql/data/data1.rdf> a ?type } GRAPH ?g2 { <http://kasei.us/2009/09/sparql/data/data2.rdf> a ?type } }

FWIW the ASK is false - a dataset description is complete - the FROM sets the default graph there are no named graphs unless FROM NAMED is used as well.


bad_update_dataset_conflict

The request is ?using-named-graph-uri= but then there is a custom dataset and it does not have a named graph http://example/addresses so WITH may be the problem.

https://www.w3.org/TR/sparql12-protocol/#update-dataset says it's an error. 400.

@kasei
Contributor Author

kasei commented Mar 29, 2026

@afs

I get these errors:

  • #query_multiple_dataset
  • #update_base_uri
  • #bad_update_dataset_conflict

I used fuseki quickstart, creating a "test" dataset, and using http://localhost:3030/test as the endpoint. (I think this is an acceptable way to use one endpoint for both query and update? If not, I'd have to adjust my testing code to differentiate query and update endpoints).


As mentioned, I think #update_base_uri is expected failure.


For #query_multiple_dataset:

ht:absolutePath - does that mean it is sent to a URL with ?named-graph-uri= and also content header of application/sparql-query (+body?)

Yes. Dataset specified in the HTTP query parameters, content-type specifying SPARQL Query, and the request body with the ASK query.

It's two operations superimposed. I think that it will dispatch to query (that happens to come before GSP - a legal choice would also be GSP and then conneg error on the content type).

I don't understand "two operations superimposed". I don't think there's anything GSP-related going on here.

Query checks the query string and ?named-graph-uri= is illegal. Do you see Malformed request: unrecognized parameters: default-graph-uri= error response?

No error. It returns a 200 with SRX encoding of false.

FWIW the ASK is false - a dataset description is complete - the FROM sets the default graph there are no named graphs unless FROM NAMED is used as well.

I don't think that's true. The FROM should be overridden by the dataset specified by the protocol (via HTTP query parameters). That's the entire point of this test.


For #bad_update_dataset_conflict:

This one should be an error because of the mixed use of USING in the update and using-named-graph-uri in the protocol.

The test as written (and approved by the previous WG) expects a 4xx error. I think that's the correct thing here, but the spec text doesn't actually specify this (saying only that this is "an error"). Fuseki returns a 500, so disagrees with the test, but not technically with the spec text.

@afs
Contributor

afs commented Mar 30, 2026

I used fuseki quickstart, creating a "test" dataset, and using http://localhost:3030/test as the endpoint. (I think this is an acceptable way to use one endpoint for both query and update? If not, I'd have to adjust my testing code to differentiate query and update endpoints).

Yes, there would be both query and update if started with --update or an empty memory model.

fuseki-server --mem /test
fuseki-server --file=DATA --update /test and variants

How are you setting up the data?

If started with --file, it is read-only by default.

The test says: ht:absolutePath "/sparql/..." - where does that fit in? /test/sparql will also exist but is query only (there is also /test/update, but /test has both).

(FWIW fuseki-server --mem / should work to give a no-path dataset, then /sparql is query only by default.)

If you could send me the log file (printed to stdout) I can see the setup details and data loading, ideally, with --verbose which prints detailed setup and detailed requests.

This is Fuseki 6.0.0?

I didn't look at update_base_uri I read "One is expected (update_base_uri..." as saying it was test-correct.

Use SILENT form of CLEAR GRAPH to ensure graph is empty before the
test runs, regardless of whether the graph already exists.
@kasei
Contributor Author

kasei commented Mar 30, 2026

How are you setting up the data?

Combination of DROP ALL and INSERT DATA for the ut:graphData entries (can see actual code in the linked test runner above.).
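
For readers who do not want to dig through the Perl, the setup step can be approximated as below. This is a minimal Python sketch under stated assumptions, not the linked runner's code: the function name is hypothetical, the graph name is taken from the rdfs:label of each ut:graphData entry, and the N-Triples content of each file is assumed to have been read into a string already.

```python
def build_setup_update(graphs: dict[str, str]) -> str:
    """Build one SPARQL Update request that resets the store and loads named graphs.

    `graphs` maps a graph IRI (the rdfs:label of a ut:graphData entry) to the
    N-Triples text of the corresponding data file (assumed already read).
    """
    operations = ["DROP ALL"]
    for graph_iri, ntriples in graphs.items():
        # N-Triples statements are valid triple syntax inside INSERT DATA.
        operations.append(
            f"INSERT DATA {{ GRAPH <{graph_iri}> {{\n{ntriples.strip()}\n}} }}"
        )
    # Operations in a single update request are separated by ';'.
    return " ;\n".join(operations)
```

Calling it with one entry per ut:graphData value yields a DROP ALL followed by one INSERT DATA per named graph.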

The test says: ht:absolutePath "/sparql/..." - where does that fit in? /test/sparql will also exist but is query only (there is also /test/update, but /test has both).

My test runner is replacing /sparql/ with the path from the endpoint URL supplied to the test runner. It feels a bit strange, but I think that's probably required if we continue to use ht:absolutePath. I don't think there's an alternative property for non-absolute path specification (I could be wrong about this).
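
To make the substitution concrete, here is a minimal Python sketch of the idea. It is an approximation rather than the runner's actual code: the function name is hypothetical, and it assumes every ht:absolutePath value in the manifest starts with the /sparql/ placeholder.

```python
def request_url(endpoint: str, absolute_path: str, placeholder: str = "/sparql/") -> str:
    """Swap the manifest's placeholder path for the endpoint under test.

    Assumption: `absolute_path` always begins with `placeholder`; whatever
    follows it (typically the query string) is appended to the endpoint URL.
    """
    if not absolute_path.startswith(placeholder):
        raise ValueError(f"path does not start with {placeholder!r}: {absolute_path}")
    remainder = absolute_path[len(placeholder):]
    return endpoint.rstrip("/") + remainder
```

For example, with the endpoint http://localhost:3030/test, a path consisting of /sparql/ followed by a query string becomes the endpoint URL followed by that same query string.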

This is Fuseki 6.0.0?

Yes.

I didn't look at update_base_uri I read "One is expected (update_base_uri..." as saying it was test-correct.

I manually validated the test. Fuseki works as expected. The test runner just doesn't know that because the validation logic isn't encoded in the manifest. I'd like to follow up this PR with a change to this test to make it simpler for test runners.

I am pushing a small fix for #update_base_uri so that the setup code uses CLEAR SILENT GRAPH instead of CLEAR GRAPH. This was causing problems with Fuseki as the graph didn't already exist.

@afs
Contributor

afs commented Apr 4, 2026

(I have a log file from @kasei)

I extracted the requests from the log and then used curl to send them.

== bad_update_dataset_conflict

Fuseki bug fixed - I now get 400, and not 500.

== query_multiple_dataset

The query request is

http://localhost:3030/test?named-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2Fdata%2Fdata1.rdf%26named-graph-uri%3Dhttp%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2Fdata%2Fdata2.rdf

The ? and the first = are unencoded, while the & (%26) and the second = (%3D) are URL-encoded.

There is one very long name.

If I do not encode & and second =, the ASK query returns true.
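
The difference between the two encodings can be reproduced with a short Python sketch. This only illustrates the observation above and is not code from either the test runner or Fuseki: the helper names are hypothetical, and the second helper is a guess at how the over-encoded string could have come about.

```python
from urllib.parse import parse_qs, quote, urlencode

G1 = "http://kasei.us/2009/09/sparql/data/data1.rdf"
G2 = "http://kasei.us/2009/09/sparql/data/data2.rdf"


def encode_per_value(graphs: list[str]) -> str:
    """Encode each graph IRI separately, leaving '&' and '=' as delimiters."""
    return urlencode([("named-graph-uri", g) for g in graphs])


def encode_whole_string(graphs: list[str]) -> str:
    """Join first, then encode everything after the first '=' (over-encoding)."""
    joined = "&named-graph-uri=".join(graphs)
    return "named-graph-uri=" + quote(joined, safe="")


good = encode_per_value([G1, G2])
bad = encode_whole_string([G1, G2])

# A server-side parser sees two values in the first case and one long value in the second.
good_values = parse_qs(good)["named-graph-uri"]
bad_values = parse_qs(bad)["named-graph-uri"]
```

Here good_values holds the two graph IRIs, while bad_values holds a single string containing both IRIs plus the literal text &named-graph-uri=, which matches the "one very long name" seen on the server.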

== update_base_uri

This test has a security problem.

The execution may be on a machine behind a firewall.

Depending on proxy/gateway protocol, the URL at point of execution has the local machine name/IP address, not that of the public side of the gateway. (A second problem is that update and query might be separate URLs.)

Fuseki uses a fixed, dummy base name http://server/unset-base to parse queries and updates so that the local host machine details are not visible.

I get a result set with:

?o="http://server/unset-base/test"

Hacking the code to use the servlet URL, and it is http://localhost:3030/test.

@kasei
Contributor Author

kasei commented Apr 4, 2026

The ? and the first = are unencoded, while the & (%26) and the second = (%3D) are URL-encoded.

Good catch. Fixed (along with the addition of an rdfs:comment on the manifest itself, describing some of the expectations and issues of running the tests).

== update_base_uri

This test has a security problem.

The execution may be on a machine behind a firewall.

Depending on proxy/gateway protocol, the URL at point of execution has the local machine name/IP address, not that of the public side of the gateway. (A second problem is that update and query might be separate URLs.)

I'm not sure I see the security issue here. The test is not trying to validate the specific IRI that is resolved, but only that the relative IRI is resolved to some absolute IRI. So I think Fuseki is already passing the test as written (even if the expectations of the test are only provided in prose).
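
If a runner wanted to automate that weaker check, one possible shape is sketched below in Python. This is only an assumption about how the prose expectation could be encoded: the function name is hypothetical, and "absolute" is approximated as "has a scheme component".

```python
from urllib.parse import urlsplit


def is_absolute_iri(value: str) -> bool:
    """Approximate check: an absolute IRI carries a scheme, a relative one does not."""
    return bool(urlsplit(value).scheme)
```

Under this check the value reported above, http://server/unset-base/test, counts as resolved, whereas an unresolved relative reference such as test would not.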

That being said, the more I think about this test, the more I'm of the opinion that it is not really a protocol test. This requirement surely applies to the Protocol and any other means of submitting an update to the service (e.g. API). I'd be happy to just remove this test entirely. Thoughts?

@afs
Contributor

afs commented Apr 4, 2026

only that the relative IRI is resolved to some absolute IRI

True - I see it as encouraging/highlighting bad behavior.

(A system that rejected relative URIs wouldn't be such a bad thing.)

Thoughts?

Personally - remove the test.

@afs afs removed their assignment Apr 7, 2026
@afs
Contributor

afs commented Apr 7, 2026

I suggest we merge this - it may get wider review that way.

@kasei
Contributor Author

kasei commented Apr 7, 2026

@Tpt – any other comments before I merge?

Contributor

@Tpt Tpt left a comment

+1 to @afs let's merge this and iterate if needed

@kasei kasei merged commit c7817c2 into main Apr 9, 2026
2 checks passed
@kasei kasei mentioned this pull request Apr 30, 2026