
User:AndreaWest/WDQS Testing/Running TFT

From Wikitech

In order to execute BorderCloud's Tests for Triplestore (TFT) codebase on a local installation of a database (and without docker and jmeter), changes were made to the code and test definitions. This page explains the changes, as well as providing references to all the backing code. Also included are the steps to execute the tests, using a Stardog DB for the example, and details on how to modify and extend the tests.

Testing Overview

The TFT infrastructure was forked from the "master" branch (not the default, "withJMeter" branch) of the BorderCloud repository. The tests were also originally forked from BorderCloud, from the rdf-tests repository, and in February 2024, substituted with the current W3C RDF test definitions. (Note that the BorderCloud tests were originally forked from the referenced W3C repository.) There is one exception to the replacement by the W3C tests - the protocol subdirectory of the SPARQL 1.1 tests was not updated because it relied on/defined JMeter actions. For these tests, BorderCloud updated the W3C definitions to include an "mf:action" triple in each test. The objects of those triples were references to JMeter test plan files, which were also added to the directory. Therefore, although not used, the original BorderCloud tests were left in the protocol subdirectory.

The new TFT and test declaration repositories are the:

Minor changes were made to the RDF test definitions. Specifically, the manifest*.ttl files in the sub-directories of rdf-tests/sparql/sparql11 were updated. Those files make reference to SPARQL query, TTL/RDF and other text files (used as inputs and outputs to validate test results) using an IRI declaration (enclosed in angle brackets, < and >). In the original test definitions, only file names with no explicit namespaces or paths were specified, but a default namespace was defined for each manifest.ttl file.

Since the IRIs were simply file names (with no scheme such as http: or file:), some data stores may behave unpredictably when resolving the references. For this reason, the triples in the test definitions in the manifest.ttl files were updated to change the format from (for example) ":test_1 mf:action <syntax-select-expr-01.rq>" to ":test_1 mf:action :syntax-select-expr-01.rq", to explicitly use the default namespace specified in the Turtle.

The code behind these changes can be found in the FixTTL Jupyter notebook in the updated RDF tests repository. Note that the original files are present in each directory, named manifest*.ttl.bak.
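
As an illustration of the kind of rewrite described above, the following is a minimal Python sketch. It is not the code from the FixTTL notebook; the regular expression and the function name use_default_namespace are assumptions made for this example. It only rewrites IRI references that look like bare file names, and leaves "<>" and full IRIs untouched.

```python
import re

# Hypothetical pattern (not taken from the FixTTL notebook): an IRI reference
# that is only a file name, i.e. no scheme, no slash, and not the empty IRI <>.
BARE_FILE_IRI = re.compile(r"<([A-Za-z0-9_][A-Za-z0-9_.-]*[A-Za-z0-9_])>")


def use_default_namespace(turtle_text: str) -> str:
    """Rewrite <file.ext> references as :file.ext prefixed names."""
    return BARE_FILE_IRI.sub(r":\1", turtle_text)


if __name__ == "__main__":
    line = ":test_1 mf:action <syntax-select-expr-01.rq>"
    print(use_default_namespace(line))
```

A purely textual rewrite like this would also touch a matching pattern inside a string literal, so the result should be reviewed against the .bak copy of each manifest.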

Please note that the RDF Protocol and GraphStoreProtocol tests from the RDF SPARQL 1.1 repository are not executed. In TFT, these tests require JMeter. JMeter was not installed, and the infrastructure was not updated to define the appropriate test plans. The underlying SPARQL functionality that is examined is fully addressed by other tests. The only difference is that the requests are sent and responses received using raw HTTP. The Protocol tests have been removed by commenting out their invocation in the tft script. The GraphStoreProtocol tests were not part of the original BorderCloud infrastructure and will require the addition of new JMeter plans.

Also note that there are no details defined for the RDF ServiceDescription tests. Hence, they are not included in the testing.

As regards the GeoSPARQL tests, the BorderCloud tests were not used since they were not complete. Instead, the tests from the GeoSPARQL Benchmark repository were utilized. That repository was forked to create the repo noted above. The test data and a subset of the test definitions are included, and are defined using the TFT format. The specific GeoSPARQL tests that are included are specified in the README.md of the repository and are shown when the GitHub page is accessed.

Moving the GeoSPARQL tests from the original repository's test infrastructure to TFT required defining:

  • A manifest-all.ttl to indicate the GeoSPARQL compliance areas (Core, Topology Vocabulary, Geometry Extension and Geometry Topology Extension) being evaluated
  • A sub-directory for each compliance area, to hold a manifest.ttl file and the test inputs (.rdf files), queries (.rq files) and results (.srx files)
    • Within each directory/GeoSPARQL compliance area, a manifest.ttl file was created to define the tests' details
      • Each test was declared to be dawgt:Approved because this was required by the TFT infrastructure to avoid problems with RDF tests that were defined as "Proposed"
      • Note that each test loads the same data file (dataset.rdf). Loading the RDF could have been done in advance of the testing, but that would have required additional changes to the TFT code.
    • The names of any alternative result files that did not include the text, '-alternative-', were modified to do so
      • Alternative result files are explained in the paper, A GeoSPARQL Compliance Benchmark, in section 3.4.3
      • For example, testing of GeoSPARQL Requirement 9 was defined as using:
        • query-r09-4.rq, the query
        • query-r09-4.srx, the first alternative result file (which was renamed to query-r09-4-alternative-2.srx)
        • query-r09-4-alternative-1.srx, the second alternative results file
      • When alternative outputs were possible, the manifest.ttl defined the "result" triple using the format, query-r##(-<optional#>)-alternative-<number_of_srx_files>.srx
        • For GeoSPARQL Requirement 9, that meant that the following triple was declared: ":req09-4 mf:result :query-r09-4-alternative-2.srx"
        • The renaming enabled easier result processing in Test.php, which is discussed in more detail in the Code Modifications section below
    • Note that other than the name changes above, the .rq and .srx files are unmodified from the original repository
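
The naming convention for alternative result files can be illustrated with a short Python sketch. The real processing lives in Test.php (PHP); the function names and the load_expected callback below are hypothetical and only show how the count embedded in the manifest's result name can be expanded into the list of files to try.

```python
from __future__ import annotations

import re

# <stem>-alternative-<count><extension>, e.g. query-r09-4-alternative-2.srx
_ALTERNATIVE = re.compile(r"^(?P<stem>.+)-alternative-(?P<count>\d+)(?P<ext>\.\w+)$")


def candidate_result_files(result_name: str) -> list[str]:
    """Expand a manifest result name into every result file that may match."""
    match = _ALTERNATIVE.match(result_name)
    if match is None:
        # No alternatives: the named file is the only acceptable result.
        return [result_name]
    count = int(match["count"])
    return [
        f"{match['stem']}-alternative-{i}{match['ext']}"
        for i in range(1, count + 1)
    ]


def first_matching_result(actual, result_name, load_expected):
    """Return the first candidate whose expected content equals `actual`.

    `load_expected` is a hypothetical callback mapping a file name to its
    parsed content. Returns None when no alternative matches.
    """
    for name in candidate_result_files(result_name):
        if load_expected(name) == actual:
            return name
    return None
```

For the Requirement 9 example, candidate_result_files("query-r09-4-alternative-2.srx") yields the -alternative-1 and -alternative-2 file names, in that order.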

Since GeoSPARQL functionality is not required by WDQS, the GeoSPARQL tests have been disabled in the TFT config.ini file.

Incorporating the Tests Using Git Submodules

Both the BorderCloud and updated TFT repositories incorporate tests using git submodules. Therefore, if the tests are updated in either the RDF or GeoSPARQL repositories, the changes have to be incorporated/merged into the TFT repository. This is accomplished by the following instructions:

cd submoduleDirInTFT   # for example, cd tests/rdf-tests
git fetch
git checkout master
git merge origin/master
cd tftTopLevelDir
git status    # should show changes to the submoduleDirInTFT
git add submoduleDirInTFT
git commit -m "Updated submodule"
git push

Code Modifications

The TFT codebase was modified to not require external databases or Docker, and to allow tests to be pulled from a local file server (e.g., a directory published as a simple HTTP server) or from a different test repository. The goal was to make minimal changes to the infrastructure.

The following files were updated and are available in the AndreaWesterinen/TFT repository. This is the directory that is cloned in the instructions below.

  • config.ini
    • Updated to test "standard" SPARQL 1.1, to reference the correct repository and local path for test files, to add new listTestSuite entries (with the W3C SPARQL test location and the location of the local file servers), and to reference the aliases and online/queryable locations of the databases to be used in SERVICE queries
    • The original entries from the file are commented out using a beginning semi-colon (";")
    • Without the addition of the new listTestSuite entries, when running php ./tft, many of the tests were unable to locate the appropriate input/output files
      • Although not elegant, this was the fastest and easiest solution to the problem of locating the files
  • AbstractTest.php, Test.php and Tools.php
    • Where RDF test data files were specified in manifest*.ttl and referenced as IRIs with the default namespace, the reference to "manifest#" needed to be removed
    • (In Test.php and Tools.php) Requests to the SERVICE endpoints to load data required the addition of "update" to the SPARQL endpoint path
      • These changes were made to the clearAllTriples() and importGraphInput() functions in Test, and the loadData() function in Tools
      • A more "correct" solution would have been to add command line arguments when invoking php ./tft (as is possible for the test suite and test databases) to define different update and query SERVICE endpoints
      • But, a quicker solution was simply to change "query" in the SERVICE endpoint path to "update"
    • (In Test.php) The clearAllTriples processing was updated to use SPARQL "CLEAR ALL" versus deleting each graph one by one
    • (In Test.php) Comments were added where the BorderCloud SPARQL Client incorrectly adds a space in the HTTP Accept Header mime types for CSV and TSV
      • This should be corrected in the BorderCloud SPARQL Client directly since it does cause test failures
    • (In Test.php) GeoSPARQL test evaluation required checking multiple "alternative" result files
      • Changes were made to the checkResult() function
      • The processing involved checking if the text, "-alternative-", occurred in the file name, and if so, cycling through the possible result files (starting with -alternative-1)
      • If the triples in the tested database matched the contents of any of the alternative output files, then the test was deemed successful and no further result files were checked
      • To account for -verbose and -debug output, details of each of the result comparisons are captured in the test "message"
  • tft and tft-testsuite
    • Clarified the 'usage' text and error messages
    • (In tft) Commented out the execution of ProtocolTest::doAllTests() and GraphProtocolTest::doAllTests() since they require JMeter plans
  • TestSuite.php
    • Corrected the test suite instance creation to reference the manifest files, etc from the localhost HTTP server
  • ProtocolTest.php, QueryEvaluationTest.php and UpdateEvaluationTest.php
    • Shortened several queries and added variable bindings as well as updating processing for optional variable bindings
    • Corrected messages and comments
  • CSVResultFormatTest.php
    • Corrected the calls to addGraphOutput() in doAllTests()
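
The SERVICE endpoint change and the "CLEAR ALL" update described in the list above can be sketched in Python as follows. This is not the PHP code from Test.php or Tools.php; the function names are hypothetical, and the sketch assumes an endpoint layout in which the query path ends in "query" and the update path ends in "update" (as in the Stardog URLs used on this page).

```python
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import Request


def to_update_endpoint(query_endpoint: str) -> str:
    """Derive the update endpoint by swapping a trailing 'query' path segment."""
    parts = urlsplit(query_endpoint)
    head, _, last = parts.path.rstrip("/").rpartition("/")
    if last != "query":
        raise ValueError(f"endpoint does not end in 'query': {query_endpoint}")
    return urlunsplit(parts._replace(path=f"{head}/update"))


def clear_all_request(update_endpoint: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol update request for CLEAR ALL."""
    body = urlencode({"update": "CLEAR ALL"}).encode("utf-8")
    return Request(update_endpoint, data=body)


if __name__ == "__main__":
    print(to_update_endpoint("http://localhost:5820/example/query"))
```

Passing separate query and update endpoints explicitly, as suggested in the list above, would avoid relying on this path convention.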

Executing the Tests

The following execution example uses a local copy of the Stardog server (which was already installed on my laptop) to test the changes and process.

  • Start the triple store with security disabled
    • With security enabled, accessing the SERVICE endpoints resulted in permission errors. The php ./tft code does not allow the specification of the SERVICE endpoints' user names and passwords (as it does for the test details and tested databases). In lieu of addressing this problem, the shortcut of disabling security was taken.
    • Using the command below, Stardog is accessible as localhost at port 5820
stardog-admin server start --bind --disable-security
  • Set up the necessary data stores in the triple store
    • The example* stores represent databases accessed as SERVICEs
    • The tft-tests database holds the test details and results
    • The tft-outputs data store is the database being tested
      • For Stardog, if GeoSPARQL tests are enabled (in the config.ini file), the tft-outputs database needs to be created with the configuration parameters, spatial.enabled and spatial.use.jts, set to "true"
stardog-admin db create -n example
stardog-admin db create -n example1
stardog-admin db create -n example2
stardog-admin db create -n tft-tests
stardog-admin db create -n tft-outputs
  • Get the TFT codebase and RDF tests
git clone --recursive https://github.com/AndreaWesterinen/TFT
  • Move to the TFT directory just created
cd TFT
  • Start a local HTTP server (in the TFT directory, port 8000) to support referencing the manifest.ttl files as IRIs
python3 -m http.server 8000
  • Install the BorderCloud SPARQL client (which requires composer)
composer install
  • Load the tests into the tft-tests data store
php ./tft-testsuite -a -q 'http://localhost:5820/tft-tests/query' -u 'http://localhost:5820/tft-tests/update'
  • If everything is running correctly, you should see output similar to the following (note that the localhost:8000 graph may report 0 triples to clean)
Configuration about tests :
- Endpoint type        : standard
- Endpoint query       : http://localhost:5820/tft-tests/query
- Endpoint update      : http://localhost:5820/tft-tests/update
- Mode install all     : ON
- Test suite : URL     :
- Test suite : folder  :
- Mode verbose         : OFF
- Mode debug           : OFF
============ CLEAN GRAPH <https://andreawesterinen.github.io/rdf-tests/sparql/sparql11/>
Before to clean : 0 triples
After to clean : 0 triples
Start to init the dataset via URL
40 new graphs
============ CLEAN GRAPH <http://localhost:8000/tests/rdf-tests/sparql/sparql11/>
Before to clean : 7310 triples 
After to clean : 7310 triples
Start to init the dataset via URL
40 new graphs
  • Execute the tests (note the definition of the tested software using the command line arguments, softwareName, softwareDescribe and softwareDescribeTag)
php ./tft -q 'http://localhost:5820/tft-tests/query' -u 'http://localhost:5820/tft-tests/update' -tq http://localhost:5820/tft-outputs/query -tu http://localhost:5820/tft-outputs/update -o ./junit -r urn:results --softwareName="Stardog" --softwareDescribeTag=v9.1.0 --softwareDescribe=9.1.0-test
  • You should see output similar to what is listed directly below. There are a few items to note:
    • The results use the convention, '.' for success, 'F' for failure, 'E' for some error, 'S' for skipped
    • The large number of tests marked as "skipped" in the QueryEvaluationTest are caused by TFT infrastructure errors related to entailment. These tests are not currently relevant to Wikidata and will not present a problem.
    • The GeoSPARQL tests (if executed) are defined as QueryEvaluationTests
    • The tests that reference "http://www.w3.org/2009/sparql/docs/tests/data-sparql11/" (in the latter part of the output) are an artifact of the config.ini file, as noted in the section above. The last set of test results (labelled as "TEST : http://www.w3.org/2009/sparql/docs/tests/data-sparql11/") can therefore be ignored
Configuration about tests :
- Graph of output EARL : urn:results
- Output of tests      : ./junit
- Endpoint type        : standard
- Endpoint query       : http://localhost:5820/tft-tests/query
- Endpoint update      : http://localhost:5820/tft-tests/update
- TEST : Endpoint type        : standard
- TEST : Endpoint query       : http://localhost:5820/tft-outputs/query
- TEST : Endpoint update      : http://localhost:5820/tft-outputs/update
- Mode verbose         : OFF
- Mode debug           : OFF
TEST : https://andreawesterinen.github.io/rdf-tests/sparql/sparql11/

TESTS : PositiveSyntaxTest
.Nb tests : 60

TESTS : NegativeSyntaxTest
.Nb tests : 46

TESTS : QueryEvaluationTest.Nb tests : 260
NB of tests (260/252 in theory) is incorrect.

		TESTS : CSVResultFormatTest
.Nb tests : 3
		TESTS : UpdateEvaluationTest
.Nb tests : 183
NB of tests (183/93 in theory) is incorrect.
		TESTS : PositiveUpdateSyntaxTest
.Nb tests : 42
		TESTS : NegativeUpdateSyntaxTest
.Nb tests : 13
TEST : http://www.w3.org/2009/sparql/docs/tests/data-sparql11/

TESTS : PositiveSyntaxTest
.Nb tests : 0

TESTS : NegativeSyntaxTest
.Nb tests : 0

TESTS : QueryEvaluationTest.Nb tests : 0

		TESTS : CSVResultFormatTest
.Nb tests : 0

		TESTS : UpdateEvaluationTest
.Nb tests : 0

		TESTS : PositiveUpdateSyntaxTest
.Nb tests : 0

		TESTS : NegativeUpdateSyntaxTest
.Nb tests : 0

  • To determine the final results, execute the query below
    • Note that these tests do NOT use the tft-score code
    • Also, note that the graph name is the one specified with the -r option in the php ./tft instruction above
stardog query execute tft-tests "prefix earl: <http://www.w3.org/ns/earl#>
SELECT ?out (COUNT(DISTINCT ?assertion) AS ?cnt)
WHERE {
        GRAPH <urn:results> {
                ?assertion a earl:Assertion.
                ?assertion earl:test ?test.
                ?assertion earl:result ?result.
                ?result earl:outcome ?out .
        }
} GROUP BY ?out"
  • Results will be reported as shown:
|                out                 |  cnt  |
| http://www.w3.org/ns/earl#passed   | 704   |
| http://www.w3.org/ns/earl#failed   | 8     |
| http://www.w3.org/ns/earl#error    | 6     |
| http://www.w3.org/ns/earl#untested | 146   |

  • To see the tests which failed, execute this query:
stardog query tft-tests "prefix earl: <http://www.w3.org/ns/earl#>
select distinct ?s where {
        GRAPH <urn:results> { {?s earl:outcome earl:failed}
                              UNION {?s earl:outcome earl:error} }
}"
  • The details will be similar to what is listed below and indicate the specific sub-directory and test which failed
|                                        s                                         |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-fed/manifest#test_ |
| 3/Syntax/2024-02-12T02:40:27+00:00                                               |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#tsv0 |
| 1/Protocol/2024-02-12T02:40:27+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#tsv0 |
| 2/Protocol/2024-02-12T02:40:27+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#tsv0 |
| 3/Protocol/2024-02-12T02:40:27+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/functions/manifest#bnode0 |
| 1/Response/2024-02-12T02:40:27+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#csv0 |
| 1/Protocol/2024-02-12T02:40:27+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#csv0 |
| 2/Protocol/2024-02-12T02:40:27+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/csv-tsv-res/manifest#csv0 |
| 3/Protocol/2024-02-12T02:40:27+00:00                                             |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/drop/manifest#dawg-drop-n |
| amed-01/Response/2024-02-12T02:40:27+00:00                                       |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-update-1/manifest# |
| test_18/Response/2024-02-12T02:40:27+00:00                                       |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-update-1/manifest# |
| test_28/Response/2024-02-12T02:40:27+00:00                                       |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-update-1/manifest# |
| test_8/Response/2024-02-12T02:40:27+00:00                                        |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-update-1/manifest# |
| test_51/Syntax/2024-02-12T02:40:27+00:00                                         |
| http://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-update-1/manifest# |
| test_54/Syntax/2024-02-12T02:40:27+00:00                                         |
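
The outcome summary can also be retrieved over HTTP rather than through the Stardog command line. The sketch below is an assumption-laden illustration, not part of TFT: it sends the summary query to the tft-tests query endpoint using the SPARQL 1.1 Protocol (URL-encoded POST), asks for SPARQL JSON results, and tallies the counts per EARL outcome. The function names run_select and outcome_counts are hypothetical, and the graph name <urn:results> assumes the -r option shown earlier.

```python
import json
import urllib.parse
import urllib.request

EARL_SUMMARY_QUERY = """
PREFIX earl: <http://www.w3.org/ns/earl#>
SELECT ?out (COUNT(DISTINCT ?assertion) AS ?cnt)
WHERE {
  GRAPH <urn:results> {
    ?assertion a earl:Assertion ;
               earl:test ?test ;
               earl:result ?result .
    ?result earl:outcome ?out .
  }
}
GROUP BY ?out
"""


def run_select(endpoint, query, timeout=30.0):
    """POST a SELECT query and return the parsed SPARQL JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def outcome_counts(results):
    """Map each EARL outcome local name (e.g. 'passed') to its count."""
    counts = {}
    for row in results["results"]["bindings"]:
        outcome = row["out"]["value"].rsplit("#", 1)[-1]
        counts[outcome] = int(row["cnt"]["value"])
    return counts


if __name__ == "__main__":
    # Requires the triple store from the steps above to be running.
    results = run_select("http://localhost:5820/tft-tests/query", EARL_SUMMARY_QUERY)
    print(outcome_counts(results))
```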

Getting More Information Using Verbose Mode

If you are experiencing errors when loading the test suites or running the tests, use the -v and -d flags when executing the php ./tft-testsuite and/or php ./tft programs.
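
Before turning to verbose output, it can help to confirm that the local file server is actually serving the manifest files. The following Python sketch is only a suggested pre-flight check and is not part of TFT; the function name check_urls and its opener parameter are hypothetical. The URL in the usage section assumes the HTTP server was started in the TFT directory on port 8000, as in the steps above.

```python
import urllib.error
import urllib.request


def check_urls(urls, timeout=5.0, opener=urllib.request.urlopen):
    """Return {url: HTTP status code, or an error string if unreachable}."""
    results = {}
    for url in urls:
        try:
            with opener(url, timeout=timeout) as response:
                results[url] = response.status
        except urllib.error.HTTPError as exc:
            # The server answered, but with an error status such as 404.
            results[url] = exc.code
        except (urllib.error.URLError, OSError) as exc:
            # The server could not be reached at all.
            results[url] = str(exc)
    return results


if __name__ == "__main__":
    manifest = "http://localhost:8000/tests/rdf-tests/sparql/sparql11/construct/manifest.ttl"
    for url, status in check_urls([manifest]).items():
        print(url, "->", status)
```

A 404 here usually means the server was started in the wrong directory or the submodules were not cloned; a connection error means the server is not running.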

Information on the RDF SPARQL Tests

All changes to the RDF SPARQL tests are described in the bullets below:

  • All manifest.ttl files were updated per the changes discussed above
  • construct/constructwhere04.rq, exists/exists03.rq, property-path/path-ng-01.rq and path-ng-02.rq, syntax-fed/syntax-service-03.rq, and syntax-update-1/syntax-update-08.ru, syntax-update-18.ru and syntax-update-28.ru
  • drop/drop-default.ttl
  • json-res/jsonres01.srj and jsonres02.srj
    • Modified to reference the correct bnode name, "o6" (NOT "b6")
  • service/data05.ttl and service05.srx
    • Modified to reference the correct SPARQL endpoints for TFT testing (required for the tests to correctly execute in the TFT infrastructure)
  • subquery/sq02.srx
    • Manifest modified to load data into graph, "instance#b", so the resulting binding set is "instance#a" and property "schema#p"
  • subquery/sq03.rq
    • Modified the query to not bind the graph variable
  • syntax-fed/syntax-service-01.rq and syntax-service-02.rq

How to Change or Extend the Tests

Any of the repositories (TFT, rdf-tests or GeoSPARQLBenchmark-Tests) can be updated via a pull request or by forking. If either the RDF or GeoSPARQL test repositories are forked, it is recommended that the TFT repo also be forked and its submodule links reset/redefined. An example is shown in the commit history of the TFT repository.

It is necessary to edit the manifest*.ttl files and the TFT code is updated to pull the manifest files from the local directories. But, if the details of an existing RDF/SPARQL test are modified (either the .rq, .srx, .srj, data ttl or other files), then the manifest files must be further updated to retrieve them for query, data loading, result comparison, etc. For example, examining rdf-tests/sparql/sparql11/construct/manifest.ttl, one finds the following:

@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://www.w3.org/2009/sparql/docs/tests/data-sparql11/construct/manifest#> .
@prefix rdfs:	<http://www.w3.org/2000/01/rdf-schema#> .
@prefix mf:     <http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#> .
@prefix qt:     <http://www.w3.org/2001/sw/DataAccess/tests/test-query#> .
@prefix dawgt:   <http://www.w3.org/2001/sw/DataAccess/tests/test-dawg#> .
@prefix local:  <http://localhost:8000/tests/rdf-tests/sparql/sparql11/construct/> .

<>  rdf:type mf:Manifest ;
    rdfs:label "CONSTRUCT" ;
    ) .
:constructwhere04 rdf:type mf:QueryEvaluationTest ;
    mf:name    "constructwhere04 - CONSTRUCT WHERE" ;
    rdfs:comment "CONSTRUCT WHERE  with DatasetClause";
    dawgt:approval dawgt:Approved ;
    dawgt:approvedBy <http://www.w3.org/2009/sparql/meeting/2011-02-01#resolution_3> ;
    mf:action
        [ qt:query  local:constructwhere04.rq ;
          qt:graphData [ qt:graph :data.ttl ;
                         rdfs:label "urn:data.ttl" ] ] ;
    mf:result :constructwhere04result.ttl 
# ARW: A named graph must be an IRI with the appropriate format, scheme:xxx

A new prefix, local:, was added and used when referencing the CONSTRUCT query. Note that it references the correct rdf-tests file path to the "construct" subdirectory. When TFT executes this test, the code retrieves the query contents from the local file system (via the simple HTTP server rooted in the TFT directory). If the prefix had not been added, the query would have been retrieved from the W3C rdf-tests repository.

Similarly, rdf-tests/sparql/sparql11/drop/manifest.ttl defines a new local: prefix, and references a revised drop-data.ttl file.

@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://www.w3.org/2009/sparql/docs/tests/data-sparql11/drop/manifest#> .
@prefix rdfs:	<http://www.w3.org/2000/01/rdf-schema#> .
@prefix dawgt:  <http://www.w3.org/2001/sw/DataAccess/tests/test-dawg#> .
@prefix mf:     <http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#> .
@prefix qt:     <http://www.w3.org/2001/sw/DataAccess/tests/test-query#> .
@prefix ut:     <http://www.w3.org/2009/sparql/tests/test-update#> .
@prefix local:  <http://localhost:8000/tests/rdf-tests/sparql/sparql11/drop/> .

<>  rdf:type mf:Manifest ;
    rdfs:label "DROP" ;
    rdfs:comment "Tests for SPARQL UPDATE" ;
:dawg-drop-default-01 a mf:UpdateEvaluationTest ;
    mf:name    "DROP DEFAULT" ;
    rdfs:comment "This is a DROP of the default graph" ;
    dawgt:approval dawgt:Approved;
    dawgt:approvedBy <http://www.w3.org/2009/sparql/meeting/2012-02-07#resolution_3> ;
    mf:action [
    			ut:request :drop-default-01.ru ; 
                ut:data local:drop-default.ttl ;
                ut:graphData [ ut:graph :drop-g1.ttl ;
                               rdfs:label "http://example.org/g1" ] ;
                ut:graphData [ ut:graph :drop-g2.ttl ;
                               rdfs:label "http://example.org/g2" ] ;
              ] ;
    mf:result [
                ut:graphData [ ut:graph :drop-g1.ttl ;
                               rdfs:label "http://example.org/g1" ] ;
                ut:graphData [ ut:graph :drop-g2.ttl ;
                               rdfs:label "http://example.org/g2" ] ;
              ] ;

These changes were required to address errors in the RDF Tests, idiosyncrasies of the TFT code, and/or the need to define graph and service references as URLs or URNs. These changes are discussed in the section above.

To add a new set of tests for either RDF or GeoSPARQL, begin by updating the manifest-all.ttl file in the rdf-tests/sparql11/data-sparql11 or geosparql-tests/geosparql directory. The manifest-all file identifies which component directories (with their manifest.ttl files) should be included.

To add a test or to modify any of the existing tests, create or edit the manifest.ttl file in the appropriate subdirectory of rdf-tests/sparql/sparql11 or geosparql-tests/geosparql. The manifest.ttl files contain the information encoded using the conventions defined in the SPARQL 1.1 Test Case Structure document. Note that for GeoSPARQL, all of the tests were defined as QueryEvaluationTests. This may or may not be true for new tests.

When creating a manifest.ttl file, remember to update the default namespace defined in the prefixes.
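
A small check along the following lines can catch a default namespace that was copied from another directory and not updated. It is a sketch only: the function names are hypothetical, the regular expression handles simple one-line @prefix declarations rather than full Turtle, and the ".../<subdirectory>/manifest#" pattern is an assumption based on the manifest excerpts shown above.

```python
import re

# Simple one-line declarations of the form: @prefix name: <iri> .
PREFIX_LINE = re.compile(
    r"^\s*@prefix\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>\s*\.", re.MULTILINE
)


def read_prefixes(turtle_text):
    """Return {prefix name: namespace IRI}; the default prefix has the key ''."""
    return {m.group(1) or "": m.group(2) for m in PREFIX_LINE.finditer(turtle_text)}


def default_namespace_matches(turtle_text, subdirectory):
    """True if the default namespace ends in '/<subdirectory>/manifest#'."""
    default = read_prefixes(turtle_text).get("")
    return default is not None and default.endswith(f"/{subdirectory}/manifest#")


if __name__ == "__main__":
    sample = (
        "@prefix : <http://www.w3.org/2009/sparql/docs/tests/"
        "data-sparql11/construct/manifest#> .\n"
    )
    print(default_namespace_matches(sample, "construct"))
```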

In addition, an entirely new repository of tests can be added, similar to the approach taken for adding the GeoSPARQL tests (discussed in the Testing Overview section above). If this is done, it is again recommended that the TFT repository be forked and its submodule links reset/redefined.

When defining a new repository, the config.ini file in the TFT repo must be updated to include the new repository as a new listTestSuite entry, and access to the manifest and test files must be provided. In the current case, the latter is accomplished using a simple HTTP server (python3 -m http.server) started in the top-level TFT directory.