SPARQL Federation

Overview

SPARQL 1.1 Federation extends the query syntax so that a single query can be executed over any number of SPARQL endpoints. This makes it possible to integrate RDF data from different sources in one query.

For example, to discover DBpedia resources about people who have the same names as those stored in a local repository, use the following query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?dbpedia_id
WHERE {
   ?person a foaf:Person ;
           foaf:name ?name .
   SERVICE <http://dbpedia.org/sparql> {
        ?dbpedia_id a dbpedia-owl:Person ;
                    foaf:name ?name .
   }
}

The query matches the first part against the local repository and, for each person it finds, checks the DBpedia SPARQL endpoint for a person with the same name. If one exists, the query returns its ID.

Note

Federation must be used with caution: first, to avoid excessive querying of remote (public) SPARQL endpoints, and second, because it can lead to inefficient query patterns.

The following example finds resources in the second SPARQL endpoint that have a similar rdfs:label to the rdfs:label of <http://dbpedia.org/resource/Vaccination> in the first SPARQL endpoint:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?endpoint2_id {
    VALUES ?endpoint1_id {
        <http://dbpedia.org/resource/Vaccination>
    }
    SERVICE <http://faraway_endpoint.org/sparql> {
        ?endpoint1_id rdfs:label ?l1 .
        FILTER( langMatches(lang(?l1), "en") )
    }
    SERVICE <http://remote_endpoint.com/sparql> {
        ?endpoint2_id rdfs:label ?l2 .
        FILTER( str(?l2) = str(?l1) )
    }
}

However, such a query is very inefficient because no intermediate bindings are passed between the endpoints. Instead, both subqueries execute independently, which requires the second subquery to return every X rdfs:label Y statement that the endpoint stores. These results are then joined locally with the (likely much smaller) results of the first subquery.

Query execution can be optimized by passing multiple values to the remote endpoint in batches. The following applies:

  • The default batch size is 15, which is suitable for most cases.

  • You can change the default via the graphdb.federation.block.join.size global property.

  • By using a system graph, you can set the value for a single query evaluation only.
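To illustrate what batching means, the request for one batch to the second endpoint could conceptually resemble the query below, where labels found by the first subquery are supplied in a VALUES block instead of fetching every label. This is only a sketch: the two label values are made up, and the exact form of the query that GraphDB sends to the remote endpoint is not described here.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?endpoint2_id ?l1 {
    # Hypothetical batch of bindings produced by the first subquery.
    VALUES ?l1 { "Vaccination"@en "Vaccine"@en }
    ?endpoint2_id rdfs:label ?l2 .
    # Same comparison as in the original query, now limited to the batch.
    FILTER( str(?l2) = str(?l1) )
}
```

With a batch size of 15, up to 15 such values would travel in one request, so the remote endpoint can filter on its side rather than returning all of its labels.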

Internal SPARQL federation

Since RDF4J repositories are also SPARQL endpoints, the federation mechanism can be used for distributed querying over several repositories on a local server. You can refer to them as a standard SERVICE with their full URL or, if they are running on the same GraphDB instance, use the optimized local repository prefix, which triggers the internal federation mechanism. Internal SPARQL federation is used in almost the same way as standard SPARQL federation over HTTP, and has several advantages:

Speed

The HTTP transport layer is bypassed and iterators are accessed directly. The speed is comparable to accessing data in the same repository.

Security

When security is ON, you can access every repository that is readable by the currently authenticated user. Standard SPARQL 1.1 federation does not support authentication.

Flexibility

Inline parameters provide control over inference and statement expansion over owl:sameAs.

Usage

Instead of providing a URL to a remote repository, you need to provide a special URL of the form repository:NNN, where NNN is the ID of the repository you want to access. For example, to access the repository authors via internal federation, use a query like this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX books: <http://example.com/books/>

SELECT ?authorName WHERE {
    ?book rdfs:label "The Hitchhiker's Guide to the Galaxy" ;
        books:author ?author .

    SERVICE <repository:authors> {
        ?author rdfs:label ?authorName
    }
}

The approach used for DBpedia above, that is, a full HTTP URL such as SERVICE <http://localhost:7200/repositories/my_labels>, is also valid but less efficient.
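For comparison, the authors lookup can also be written against the repository's HTTP endpoint. This sketch assumes that GraphDB is reachable at http://localhost:7200 and that the repository ID is authors; adjust both to your setup.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX books: <http://example.com/books/>

SELECT ?authorName WHERE {
    ?book rdfs:label "The Hitchhiker's Guide to the Galaxy" ;
        books:author ?author .

    # Standard federation over HTTP (assumed host and port);
    # the repository: form avoids the HTTP transport layer.
    SERVICE <http://localhost:7200/repositories/authors> {
        ?author rdfs:label ?authorName
    }
}
```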

Parameters

There are four parameters that control how the federated part of the query is executed:

Parameter

Definition

infer (boolean)

Controls if inferred statements are included. True by default.

When set to false, it is equivalent to adding FROM <http://www.ontotext.com/explicit> to the federated query.

sameAs (boolean)

Controls if statements are expanded over owl:sameAs. True by default.

When set to false, it is equivalent to adding FROM <http://www.ontotext.com/disable-sameAs> to the federated query.

from (string)

Can be repeated multiple times, translates to FROM <...>. No default value.

fromNamed (string)

Can be repeated multiple times, translates to FROM NAMED <...>. No default value.

To set a parameter, put a comma after the special URL referring to the internal repository, then the parameter name, an equals sign, and finally the value of the parameter. If you need to set more than one parameter, put another comma, parameter name, equals sign, and value.

Some examples:

repository:NNN,infer=false

Turns off inference, so inferred statements are not included in the results.

repository:NNN,sameAs=false

Turns off the expansion of statements over owl:sameAs, so expanded statements are not included in the results.

repository:NNN,infer=false,sameAs=false

Turns off both inference and the expansion of statements over owl:sameAs, so neither inferred nor expanded statements are included in the results.

service <repository:repo1>

Adds no FROM or FROM NAMED clause.

service <repository:repo1,from=http://test.com>

Adds FROM <http://test.com>.

service <repository:repo1,fromNamed=http://test.com/named>

Adds FROM NAMED <http://test.com/named>.

service <repository:repo1,from=http://test.com,fromNamed=http://test.com/named,sameAs=false>

Adds FROM <http://test.com>, adds FROM NAMED <http://test.com/named>, does not expand over owl:sameAs.

Note

The special URL, including its parameters, must be a valid URL and therefore cannot contain spaces.

The example SPARQL query from above will look like this if you want to skip the inferred statements and disable the expansion over owl:sameAs:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX books: <http://example.com/books/>

SELECT ?authorName WHERE {
    ?book rdfs:label "The Hitchhiker's Guide to the Galaxy" ;
        books:author ?author .

    SERVICE <repository:authors,infer=false,sameAs=false> {
        ?author rdfs:label ?authorName
    }
}
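The from and fromNamed parameters are used in a full query in the same way. In the sketch below, the graph IRI http://example.com/graphs/authors is hypothetical and stands for a graph that exists in the authors repository.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX books: <http://example.com/books/>

SELECT ?authorName WHERE {
    ?book rdfs:label "The Hitchhiker's Guide to the Galaxy" ;
        books:author ?author .

    # from=... translates to FROM <http://example.com/graphs/authors>
    # for the federated part only (hypothetical graph IRI).
    SERVICE <repository:authors,from=http://example.com/graphs/authors> {
        ?author rdfs:label ?authorName
    }
}
```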

Federated query to a remote password-protected repository

GraphDB repositories

You can also use federation to query a remote password-protected GraphDB repository by adding the other GraphDB instance as a remote location and specifying the credentials for it.

For example, if the remote location is on http://localhost:7201, this will enable you to query the remote repository as follows:

PREFIX ex: <http://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?id ?label
WHERE {
    ?id a ex:Concept .
    SERVICE <http://localhost:7201/repositories/remote_repo_id> {
        ?id rdfs:label ?label.
    }
}

where <remote_repo_id> is the ID of the remote repository.

Any URL parameters supported by the remote endpoint can be used, e.g., if it is an RDF4J/GraphDB repository, it could be a URL like http://factforge.net/repositories/ff-news?infer=false to include only explicit statements.
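As a sketch of such a URL parameter in use, the following query asks the ff-news repository for explicit statements only. The rdfs:label pattern and the LIMIT are illustrative choices and are not taken from that endpoint's documentation.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?id ?label
WHERE {
    # infer=false is passed to the remote repository as a URL parameter.
    SERVICE <http://factforge.net/repositories/ff-news?infer=false> {
        ?id rdfs:label ?label .
    }
}
LIMIT 10
```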

SPARQL endpoints

For non-GraphDB repositories, i.e., SPARQL endpoints, there are two ways to perform a federated query to a password-protected SPARQL endpoint:

  • By editing the repository configuration as follows:

    1. Download the configuration file.

    2. In it, edit the repositoryURL (<http://user:password@db.example.com/sparql>), replacing the placeholder values with your login details and the address of the SPARQL endpoint.

    3. Stop GraphDB if it is running.

    4. Create a new directory in $GDB_HOME/data/repositories/ with the same name as repositoryID from the config file.

    5. Place the edited config file in the newly created folder. Make sure that it is named config.ttl, as otherwise GraphDB will not recognize it and the repository will not be created.

    6. Start GraphDB again.

  • By importing the repository configuration file in the Workbench (does not require stopping GraphDB):

    1. Download the mentioned configuration file.

    2. In it, change rep:repositoryID "<RepoName>" to the name of your repository.

    3. Edit the repositoryURL (<http://user:password@db.example.com/sparql>), replacing the placeholder values with your login details and the address of the SPARQL endpoint.

    4. Open GraphDB Workbench and go to Repositories ‣ Create new repository ‣ Create from file.

    5. Upload the file. The newly created repository will have the same name used for <RepoName>.

Either approach enables you to query the SPARQL endpoint through the new repository (my_labels in this example):

PREFIX ex: <http://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?id ?label
WHERE {
    ?id a ex:Concept .
    SERVICE <repository:my_labels> {
        ?id rdfs:label ?label.
    }
}