Query behaviour

What are named graphs

An RDF database can store collections of RDF statements (triples) in separate graphs, each identified (named) by a URI. A group of statements with a unique name is called a ‘named graph’. An RDF database also has one more graph, which has no name and is called the ‘default graph’.

The SPARQL query syntax provides a means to execute queries across default and named graphs using FROM and FROM NAMED clauses. These clauses are used to build an RDF dataset, which identifies what statements the SPARQL query processor will use to answer a query. The dataset contains a default graph and named graphs and is constructed as follows:

  • FROM <uri> - brings statements from the database graph, identified by URI, to the dataset’s default graph, i.e., the statements ‘lose’ their graph name.
  • FROM NAMED <uri> - brings the statements from the database graph, identified by URI, to the dataset, i.e., the statements keep their graph name.

If either FROM or FROM NAMED is used, the database’s default graph is no longer used as input for processing this query. In effect, the combination of FROM and FROM NAMED clauses exactly defines the dataset. This is somewhat bothersome, as it precludes the possibility, for instance, of executing a query over just one named graph and the default graph. However, there is a programmatic way to get around this limitation, as described below.
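The two rules above can be illustrated with a small, self-contained Java sketch. It is not GraphDB or RDF4J code: the class FromClauseSketch, its build method and the representation of triples as plain strings are assumptions made only for this illustration. It models the case where at least one FROM or FROM NAMED clause is present, so the database’s default graph is not consulted.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of dataset construction; not part of GraphDB or RDF4J.
class FromClauseSketch {
    /** The dataset seen by the query: one unnamed default graph plus named graphs. */
    static final class Dataset {
        final Set<String> defaultGraph = new TreeSet<>();
        final Map<String, Set<String>> namedGraphs = new TreeMap<>();
    }

    /**
     * db maps a graph URI to its triples (plain strings in this sketch).
     * from and fromNamed hold the URIs listed in FROM and FROM NAMED clauses.
     */
    static Dataset build(Map<String, Set<String>> db, List<String> from, List<String> fromNamed) {
        Dataset ds = new Dataset();
        for (String uri : from) {
            // FROM: statements are merged into the default graph and lose their graph name.
            ds.defaultGraph.addAll(db.getOrDefault(uri, Set.of()));
        }
        for (String uri : fromNamed) {
            // FROM NAMED: statements stay in a graph that keeps the same name.
            ds.namedGraphs.put(uri, new TreeSet<>(db.getOrDefault(uri, Set.of())));
        }
        return ds;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> db = Map.of(
                "ex:g1", Set.of("ex:a ex:p ex:b"),
                "ex:g2", Set.of("ex:c ex:p ex:d"));
        Dataset ds = build(db, List.of("ex:g1"), List.of("ex:g2"));
        System.out.println("default graph: " + ds.defaultGraph);
        System.out.println("named graphs:  " + ds.namedGraphs);
    }
}
```

In this model, a graph listed only in FROM contributes to plain triple patterns, while a graph listed only in FROM NAMED can be reached only through a GRAPH pattern.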

The default SPARQL dataset

Note

The SPARQL specification does not define what happens when no FROM or FROM NAMED clauses are present in a query, i.e., it does not define how a SPARQL processor should behave when no dataset is defined. In this situation, implementations are free to construct the default dataset as necessary.

GraphDB constructs the default dataset as follows:

  • The dataset’s default graph contains the merge of the database’s default graph AND all the database named graphs;
  • The dataset contains all named graphs from the database.

This means that if a statement ex:x ex:y ex:z exists in the database in the graph ex:g, then the following query patterns will behave as follows:

Query                                   Bindings
SELECT * { ?s ?p ?o }                   ?s=ex:x ?p=ex:y ?o=ex:z
SELECT * { GRAPH ?g { ?s ?p ?o } }      ?s=ex:x ?p=ex:y ?o=ex:z ?g=ex:g

In other words, the triple ex:x ex:y ex:z will appear to be in both the default graph and the named graph ex:g.

There are two reasons for this behaviour:

  1. It provides an easy way to execute a triple pattern query over all stored RDF statements.
  2. It allows all named graph names to be discovered, i.e., with this query: SELECT ?g { GRAPH ?g { ?s ?p ?o } }.
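The default dataset described above can be modelled with a short Java sketch. The class DefaultDatasetSketch and its methods are hypothetical and use plain strings for triples; they only illustrate the two construction rules and do not reflect how GraphDB implements them.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the default dataset; not part of GraphDB or RDF4J.
class DefaultDatasetSketch {
    /** Triples matched by { ?s ?p ?o }: the database default graph merged with all named graphs. */
    static Set<String> defaultGraphView(Set<String> dbDefaultGraph,
                                        Map<String, Set<String>> dbNamedGraphs) {
        Set<String> merged = new TreeSet<>(dbDefaultGraph);
        for (Set<String> triples : dbNamedGraphs.values()) {
            merged.addAll(triples);
        }
        return merged;
    }

    /** Values bound to ?g by GRAPH ?g { ?s ?p ?o }: names of named graphs holding at least one triple. */
    static Set<String> graphNames(Map<String, Set<String>> dbNamedGraphs) {
        Set<String> names = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : dbNamedGraphs.entrySet()) {
            if (!e.getValue().isEmpty()) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The statement ex:x ex:y ex:z is stored only in the named graph ex:g.
        Set<String> dbDefault = Set.of();
        Map<String, Set<String>> named = Map.of("ex:g", Set.of("ex:x ex:y ex:z"));
        System.out.println(defaultGraphView(dbDefault, named));
        System.out.println(graphNames(named));
    }
}
```

The statement is stored once, in ex:g, yet it is visible both through the merged default graph and through the named graph, which matches the bindings shown in the table above.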

How to manage explicit and implicit statements

GraphDB maintains two flags for each statement:

  • Explicit: the statement is inserted in the database by the user, using SPARQL UPDATE, the RDF4J API or the imports configuration parameter. The same explicit statement can exist in the database’s default graph and in each named graph.
  • Implicit: the statement is created as a result of inference, by either Axioms or Rules. Inferred statements are ALWAYS created in the database’s default graph.

These two flags are not mutually exclusive. The sequences of operations shown below use the following conventions:

  • The operations are named ‘insert/delete’ for explicit statements and ‘infer/retract’ for implicit ones (retract means that all premises of the statement are deleted or retracted).
  • The result of each operation is shown as a tuple <statement graph flags>:
    • <s G EI> means statement s in graph G having both flags Explicit and Implicit;
    • <s _ EI> means statement s in the default graph having both flags Explicit and Implicit;
    • <_ G _> means the statement is deleted from graph G.

First, let’s consider operations on statement s in the default graph only:

  • insert <s _ E>, infer <s _ EI>, delete <s _ I>, retract <_ _ _>;
  • insert <s _ E>, infer <s _ EI>, retract <s _ E>, delete <_ _ _>;
  • infer <s _ I>, insert <s _ EI>, delete <s _ I>, retract <_ _ _>;
  • infer <s _ I>, insert <s _ EI>, retract <s _ E>, delete <_ _ _>;
  • insert <s _ E>, insert <s _ E>, delete <_ _ _>;
  • infer <s _ I>, infer <s _ I>, retract <_ _ _> (if the two inferences are from the same premises).

This does not show all possible sequences, but it shows the principles:

  • No duplicate statement can exist in the default graph;
  • Delete/retract clears the appropriate flag;
  • The statement is deleted only after both flags are cleared;
  • Deleting an inferred statement has no effect (except to clear the E flag, if any);
  • Retracting an inserted statement has no effect (except to clear the I flag, if any);
  • Inserting the same statement twice has no effect: insert is idempotent;
  • Inferring the same statement twice has no effect: infer is idempotent, and I is a flag, not a counter, but the Retraction algorithm ensures I is cleared only after all premises of s are retracted.
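The principles above amount to a small state machine over the two flags. The following Java sketch models a single statement s in the default graph. The class StatementFlagsSketch is hypothetical and is not GraphDB code. It also simplifies inference: retract() stands for the moment when all premises of s have been retracted, so no premise counting is modelled.

```java
import java.util.EnumSet;

// Hypothetical model of the Explicit/Implicit flags of one statement in the default graph.
class StatementFlagsSketch {
    enum Flag { E, I } // E = explicit, I = implicit

    private final EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);

    void insert()  { flags.add(Flag.E); }    // idempotent: inserting twice changes nothing
    void infer()   { flags.add(Flag.I); }    // idempotent: I is a flag, not a counter
    void delete()  { flags.remove(Flag.E); } // never touches the I flag
    void retract() { flags.remove(Flag.I); } // never touches the E flag

    /** The statement exists as long as at least one flag is set. */
    boolean exists() { return !flags.isEmpty(); }

    /** Renders the state in the <statement graph flags> notation used in this section. */
    String state() {
        if (!exists()) {
            return "<_ _ _>";
        }
        StringBuilder sb = new StringBuilder("<s _ ");
        for (Flag f : flags) {
            sb.append(f); // EnumSet iterates in declaration order, so E precedes I
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        StatementFlagsSketch s = new StatementFlagsSketch();
        s.insert();
        System.out.println("insert  " + s.state());
        s.infer();
        System.out.println("infer   " + s.state());
        s.delete();
        System.out.println("delete  " + s.state());
        s.retract();
        System.out.println("retract " + s.state());
    }
}
```

The main method replays the first sequence listed above (insert, infer, delete, retract); the other sequences can be replayed by reordering the calls.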

Now, let’s consider operations on statement s in the named graph G, and inferred statement s in the default graph:

  • insert <s G E>, infer <s _ I> <s G E>, delete <s _ I>, retract <_ _ _>;
  • insert <s G E>, infer <s _ I> <s G E>, retract <s G E>, delete <_ _ _>;
  • infer <s _ I>, insert <s G E> <s _ I>, delete <s _ I>, retract <_ _ _>;
  • infer <s _ I>, insert <s G E> <s _ I>, retract <s G E>, delete <_ _ _>;
  • insert <s G E>, insert <s G E>, delete <_ _ _>;
  • infer <s _ I>, infer <s _ I>, retract <_ _ _> (if the two inferences are from the same premises).

The additional principles here are:

  • The same statement can exist in several graphs - as explicit in graph G and implicit in the default graph;
  • Delete/retract works on the appropriate graph.

Note

In order to avoid a proliferation of duplicate statements, it is recommended not to insert inferable statements in named graphs.

How to query explicit and implicit statements

The database’s default graph can contain a mixture of explicit and implicit statements. The RDF4J API provides a flag called ‘includeInferred’, which is passed to several API methods. When this flag is set to false, only explicit statements are iterated or returned. When it is set to true, both explicit and implicit statements are iterated or returned.

GraphDB provides extensions for more control over the processing of explicit and implicit statements. These extensions allow the selection of explicit, implicit or both for query answering and also provide a mechanism for identifying which statements are explicit and which are implicit. This is achieved by using some ‘pseudo-graph’ names in FROM and FROM NAMED clauses, which cause certain flags to be set.

The details are as follows:

FROM <http://www.ontotext.com/explicit>
The dataset’s default graph includes only explicit statements from the database’s default graph.
FROM <http://www.ontotext.com/implicit>
The dataset’s default graph includes only inferred statements from the database’s default graph.
FROM NAMED <http://www.ontotext.com/explicit>
The dataset contains a named graph http://www.ontotext.com/explicit that includes only explicit statements from the database’s default graph, i.e., quad patterns such as GRAPH ?g {?s ?p ?o} rebind explicit statements from the database’s default graph to a graph named http://www.ontotext.com/explicit.
FROM NAMED <http://www.ontotext.com/implicit>
The dataset contains a named graph http://www.ontotext.com/implicit that includes only implicit statements from the database’s default graph.

Note

These clauses do not affect the construction of the default dataset in the sense that using any combination of the above will still result in a dataset containing all named graphs from the database. All it changes is which statements appear in the dataset’s default graph and whether any extra named graphs (explicit or implicit) appear.
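As an illustration of the pseudo-graph names listed above, the following Java sketch assembles query strings that use them. The class PseudoGraphQuerySketch and its method names are hypothetical; only the two graph URIs come from this section. The results of the generated queries depend on the repository and its ruleset, so no output is shown.

```java
// Hypothetical helper that builds SPARQL strings using the GraphDB pseudo-graph names.
class PseudoGraphQuerySketch {
    static final String EXPLICIT = "http://www.ontotext.com/explicit";
    static final String IMPLICIT = "http://www.ontotext.com/implicit";

    /** Restricts the dataset's default graph to the statements selected by the pseudo-graph. */
    static String selectFrom(String pseudoGraph) {
        return "SELECT * FROM <" + pseudoGraph + "> WHERE { ?s ?p ?o }";
    }

    /**
     * Adds both pseudo-graphs as named graphs, so that ?g shows whether a statement
     * from the database's default graph is explicit or implicit.
     */
    static String tagStatements() {
        return "SELECT ?g ?s ?p ?o"
                + " FROM NAMED <" + EXPLICIT + ">"
                + " FROM NAMED <" + IMPLICIT + ">"
                + " WHERE { GRAPH ?g { ?s ?p ?o } }";
    }

    public static void main(String[] args) {
        System.out.println(selectFrom(EXPLICIT));
        System.out.println(selectFrom(IMPLICIT));
        System.out.println(tagStatements());
    }
}
```

Because these clauses do not remove the database’s named graphs from the dataset, the last query can also bind ?g to ordinary graph names.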

How to specify the dataset programmatically

The RDF4J API provides the Dataset interface and its implementation class DatasetImpl for defining the dataset of a query: the URIs of graphs are added to its default graphs and named graphs members. null can be used to identify the database’s default graph (the null context, to use RDF4J terminology).

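// Dataset made of the database's default graph plus one named graph
DatasetImpl dataset = new DatasetImpl();
// null identifies the database's default graph (the null context)
dataset.addDefaultGraph(null);
dataset.addNamedGraph(valueFactory.createURI("http://example.com/g1"));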

This dataset can then be passed to queries or updates, e.g.:

TupleQuery query = connection.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
query.setDataset(dataset);

How to access internal identifiers for entities

Internally, GraphDB uses integer identifiers (IDs) to index all entities (URIs, blank nodes and literals). Statement indices are made up of these IDs and a large data structure is used to map from ID to entity value and back. There are occasions (e.g., when interfacing to an application infrastructure) when having access to these internal IDs can improve the efficiency of data structures external to GraphDB by allowing them to be indexed by an integer value rather than a full URI.

Here, we introduce a special GraphDB predicate and function that provide access to the internal IDs. The datatype of the internal IDs is <http://www.w3.org/2001/XMLSchema#long>.

Predicate    <http://www.ontotext.com/owlim/entity#id>
Description  A map between an entity and an internal ID
Example      Select all entities and their IDs:

PREFIX ent: <http://www.ontotext.com/owlim/entity#>
SELECT * WHERE {
  ?s ent:id ?id
} ORDER BY ?id

Function     <http://www.ontotext.com/owlim/entity#id>
Description  Return an entity’s internal ID
Example      Select all statements and order them by the internal ID of the object values:

PREFIX ent: <http://www.ontotext.com/owlim/entity#>
SELECT * WHERE {
  ?s ?p ?o .
} ORDER BY ent:id(?o)

Examples

  • Enumerate all entities and bind the nodes to ?s and their IDs to ?id, order by ?id:

    SELECT * WHERE {
      ?s <http://www.ontotext.com/owlim/entity#id> ?id
    } ORDER BY ?id
    
  • Enumerate all non-literals and bind the nodes to ?s and their IDs to ?id, order by ?id:

    SELECT * WHERE {
      ?s <http://www.ontotext.com/owlim/entity#id> ?id .
      FILTER (!isLiteral(?s)) .
    } ORDER BY ?id
    
  • Find the internal IDs of subjects of statements with specific predicate and object values:

    SELECT * WHERE {
      ?s <http://test.org#Pred1> "A literal".
      ?s <http://www.ontotext.com/owlim/entity#id> ?id .
    } ORDER BY ?id
    
  • Find all statements where the object has the given internal ID by using an explicit, untyped value as the ID (the "115" is used as object in the second statement pattern):

    SELECT * WHERE {
      ?s ?p ?o.
      ?o <http://www.ontotext.com/owlim/entity#id> "115" .
    }
    
  • As above, but using an xsd:long datatype for the constant within a FILTER condition:

    SELECT * WHERE {
      ?s ?p ?o.
      ?o <http://www.ontotext.com/owlim/entity#id> ?id .
      FILTER (?id="115"^^<http://www.w3.org/2001/XMLSchema#long>) .
    } ORDER BY ?o
    
  • Find the internal IDs of subject and object entities for all statements:

    SELECT * WHERE {
      ?s ?p ?o.
      ?s <http://www.ontotext.com/owlim/entity#id> ?ids.
      ?o <http://www.ontotext.com/owlim/entity#id> ?ido.
    }
    
  • Retrieve all statements where the ID of the subject is equal to "115"^^xsd:long, by providing an internal ID value within a filter expression:

    SELECT * WHERE {
      ?s ?p ?o.
      FILTER ((<http://www.ontotext.com/owlim/entity#id>(?s))
                    = "115"^^<http://www.w3.org/2001/XMLSchema#long>).
    }
    
  • Retrieve all statements where the string-ised ID of the subject is equal to "115", by providing an internal ID value within a filter expression:

    SELECT * WHERE {
      ?s ?p ?o.
      FILTER (str( <http://www.ontotext.com/owlim/entity#id>(?s) ) = "115").
    }
    

How to use RDF4J ‘direct hierarchy’ vocabulary

GraphDB supports the RDF4J-specific vocabulary for determining ‘direct’ subclass, subproperty and type relationships. The three predicates and their definitions are shown below. All of them are defined in the following namespace:

PREFIX sesame: <http://www.openrdf.org/schema/sesame#>

Predicate    A sesame:directSubClassOf B
Definition   Class A is a direct subclass of B if:

  1. A is a subclass of B; and
  2. A and B are not equal; and
  3. there is no class C (not equal to A or B) such that A is a subclass of C and C is a subclass of B.

Predicate    P sesame:directSubPropertyOf Q
Definition   Property P is a direct subproperty of Q if:

  1. P is a subproperty of Q; and
  2. P and Q are not equal; and
  3. there is no property R (not equal to P or Q) such that P is a subproperty of R and R is a subproperty of Q.

Predicate    I sesame:directType T
Definition   Resource I has direct type T if:

  1. I is of type T; and
  2. there is no class U (not equal to T) such that:
    1. U is a subclass of T; and
    2. I is of type U.
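
The definition of sesame:directSubClassOf can be read as a filter over the full (transitive) subclass relation. The Java sketch below applies the three conditions literally to an in-memory map. The class DirectSubClassSketch and the "A -> B" string output are hypothetical and are not how GraphDB or RDF4J evaluates these predicates; the sketch also assumes the input already contains the transitive closure.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the directSubClassOf definition; not GraphDB or RDF4J code.
class DirectSubClassSketch {
    /**
     * closure maps each class to all of its superclasses (transitive, possibly reflexive).
     * Returns the pairs "A -> B" for which A is a direct subclass of B.
     */
    static Set<String> directSubClassOf(Map<String, Set<String>> closure) {
        Set<String> direct = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : closure.entrySet()) {
            String a = entry.getKey();
            Set<String> superClassesOfA = entry.getValue();    // condition 1: A is a subclass of B
            for (String b : superClassesOfA) {
                if (a.equals(b)) {
                    continue;                                  // condition 2: A and B are not equal
                }
                boolean hasIntermediate = false;
                for (String c : superClassesOfA) {             // every c here satisfies "A subclass of C"
                    if (c.equals(a) || c.equals(b)) {
                        continue;
                    }
                    if (closure.getOrDefault(c, Set.of()).contains(b)) {
                        hasIntermediate = true;                // condition 3 fails: C lies between A and B
                        break;
                    }
                }
                if (!hasIntermediate) {
                    direct.add(a + " -> " + b);
                }
            }
        }
        return direct;
    }

    public static void main(String[] args) {
        // Chain A subclass of B subclass of C, given as its reflexive-transitive closure.
        Map<String, Set<String>> closure = Map.of(
                "A", Set.of("A", "B", "C"),
                "B", Set.of("B", "C"),
                "C", Set.of("C"));
        System.out.println(directSubClassOf(closure));
    }
}
```

For the chain in main, the pair A and C is excluded because B lies between them. The same filter, applied to the subproperty relation, gives sesame:directSubPropertyOf.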

Other special GraphDB query behaviour

There are several more special graph URIs in GraphDB, which are used for controlling query evaluation.

FROM / FROM NAMED <http://www.ontotext.com/disable-sameAs>
Switches off the enumeration of equivalence classes produced by the Optimization of owl:sameAs. By default, all owl:sameAs URIs are returned by triple pattern matching. This clause reduces the results to a single representative from each owl:sameAs class. For more details, see Not enumerating sameAs.
FROM / FROM NAMED <http://www.ontotext.com/count>
Makes the query return a single result in which every variable binding in the projection is replaced with a plain literal holding the total number of solutions of the query. In a CONSTRUCT query whose projection contains three variables (?subject, ?predicate, ?object), the subject and the predicate are bound to <http://www.ontotext.com/> and the object holds the literal value, because a statement cannot have a literal as its subject or predicate. This clause is deprecated in favor of the COUNT aggregate of SPARQL 1.1.
FROM / FROM NAMED <http://www.ontotext.com/skip-redundant-implicit>
Excludes an implicit statement when an explicit copy of it exists within a specific context (including the default one). It was initially implemented to filter out redundant rows that appear as ‘duplicate’ results when the context part is not taken into account.
FROM <http://www.ontotext.com/distinct>
Using this special graph name in DESCRIBE and CONSTRUCT queries will cause only distinct triples to be returned. This is useful when several resources are being described, where the same triple can be returned more than once, i.e., when describing its subject and its object. This clause is deprecated in favor of using the DISTINCT clause of SPARQL 1.1.