GraphDB Free 8.2
Table of contents
- General
- Quick start guide
- Installation
- Administration
- Usage
- Loading data
- Exploring data
- Querying Data
- Exporting data
- Using the Workbench REST API
- Using GraphDB with the RDF4J API
- Additional indexing
- GraphDB connectors
- GraphDB dev guide
- Experimental features
- References
- Release notes
- FAQ
- Support
Query behaviour
What are named graphs
Hint
GraphDB supports the following SPARQL specifications:
An RDF database can store collections of RDF statements (triples) in separate graphs identified (named) by a URI. A group of statements with a unique name is called a ‘named graph’. In addition, an RDF database has one graph without a name, called the ‘default graph’.
The SPARQL query syntax provides a means to execute queries across default and named graphs using FROM and FROM NAMED clauses. These clauses are used to build an RDF dataset, which identifies what statements the SPARQL query processor will use to answer a query. The dataset contains a default graph and named graphs and is constructed as follows:
FROM <uri>
- brings the statements from the database graph identified by the URI into the dataset’s default graph, i.e., the statements ‘lose’ their graph name.
FROM NAMED <uri>
- brings the statements from the database graph identified by the URI into the dataset, i.e., the statements keep their graph name.
If either FROM or FROM NAMED is used, the database’s default graph is no longer used as input for processing this query. In effect, the combination of FROM and FROM NAMED clauses exactly defines the dataset. This is somewhat bothersome, as it precludes the possibility, for instance, of executing a query over just one named graph and the default graph. However, there is a programmatic way to get around this limitation, as described below.
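The dataset construction rules above can be illustrated with a small in-memory model. This is a hypothetical sketch, not GraphDB or RDF4J code: DatasetBuilderSketch and its nested Dataset class are invented names, quads are plain String arrays {s, p, o, g}, and a null graph stands for the database's default graph. Listing null among the FROM graphs mimics the "one named graph plus the default graph" dataset that SPARQL syntax cannot express.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of how FROM / FROM NAMED clauses build a SPARQL dataset. */
final class DatasetBuilderSketch {

    /** A dataset: one unnamed default graph plus any number of named graphs. */
    static final class Dataset {
        final Set<List<String>> defaultGraph = new LinkedHashSet<>();
        final Map<String, Set<List<String>>> namedGraphs = new LinkedHashMap<>();
    }

    /**
     * FROM graphs are merged into the dataset's default graph (graph names are dropped);
     * FROM NAMED graphs are copied in under their own names. The database's default graph
     * (null here) is used only if it is listed explicitly among the FROM graphs.
     */
    static Dataset build(List<String[]> quads, Set<String> fromGraphs, Set<String> fromNamedGraphs) {
        // Copy into HashSets so that contains(null) is allowed (immutable Set.of rejects it).
        Set<String> from = new HashSet<>(fromGraphs);
        Set<String> fromNamed = new HashSet<>(fromNamedGraphs);
        Dataset ds = new Dataset();
        for (String[] q : quads) {
            String graph = q[3];
            List<String> triple = List.of(q[0], q[1], q[2]);
            if (from.contains(graph)) {
                ds.defaultGraph.add(triple);                      // statement 'loses' its graph name
            }
            if (graph != null && fromNamed.contains(graph)) {
                ds.namedGraphs.computeIfAbsent(graph, g -> new LinkedHashSet<>()).add(triple);
            }
        }
        return ds;
    }
}
```

With only ex:g1 in the FROM set, a triple stored in the database's default graph is invisible; adding null to that set brings it back, which is the effect the programmatic dataset achieves.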
The default SPARQL dataset
Note
The SPARQL specification does not define what happens when no FROM or FROM NAMED clauses are present in a query, i.e., it does not define how a SPARQL processor should behave when no dataset is defined. In this situation, implementations are free to construct the default dataset as necessary.
GraphDB constructs the default dataset as follows:
- The dataset’s default graph contains the merge of the database’s default graph AND all the database named graphs;
- The dataset contains all named graphs from the database.
This means that if a statement ex:x ex:y ex:z exists in the database in the graph ex:g, then the following queries return these bindings:
Query | Bindings |
---|---|
SELECT * { ?s ?p ?o } | ?s=ex:x ?p=ex:y ?o=ex:z |
SELECT * { GRAPH ?g { ?s ?p ?o } } | ?s=ex:x ?p=ex:y ?o=ex:z ?g=ex:g |
In other words, the triple ex:x ex:y ex:z will appear to be in both the default graph and the named graph ex:g.
There are two reasons for this behaviour:
- It provides an easy way to execute a triple pattern query over all stored RDF statements.
- It allows all named graph names to be discovered, e.g., with this query: SELECT ?g { GRAPH ?g { ?s ?p ?o } }
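GraphDB's default dataset can be modelled in the same spirit. In this hypothetical sketch (DefaultDatasetSketch is an invented name; quads are String arrays {s, p, o, g} with null for the database's default graph), a plain triple pattern sees every stored statement with its graph name dropped, while GRAPH ?g sees every named graph, so the triple from the table above is returned by both queries.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of GraphDB's default dataset (no FROM / FROM NAMED clauses). */
final class DefaultDatasetSketch {

    /** { ?s ?p ?o }: the default graph is the merge of the database default graph and all named graphs. */
    static Set<List<String>> matchTriplePattern(List<String[]> quads) {
        Set<List<String>> triples = new LinkedHashSet<>();   // a merge, so each triple appears once
        for (String[] q : quads) {
            triples.add(List.of(q[0], q[1], q[2]));          // graph name is dropped
        }
        return triples;
    }

    /** GRAPH ?g { ?s ?p ?o }: every named graph of the database is part of the dataset. */
    static List<List<String>> matchGraphPattern(List<String[]> quads) {
        List<List<String>> rows = new ArrayList<>();
        for (String[] q : quads) {
            if (q[3] != null) {                              // default-graph quads have no name to bind
                rows.add(List.of(q[0], q[1], q[2], q[3]));
            }
        }
        return rows;
    }

    /** Distinct ?g values, i.e., the graph-discovery query with duplicates removed. */
    static Set<String> graphNames(List<String[]> quads) {
        Set<String> names = new TreeSet<>();
        for (List<String> row : matchGraphPattern(quads)) {
            names.add(row.get(3));
        }
        return names;
    }
}
```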
How to manage explicit and implicit statements
GraphDB maintains two flags for each statement:
- Explicit: the statement is inserted in the database by the user, using SPARQL UPDATE, the RDF4J API or the imports configuration parameter. The same explicit statement can exist in the database’s default graph and in each named graph.
- Implicit: the statement is created as a result of inference, by either Axioms or Rules. Inferred statements are ALWAYS created in the database’s default graph.
These two flags are not mutually exclusive. The following sequences of operations are possible, written in this notation:
- For the operations, we use the names ‘insert/delete’ for explicit and ‘infer/retract’ for implicit statements (retract means that all premises of the statement are deleted or retracted).
- To show the results after each operation, we use tuples <statement graph flags>: <s G EI> means statement s in graph G having both flags, Explicit and Implicit; <s _ EI> means statement s in the default graph having both flags, Explicit and Implicit; <_ G _> means the statement is deleted from graph G.
First, let’s consider operations on statement s in the default graph only:
- insert <s _ E>, infer <s _ EI>, delete <s _ I>, retract <_ _ _>;
- insert <s _ E>, infer <s _ EI>, retract <s _ E>, delete <_ _ _>;
- infer <s _ I>, insert <s _ EI>, delete <s _ I>, retract <_ _ _>;
- infer <s _ I>, insert <s _ EI>, retract <s _ E>, delete <_ _ _>;
- insert <s _ E>, insert <s _ E>, delete <_ _ _>;
- infer <s _ I>, infer <s _ I>, retract <_ _ _> (if the two inferences are from the same premises).
This does not show all possible sequences, but it shows the principles:
- No duplicate statement can exist in the default graph;
- Delete/retract clears the appropriate flag;
- The statement is deleted only after both flags are cleared;
- Deleting an inferred statement has no effect (except to clear the E flag, if any);
- Retracting an inserted statement has no effect (except to clear the I flag, if any);
- Inserting the same statement twice has no effect: insert is idempotent;
- Inferring the same statement twice has no effect: infer is idempotent, and I is a flag, not a counter, but the retraction algorithm ensures I is cleared only after all premises of s are retracted.
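The flag principles for a single statement in the default graph amount to a tiny state machine. The following is a hypothetical sketch (StatementFlagsSketch is not a GraphDB class) that keeps the two flags in an EnumSet and prints the state in the tuple notation used above. It assumes retract() is only called once all premises of s have been retracted, since premise tracking is outside its scope.

```java
import java.util.EnumSet;

/** Hypothetical model of the E/I flags kept for one statement s in the default graph. */
final class StatementFlagsSketch {
    enum Flag { E, I }

    private final EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);

    void insert()  { flags.add(Flag.E); }      // idempotent: E is a flag, not a counter
    void delete()  { flags.remove(Flag.E); }   // no effect on a purely inferred statement
    void infer()   { flags.add(Flag.I); }      // idempotent
    void retract() { flags.remove(Flag.I); }   // assumes all premises are gone; no effect on a purely inserted statement

    /** The statement exists until both flags are cleared. */
    boolean exists() { return !flags.isEmpty(); }

    /** Current state in the tuple notation, e.g. "<s _ EI>". */
    String state() {
        if (!exists()) return "<_ _ _>";
        StringBuilder sb = new StringBuilder("<s _ ");
        for (Flag f : flags) sb.append(f.name());   // EnumSet iterates E before I
        return sb.append('>').toString();
    }
}
```

Calling insert(), infer(), delete() and retract() in that order yields <s _ E>, <s _ EI>, <s _ I> and <_ _ _>, matching the first sequence listed above.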
Now, let’s consider operations on statement s in the named graph G, and inferred statement s in the default graph:
- insert <s G E>, infer <s _ I> <s G E>, delete <s _ I>, retract <_ _ _>;
- insert <s G E>, infer <s _ I> <s G E>, retract <s G E>, delete <_ _ _>;
- infer <s _ I>, insert <s G E> <s _ I>, delete <s _ I>, retract <_ _ _>;
- infer <s _ I>, insert <s G E> <s _ I>, retract <s G E>, delete <_ _ _>;
- insert <s G E>, insert <s G E>, delete <_ _ _>;
- infer <s _ I>, infer <s _ I>, retract <_ _ _> (if the two inferences are from the same premises).
The additional principles here are:
- The same statement can exist in several graphs, e.g., as explicit in graph G and implicit in the default graph;
- Delete/retract works on the appropriate graph.
Note
In order to avoid a proliferation of duplicate statements, it is recommended not to insert inferable statements in named graphs.
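The named-graph case can be modelled by keeping a separate flag set per graph. In this hypothetical sketch (GraphFlagsSketch and its methods are invented for illustration), infer() and retract() always act on the default graph, while insert() and delete() take a graph name. The default graph is written "_", as in the notation above, and is always printed first, so mixed states read "<s _ I> <s G E>" regardless of the order of operations.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeSet;

/** Hypothetical model: one statement s tracked per graph; inference always targets the default graph. */
final class GraphFlagsSketch {
    enum Flag { E, I }

    /** Stands for the database's default graph, as in the tuple notation. */
    static final String DEFAULT = "_";

    private final Map<String, EnumSet<Flag>> byGraph = new HashMap<>();

    void insert(String graph) { flagsIn(graph).add(Flag.E); }
    void delete(String graph) { clear(graph, Flag.E); }
    void infer()   { flagsIn(DEFAULT).add(Flag.I); }   // inferred statements always go to the default graph
    void retract() { clear(DEFAULT, Flag.I); }          // assumes all premises were retracted

    private EnumSet<Flag> flagsIn(String graph) {
        return byGraph.computeIfAbsent(graph, g -> EnumSet.noneOf(Flag.class));
    }

    private void clear(String graph, Flag flag) {
        EnumSet<Flag> flags = byGraph.get(graph);
        if (flags == null) return;                     // nothing stored in that graph
        flags.remove(flag);
        if (flags.isEmpty()) byGraph.remove(graph);    // removed once both flags are cleared
    }

    /** State in tuple notation: default graph first, then named graphs in sorted order. */
    String state() {
        if (byGraph.isEmpty()) return "<_ _ _>";
        List<String> order = new ArrayList<>();
        if (byGraph.containsKey(DEFAULT)) order.add(DEFAULT);
        for (String g : new TreeSet<>(byGraph.keySet())) {
            if (!g.equals(DEFAULT)) order.add(g);
        }
        StringJoiner out = new StringJoiner(" ");
        for (String g : order) {
            StringBuilder flags = new StringBuilder();
            for (Flag f : byGraph.get(g)) flags.append(f.name());
            out.add("<s " + g + " " + flags + ">");
        }
        return out.toString();
    }
}
```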
How to query explicit and implicit statements
The database’s default graph can contain a mixture of explicit and implicit statements. The RDF4J API provides a flag called ‘includeInferred’, which is passed to several API methods. When it is set to false, only explicit statements are iterated or returned; when it is set to true, both explicit and implicit statements are iterated or returned.
GraphDB provides extensions for more control over the processing of explicit and implicit statements. These extensions allow the selection of explicit, implicit or both for query answering and also provide a mechanism for identifying which statements are explicit and which are implicit. This is achieved by using special ‘pseudo-graph’ names in FROM and FROM NAMED clauses, which cause certain flags to be set.
The details are as follows:
FROM <http://www.ontotext.com/explicit>
- The dataset’s default graph includes only explicit statements from the database’s default graph.
FROM <http://www.ontotext.com/implicit>
- The dataset’s default graph includes only inferred statements from the database’s default graph.
FROM NAMED <http://www.ontotext.com/explicit>
- The dataset contains a named graph http://www.ontotext.com/explicit that includes only explicit statements from the database’s default graph, i.e., quad patterns such as GRAPH ?g {?s ?p ?o} rebind explicit statements from the database’s default graph to a graph named http://www.ontotext.com/explicit.
FROM NAMED <http://www.ontotext.com/implicit>
- The dataset contains a named graph http://www.ontotext.com/implicit that includes only implicit statements from the database’s default graph.
Note
These clauses do not affect the construction of the default dataset in the sense that using any combination of the above will still result in a dataset containing all named graphs from the database. All it changes is which statements appear in the dataset’s default graph and whether any extra named graphs (explicit or implicit) appear.
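The effect of the explicit and implicit pseudo-graph names on the database's default graph can be sketched as a filter on the two flags. In this hypothetical model (PseudoGraphSketch and Stmt are invented names), only default-graph statements are considered and any other FROM graphs are ignored. Combining the explicit and implicit pseudo-graphs in FROM is assumed to yield their union, and giving neither leaves both kinds visible, as with includeInferred set to true.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the explicit/implicit pseudo-graphs over the database's default graph. */
final class PseudoGraphSketch {
    static final String EXPLICIT = "http://www.ontotext.com/explicit";
    static final String IMPLICIT = "http://www.ontotext.com/implicit";

    /** A statement in the database's default graph together with its E and I flags. */
    static final class Stmt {
        final String s, p, o;
        final boolean explicit, implicit;
        Stmt(String s, String p, String o, boolean explicit, boolean implicit) {
            this.s = s; this.p = p; this.o = o;
            this.explicit = explicit; this.implicit = implicit;
        }
    }

    /** Triples in the dataset's default graph for the given FROM pseudo-graph names. */
    static Set<List<String>> defaultGraph(List<Stmt> db, Set<String> from) {
        boolean wantExplicit = from.contains(EXPLICIT);
        boolean wantImplicit = from.contains(IMPLICIT);
        boolean noFilter = !wantExplicit && !wantImplicit;   // assumption: like includeInferred=true
        Set<List<String>> out = new LinkedHashSet<>();
        for (Stmt st : db) {
            if (noFilter || (wantExplicit && st.explicit) || (wantImplicit && st.implicit)) {
                out.add(List.of(st.s, st.p, st.o));
            }
        }
        return out;
    }

    /** Rows (s, p, o, g) matched by GRAPH ?g { ?s ?p ?o } under FROM NAMED <pseudoGraph>. */
    static List<List<String>> namedGraph(List<Stmt> db, String pseudoGraph) {
        List<List<String>> rows = new ArrayList<>();
        for (Stmt st : db) {
            boolean match = EXPLICIT.equals(pseudoGraph) ? st.explicit
                          : IMPLICIT.equals(pseudoGraph) && st.implicit;
            if (match) {
                rows.add(List.of(st.s, st.p, st.o, pseudoGraph));   // rebound to the pseudo-graph name
            }
        }
        return rows;
    }
}
```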
How to specify the dataset programmatically
The RDF4J API provides an interface Dataset and an implementation class DatasetImpl for defining the dataset for a query by providing the URIs of named graphs and adding them to the default graphs and named graphs members. This permits null to be used to identify the default database graph (the null context, in RDF4J terminology).
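// Assumes valueFactory is an org.eclipse.rdf4j.model.ValueFactory, e.g., obtained from repository.getValueFactory()
DatasetImpl dataset = new DatasetImpl();   // org.eclipse.rdf4j.query.impl.DatasetImpl
dataset.addDefaultGraph(null);             // null = the database's default graph (null context)
dataset.addNamedGraph(valueFactory.createIRI("http://example.com/g1"));
This dataset can then be passed to queries or updates, e.g.:
TupleQuery query = connection.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
query.setDataset(dataset);                 // overrides any dataset specified in queryString itself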
How to access internal identifiers for entities
Internally, GraphDB uses integer identifiers (IDs) to index all entities (URIs, blank nodes and literals). Statement indices are made up of these IDs and a large data structure is used to map from ID to entity value and back. There are occasions (e.g., when interfacing to an application infrastructure) when having access to these internal IDs can improve the efficiency of data structures external to GraphDB by allowing them to be indexed by an integer value rather than a full URI.
Here, we introduce a special GraphDB predicate and function that provide access to the internal IDs. The datatype of the internal IDs is <http://www.w3.org/2001/XMLSchema#long>.
Predicate | <http://www.ontotext.com/owlim/entity#id> |
---|---|
Description | A map between an entity and an internal ID |
Example | Select all entities and their IDs: PREFIX ent: <http://www.ontotext.com/owlim/entity#> SELECT * WHERE { ?s ent:id ?id } ORDER BY ?id |
Function | <http://www.ontotext.com/owlim/entity#id> |
---|---|
Description | Return an entity’s internal ID |
Example | Select all statements and order them by the internal ID of the object values: PREFIX ent: <http://www.ontotext.com/owlim/entity#> SELECT * WHERE { ?s ?p ?o . } ORDER BY ent:id(?o) |
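To show why integer IDs are convenient for data structures outside GraphDB, here is a hypothetical sketch of a two-way entity/ID dictionary. EntityPoolSketch is an invented name, and GraphDB's real entity pool is internal and not described in this document; the sketch only illustrates the kind of mapping that ent:id exposes, with IDs assigned sequentially from 1 as an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical two-way map between entities (in string form) and long IDs. */
final class EntityPoolSketch {
    private final Map<String, Long> idByEntity = new HashMap<>();
    private final List<String> entityById = new ArrayList<>();   // position i holds the entity with ID i + 1

    /** Entity to ID; in this sketch only, unseen entities are assigned the next free ID. */
    long idOf(String entity) {
        Long existing = idByEntity.get(entity);
        if (existing != null) return existing;
        entityById.add(entity);
        long id = entityById.size();
        idByEntity.put(entity, id);
        return id;
    }

    /** ID to entity, the direction used when the ID is known and the entity is wanted. */
    Optional<String> entityOf(long id) {
        if (id < 1 || id > entityById.size()) return Optional.empty();
        return Optional.of(entityById.get((int) (id - 1)));
    }
}
```

An external index can then be keyed by these long values instead of full URI strings, which is the efficiency gain described above.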
Examples
- Enumerate all entities and bind the nodes to ?s and their IDs to ?id, ordered by ?id:
SELECT * WHERE { ?s <http://www.ontotext.com/owlim/entity#id> ?id } ORDER BY ?id
- Enumerate all non-literals and bind the nodes to ?s and their IDs to ?id, ordered by ?id:
SELECT * WHERE { ?s <http://www.ontotext.com/owlim/entity#id> ?id . FILTER (!isLiteral(?s)) . } ORDER BY ?id
- Find the internal IDs of subjects of statements with specific predicate and object values:
SELECT * WHERE { ?s <http://test.org#Pred1> "A literal". ?s <http://www.ontotext.com/owlim/entity#id> ?id . } ORDER BY ?id
- Find all statements where the object has the given internal ID by using an explicit, untyped value as the ID ("115" is used as the object in the second statement pattern):
SELECT * WHERE { ?s ?p ?o. ?o <http://www.ontotext.com/owlim/entity#id> "115" . }
- As above, but using an xsd:long datatype for the constant within a FILTER condition:
SELECT * WHERE { ?s ?p ?o. ?o <http://www.ontotext.com/owlim/entity#id> ?id . FILTER (?id="115"^^<http://www.w3.org/2001/XMLSchema#long>) . } ORDER BY ?o
- Find the internal IDs of subject and object entities for all statements:
SELECT * WHERE { ?s ?p ?o. ?s <http://www.ontotext.com/owlim/entity#id> ?ids. ?o <http://www.ontotext.com/owlim/entity#id> ?ido. }
- Retrieve all statements where the ID of the subject is equal to "115"^^xsd:long, by providing an internal ID value within a filter expression:
SELECT * WHERE { ?s ?p ?o. FILTER ((<http://www.ontotext.com/owlim/entity#id>(?s)) = "115"^^<http://www.w3.org/2001/XMLSchema#long>). }
- Retrieve all statements where the string value of the subject's ID is equal to "115", by providing an internal ID value within a filter expression:
SELECT * WHERE { ?s ?p ?o. FILTER (str( <http://www.ontotext.com/owlim/entity#id>(?s) ) = "115"). }
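Applications that already hold an internal ID usually need to plug it into one of the queries above. The hypothetical helpers below (EntityIdQueries and its methods are invented names) only build the query strings, writing the ID as an xsd:long typed literal as in the FILTER examples; executing them, e.g. through an RDF4J connection, is left out.

```java
/** Hypothetical string builders for the ID-based queries; they do not execute anything. */
final class EntityIdQueries {
    static final String ENT_ID = "<http://www.ontotext.com/owlim/entity#id>";
    static final String XSD_LONG = "<http://www.w3.org/2001/XMLSchema#long>";

    /** An internal ID as a typed literal, e.g. "115"^^<http://www.w3.org/2001/XMLSchema#long>. */
    static String idLiteral(long id) {
        return "\"" + id + "\"^^" + XSD_LONG;
    }

    /** Statements whose object has the given ID (predicate form, typed constant in a FILTER). */
    static String statementsWithObjectId(long id) {
        return "SELECT * WHERE { ?s ?p ?o. ?o " + ENT_ID + " ?id . FILTER (?id="
                + idLiteral(id) + ") . } ORDER BY ?o";
    }

    /** Statements whose subject has the given ID (function form inside a FILTER). */
    static String statementsWithSubjectId(long id) {
        return "SELECT * WHERE { ?s ?p ?o. FILTER (" + ENT_ID + "(?s) = " + idLiteral(id) + "). }";
    }
}
```

For ID 115, statementsWithObjectId produces exactly the xsd:long FILTER query listed above.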
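How to use RDF4J ‘direct hierarchy’ vocabulary
GraphDB supports the RDF4J-specific vocabulary for determining ‘direct’ subclass, subproperty and type relationships. The special vocabulary and its definitions are shown below. All three predicates are defined in the namespace:
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
Predicate | Definition |
---|---|
A sesame:directSubClassOf B | Class A is a direct subclass of B if: (1) A is a subclass of B; (2) A and B are not equal; and (3) there is no class C (not equal to A or B) such that A is a subclass of C and C is a subclass of B. |
P sesame:directSubPropertyOf Q | Property P is a direct subproperty of Q if: (1) P is a subproperty of Q; (2) P and Q are not equal; and (3) there is no property R (not equal to P or Q) such that P is a subproperty of R and R is a subproperty of Q. |
I sesame:directType T | Class T is a direct type of resource I if: (1) I is of type T; and (2) there is no class U (not equal to T) such that U is a subclass of T and I is of type U. |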
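The direct-subclass and direct-type conditions can be checked with plain set operations. Here, A is taken to be a direct subclass of B when A is a subclass of B, A differs from B, and no third class C lies between them; T is a direct type of I when I has type T and no other type U of I is a subclass of T. DirectHierarchySketch is an invented name: the hierarchy is a map from each class to its superclasses, closure() stands in for the inferred rdfs:subClassOf statements a reasoner would provide, and the types passed to isDirectType() are assumed to include inferred ones. A direct-subproperty check would follow the same pattern as isDirectSubClassOf().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory checks for 'direct' subclass and type relationships. */
final class DirectHierarchySketch {

    /** Transitive closure of asserted "class -> superclasses" edges (cycle-safe). */
    static Map<String, Set<String>> closure(Map<String, Set<String>> asserted) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String start : asserted.keySet()) {
            Set<String> seen = new LinkedHashSet<>();
            Deque<String> todo = new ArrayDeque<>(asserted.get(start));
            while (!todo.isEmpty()) {
                String c = todo.pop();
                if (seen.add(c)) todo.addAll(asserted.getOrDefault(c, Set.of()));
            }
            result.put(start, seen);
        }
        return result;
    }

    /** A is a subclass of B, A differs from B, and no other class C sits between them. */
    static boolean isDirectSubClassOf(Map<String, Set<String>> sub, String a, String b) {
        Set<String> supersOfA = sub.getOrDefault(a, Set.of());
        if (a.equals(b) || !supersOfA.contains(b)) return false;
        for (String c : supersOfA) {
            if (!c.equals(a) && !c.equals(b) && sub.getOrDefault(c, Set.of()).contains(b)) return false;
        }
        return true;
    }

    /** I has type T, and no other type U of I is a subclass of T. */
    static boolean isDirectType(Map<String, Set<String>> sub, Set<String> typesOfI, String t) {
        if (!typesOfI.contains(t)) return false;
        for (String u : typesOfI) {
            if (!u.equals(t) && sub.getOrDefault(u, Set.of()).contains(t)) return false;
        }
        return true;
    }
}
```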
Other special GraphDB query behaviour
There are several more special graph URIs in GraphDB, which are used for controlling query evaluation.
FROM / FROM NAMED <http://www.ontotext.com/disable-sameAs>
- Switches off the enumeration of equivalence classes produced by the Optimisation of owl:sameAs. By default, all owl:sameAs URIs are returned by triple pattern matching. This clause reduces the number of results to include a single representative from each owl:sameAs class. For more details, see Not enumerating sameAs.
FROM / FROM NAMED <http://www.ontotext.com/count>
- Triggers the evaluation of the query so that it gives a single result in which all variable bindings in the projection are replaced with a plain literal holding the total number of solutions of the query. In the case of a CONSTRUCT query in which the projection contains three variables (?subject, ?predicate, ?object), the subject and the predicate are bound to <http://www.ontotext.com/> and the object holds the literal value. This is because a statement cannot have a literal in the subject or predicate position. This clause is deprecated in favor of the COUNT aggregate of SPARQL 1.1.
FROM / FROM NAMED <http://www.ontotext.com/skip-redundant-implicit>
- Triggers the exclusion of implicit statements when an explicit one exists within a specific context (even the default one). It was initially implemented to allow filtering of redundant rows where the context part is not taken into account, which leads to ‘duplicate’ results.
FROM <http://www.ontotext.com/distinct>
- Using this special graph name in DESCRIBE and CONSTRUCT queries causes only distinct triples to be returned. This is useful when several resources are being described and the same triple can be returned more than once, e.g., when describing both its subject and its object. This clause is deprecated in favor of the DISTINCT clause of SPARQL 1.1.
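The duplicate problem that FROM <http://www.ontotext.com/distinct> addresses can be reproduced in a few lines. In this hypothetical sketch (DescribeSketch is an invented name), a resource's description is assumed to be every triple in which it appears as subject or object, a simplification of what GraphDB's DESCRIBE actually returns; a triple linking two described resources then shows up twice unless duplicates are removed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical model of describing several resources with and without de-duplication. */
final class DescribeSketch {
    static List<List<String>> describe(List<List<String>> triples, List<String> resources, boolean distinct) {
        // A set drops repeated triples (the effect of the distinct graph or DISTINCT); a list keeps them.
        Collection<List<String>> out = distinct
                ? new LinkedHashSet<List<String>>()
                : new ArrayList<List<String>>();
        for (String r : resources) {
            for (List<String> t : triples) {
                if (t.get(0).equals(r) || t.get(2).equals(r)) {   // r as subject or object
                    out.add(t);
                }
            }
        }
        return new ArrayList<>(out);
    }
}
```

Describing ex:a and ex:b, where ex:a ex:knows ex:b, yields three triples without de-duplication and two with it.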