GraphDB Free 8.2
Explain Plan
What is GraphDB’s Explain Plan
GraphDB’s Explain Plan is a feature that explains how GraphDB executes a SPARQL query. It also includes statistics such as collection sizes, predicate collection sizes, and the numbers of unique subjects and objects. This information can help you improve the query and achieve better execution performance.
Activating the explain plan
To see the query explain plan, use the onto:explain pseudo-graph:
PREFIX onto: <http://www.ontotext.com/>
select * from onto:explain
...
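If you build queries programmatically, turning an ordinary query into an explain query comes down to two changes: declaring the onto: prefix and adding a `FROM onto:explain` clause. The sketch below shows this in Python. It assumes a single top-level SELECT that has no existing FROM clause and no comments containing `WHERE` or `{`. `add_explain` is a hypothetical helper and is not part of GraphDB.

```python
import re

ONTO_PREFIX = "PREFIX onto: <http://www.ontotext.com/>"


def add_explain(query: str) -> str:
    """Return a copy of a SELECT query that reads from the onto:explain pseudo-graph.

    Hypothetical helper: assumes one top-level SELECT with no FROM clause yet.
    """
    # Declare the onto: prefix unless the query already does.
    if not re.search(r"PREFIX\s+onto:", query, re.IGNORECASE):
        query = ONTO_PREFIX + "\n" + query
    # FROM belongs between the SELECT clause and the WHERE clause (or its "{").
    match = re.search(r"\bWHERE\b|\{", query, re.IGNORECASE)
    if match is None:
        raise ValueError("no WHERE clause or group graph pattern found")
    pos = match.start()
    return query[:pos] + "FROM onto:explain\n" + query[pos:]


print(add_explain("SELECT * WHERE { ?s ?p ?o . }"))
```

Because the sketch inserts the clause immediately before the first `WHERE` or `{`, the SELECT projection is left unchanged.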
Simple explain plan
For the simplest possible query explain plan (?s ?p ?o), execute the following query:
PREFIX onto: <http://www.ontotext.com/>
select * from onto:explain {
?s ?p ?o .
}
Depending on the number of triples that you have in the database, the results will vary, but you will get something like the following:
SELECT ?s ?p ?o
{
{ # ----- Begin optimization group 1 -----
?s ?p ?o . # Collection size: 108.0
# Predicate collection size: 108.0
# Unique subjects: 90.0
# Unique objects: 55.0
# Current complexity: 108.0
} # ----- End optimization group 1 -----
# ESTIMATED NUMBER OF ITERATIONS: 108.0
}
The output is the same query, annotated with estimates next to each statement pattern (there is only one in this case).
Note
The query in the output might not match the original one exactly, because the triple patterns are listed in the order in which they are executed internally.
----- Begin optimization group 1 -----
- indicates the start of a group of statements, which are most probably part of a subquery (in the case of property paths, the group is the whole path);
Collection size
- an estimation of the number of statements that match the pattern;
Predicate collection size
- the number of statements in the database for this particular predicate (in this case, for all predicates);
Unique subjects
- the number of subjects that match the statement pattern;
Unique objects
- the number of objects that match the statement pattern;
Current complexity
- the complexity (the number of atomic lookups in the index) that the database will need so far in the optimization group (most of the time a subquery). When you have multiple triple patterns, these numbers grow fast;
----- End optimization group 1 -----
- the end of the optimization group;
ESTIMATED NUMBER OF ITERATIONS: 108.0
- the approximate number of iterations that will be executed for this group.
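All of the fields above are printed as `#` comments, so they are easy to extract programmatically, for example to compare the plans of two versions of a query. The sketch below shows one way to do this in Python. It assumes the plain-text format shown in this section: each statistic sits in its own `# Label: value` comment, and the first one shares a line with its triple pattern. `parse_plan` and `PatternEstimate` are hypothetical names and are not part of any GraphDB API.

```python
import re
from dataclasses import dataclass, field

# Matches the "Label: number" text that follows "#" on an estimate line.
STAT_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*([0-9]+(?:\.[0-9]+)?)\s*$")
GROUP_TOTAL = "ESTIMATED NUMBER OF ITERATIONS"


@dataclass
class PatternEstimate:
    pattern: str  # triple pattern as printed, e.g. "?s ?p ?o ."
    stats: dict = field(default_factory=dict)  # label -> value


def parse_plan(text):
    """Collect the per-pattern estimates from explain plan text (a sketch)."""
    estimates = []
    for line in text.splitlines():
        if "#" not in line:
            continue
        # Split on the last "#" so that IRIs such as :wine#Wine stay in the pattern.
        code, _, comment = line.rpartition("#")
        match = STAT_RE.match(comment)
        if match is None:
            continue  # group markers such as "----- Begin optimization group 1 -----"
        label, value = match.group(1), float(match.group(2))
        if label == GROUP_TOTAL:
            continue  # group-level total, not tied to a single pattern
        if code.strip():
            # A pattern line carries its first statistic on the same line.
            estimates.append(PatternEstimate(code.strip()))
        if estimates:
            estimates[-1].stats[label] = value
    return estimates
```

Applied to the simple plan above, this yields one entry for `?s ?p ?o .` with `Collection size` 108.0, `Unique subjects` 90.0, and so on. The group-level iteration estimate is deliberately skipped because it does not belong to a single pattern.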
Multiple triple patterns
Note
The result of the explain plan is given in the exact order the engine is going to execute the query.
The following is an example where the engine reorders the triple patterns based on their complexity. The query is a simple join:
PREFIX onto: <http://www.ontotext.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select *
from onto:explain
{
?o rdf:type ?o1 .
?o rdfs:subPropertyOf ?o2
}
Here is the output:
SELECT ?o ?o1 ?o2
{
{ # ----- Begin optimization group 1 -----
?o rdfs:subPropertyOf ?o2 . # Collection size: 20.0
# Predicate collection size: 20.0
# Unique subjects: 19.0
# Unique objects: 18.0
# Current complexity: 20.0
?o rdf:type ?o1 . # Collection size: 43.0
# Predicate collection size: 43.0
# Unique subjects: 34.0
# Unique objects: 7.0
# Current complexity: 860.0
} # ----- End optimization group 1 -----
# ESTIMATED NUMBER OF ITERATIONS: 25.294117647058822
}
Understanding the output:
- ?o rdfs:subPropertyOf ?o2 has a lower collection size (20 instead of 43), so it will be executed first.
- ?o rdf:type ?o1 has a bigger collection size (43 instead of 20), so it will be executed second (although it is written first in the original query).
- The current complexity grows fast because it multiplies. In this case, you can expect to get 20 results from the first statement pattern, and these have to be joined with the results from the second triple pattern, which gives a complexity of 20 * 43 = 860.
- Although the complexity for the whole group is 860, the estimated number of iterations for this group is only 25.3.
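The arithmetic in this list can be reproduced with a few lines of Python. The sketch is deliberately simplified: it orders the patterns by collection size alone, which is enough to reproduce this example but is not GraphDB's full optimizer. The sizes are copied from the output above. The last line records an observation only: the reported 25.29... coincides with 20 * 43 / 34, that is, the first pattern's 20 results multiplied by the rdf:type statements per unique subject (43 / 34). This is not a documented GraphDB formula. `join_order` is a hypothetical helper.

```python
import math

# (pattern, collection size, unique subjects) copied from the explain output above.
PATTERNS = [
    ("?o rdf:type ?o1", 43.0, 34.0),
    ("?o rdfs:subPropertyOf ?o2", 20.0, 19.0),
]


def join_order(patterns):
    """Order patterns by collection size and accumulate the current complexity."""
    ordered = sorted(patterns, key=lambda p: p[1])  # smaller collection first
    rows, complexity = [], 1.0
    for pattern, size, _ in ordered:
        complexity *= size  # each joined pattern multiplies the complexity
        rows.append((pattern, complexity))
    return rows


for pattern, complexity in join_order(PATTERNS):
    print(f"{pattern:28} current complexity: {complexity}")

# Observation only, not GraphDB's documented estimator: 20 * 43 / 34
# matches the reported ESTIMATED NUMBER OF ITERATIONS for this group.
print(math.isclose(20.0 * 43.0 / 34.0, 25.294117647058822))  # True
```

The printed complexities, 20.0 for ?o rdfs:subPropertyOf ?o2 followed by 860.0 for ?o rdf:type ?o1, match the "Current complexity" values in the plan.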
Wine queries
All of the following examples refer to our simple wine dataset (wine.ttl). The file is quite small; here is a brief overview of the data:
- There are different types of wine (Red, White, Rose).
- Each wine has a label.
- Wines are made from different types of grapes.
- Wines contain different levels of sugar.
- Wines are produced in a specific year.
First query with aggregation
A typical aggregation query groups the results and applies an aggregation function to each group. Here, we have added the explain graph:
# Retrieve the number of wines produced in each year along with the year
PREFIX onto: <http://www.ontotext.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <http://www.ontotext.com/example/wine#>
SELECT (COUNT(?wine) as ?wines) ?year
FROM onto:explain
WHERE {
?wine rdf:type :Wine .
OPTIONAL {
?wine :hasYear ?year
}
}
GROUP BY ?year
ORDER BY DESC(?wines)
When you execute the query in GraphDB, you get the following output instead of the actual results:
SELECT (COUNT(?wine) AS ?wines) ?year
{
{ # ----- Begin optimization group 1 -----
?wine rdf:type :wine#Wine . # Collection size: 5.0
# Predicate collection size: 64.0
# Unique subjects: 50.0
# Unique objects: 12.0
# Current complexity: 5.0
} # ----- End optimization group 1 -----
# ESTIMATED NUMBER OF ITERATIONS: 5.0
OPTIONAL
{
{ # ----- Begin optimization group 2 -----
?wine :hasYear ?year . # Collection size: 5.0
# Predicate collection size: 5.0
# Unique subjects: 5.0
# Unique objects: 2.0
# Current complexity: 5.0
} # ----- End optimization group 2 -----
# ESTIMATED NUMBER OF ITERATIONS: 5.0
}
}
GROUP BY ?year
ORDER BY DESC(?wines)
LIMIT 1000
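In this plan, the required pattern and the OPTIONAL pattern are estimated in separate optimization groups (1 and 2), and each group gets its own estimated number of iterations. When you compare plans, for example before and after rewriting a query, it can help to extract these per-group totals. The sketch below does this in Python, assuming the plain-text output format shown above. `group_iterations` is a hypothetical helper.

```python
import re

GROUP_RE = re.compile(r"Begin optimization group (\d+)")
ITERATIONS_RE = re.compile(r"ESTIMATED NUMBER OF ITERATIONS:\s*([0-9]+(?:\.[0-9]+)?)")


def group_iterations(text):
    """Map each optimization group number to its estimated number of iterations."""
    totals, current = {}, None
    for line in text.splitlines():
        begin = GROUP_RE.search(line)
        if begin:
            current = int(begin.group(1))
            continue
        total = ITERATIONS_RE.search(line)
        if total and current is not None:
            # The total follows the "End optimization group" line of the same group.
            totals[current] = float(total.group(1))
    return totals
```

Applied to the output above, it returns `{1: 5.0, 2: 5.0}`, one entry for the rdf:type pattern and one for the OPTIONAL :hasYear pattern.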