Query Profiling with the Explain Plan

GraphDB’s Explain Plan shows how GraphDB executes a SPARQL query. It also reports estimated collection sizes for each statement pattern, including the number of unique subjects and objects and the size of the predicate collection. You can use this information to improve your query and its execution performance.

Activating the explain plan

To see the explain plan for a query, add the onto:explain pseudo-graph to it in a FROM clause:

PREFIX onto: <http://www.ontotext.com/>
select * from onto:explain {
    # the graph pattern of your query goes here
}

Simple explain plan

For the simplest possible explain plan, a query with the single pattern ?s ?p ?o, execute the following:

PREFIX onto: <http://www.ontotext.com/>
select * from onto:explain {
   ?s ?p ?o .
}

Depending on the number of triples that you have in the database, the results will vary, but you will get something like the following:

_images/simple-explain-plan.png

The output repeats the query, annotated with estimates next to each statement pattern (there is only one pattern in this case).

Note

The query shown in the output might not be the same as the original one, because the triple patterns are listed in the order in which they are executed internally.

The annotations in the output mean the following:

  • ----- Begin optimization group 1 -----: marks the start of a group of statements, which is usually part of a subquery (for property paths, the group is the whole path);

  • Collection size: an estimation of the number of statements that match the pattern;

  • Predicate collection size: the number of statements in the database for this particular predicate (in this case, for all predicates);

  • Unique subjects: the number of distinct subjects that match the statement pattern;

  • Unique objects: the number of distinct objects that match the statement pattern;

  • Current complexity: the cumulative complexity (the number of atomic index lookups) the database needs up to this point in the optimization group (usually a subquery). With multiple triple patterns, these numbers grow quickly.

  • ----- End optimization group 1 -----: the end of the optimization group;

  • ESTIMATED NUMBER OF ITERATIONS: the approximate number of iterations that will be executed for this group.
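To sanity-check these estimates, you can compute the exact values for the same pattern with an ordinary aggregate query, without onto:explain. The sketch below does this for ?s ?p ?o; because ?p is a variable, the collection size and the predicate collection size coincide here. The exact counts may differ from the estimates, and depending on your repository settings they may also include inferred statements.

```sparql
# Exact values for the single pattern ?s ?p ?o, computed without onto:explain.
# Warning: this scans every statement, so it can be slow on a large repository.
SELECT (COUNT(*) AS ?collectionSize)
       (COUNT(DISTINCT ?s) AS ?uniqueSubjects)
       (COUNT(DISTINCT ?o) AS ?uniqueObjects)
WHERE {
    ?s ?p ?o .
}
```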

Multiple triple patterns

Note

The explain plan lists the triple patterns in the exact order in which the engine will execute them.

The following is an example where the engine reorders the triple patterns based on their complexity. The query is a simple join:

PREFIX onto: <http://www.ontotext.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select *
from onto:explain
{
    ?o rdf:type ?o1 .
    ?o rdfs:subPropertyOf ?o2
}

and the output is:

_images/explain-plan-multiple-triple-patterns.png

Understanding the output:

  • ?o rdfs:subPropertyOf ?o2 has a smaller collection size (10 versus 30), so it is executed first.

  • ?o rdf:type ?o1 has a larger collection size (30 versus 10), so it is executed second, although it comes first in the original query.

  • The current complexity grows quickly because it is multiplicative. Here, the first statement pattern is expected to return 10 results, and each of them must be joined with the results of the second triple pattern, which gives a complexity of 10 * 30 = 300.

  • Although the complexity for the whole group is 300, the estimated number of iterations for this group is 14.3.
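To see where collection sizes such as 10 and 30 come from in your own repository, you can count the matches of each triple pattern separately. This is a sketch; the ?pattern labels are arbitrary strings chosen for readability, and the counts depend on your data, so they will generally not be 10 and 30.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Count the matches of each triple pattern of the join on its own,
# to compare with the collection sizes reported by the explain plan.
SELECT ?pattern (COUNT(*) AS ?matches)
WHERE {
    { ?o rdf:type ?o1 . BIND("?o rdf:type ?o1" AS ?pattern) }
    UNION
    { ?o rdfs:subPropertyOf ?o2 . BIND("?o rdfs:subPropertyOf ?o2" AS ?pattern) }
}
GROUP BY ?pattern
ORDER BY ?matches
```

The pattern with fewer matches is the one you would expect the engine to place first.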

Wine queries

All of the following examples are based on a simple dataset describing five fictitious wines. The file is small and contains the following data:

  • There are different types of wine (Red, White, Rose).

  • Each wine has a label.

  • Wines are made from different types of grapes.

  • Wines contain different levels of sugar.

  • Wines are produced in a specific year.
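After loading the dataset, you may want to check which predicates it uses and how many statements each one has, since this is the exact counterpart of the "Predicate collection size" reported in the plans below. A minimal sketch; the results depend on the file you loaded and on whether inferred statements are included.

```sparql
# Statements per predicate in the repository: the exact counterpart of
# "Predicate collection size" in the explain plan.
SELECT ?p (COUNT(*) AS ?statements)
WHERE {
    ?s ?p ?o .
}
GROUP BY ?p
ORDER BY DESC(?statements)
```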

Query with aggregation

A typical aggregation query groups the results and applies an aggregate function to each group. Here, we have added the onto:explain graph to such a query.

The query retrieves the number of wines produced in each year, together with the year.

PREFIX onto: <http://www.ontotext.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
SELECT (COUNT(?wine) as ?wines) ?year
FROM onto:explain
WHERE {
    ?wine rdf:type wine:Wine .
    OPTIONAL {
        ?wine wine:hasYear ?year
    }
}
GROUP BY ?year
ORDER BY DESC(?wines)

When you execute the query in GraphDB, you get the following output instead of the actual query results:

_images/explain-plan-wine-query-with-aggregation.png
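One way to use the explain plan is to profile alternative formulations side by side. As a hedged example, the variant below makes wine:hasYear required instead of OPTIONAL. It is not equivalent in general: wines without a year are dropped, so the counts can differ from the original query. Whether the plan changes in your GraphDB version is something to check, not assume.

```sparql
PREFIX onto: <http://www.ontotext.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
# Variant for comparison: wine:hasYear is required instead of OPTIONAL.
# Wines without a year are excluded, so the counts may differ from the original query.
SELECT (COUNT(?wine) AS ?wines) ?year
FROM onto:explain
WHERE {
    ?wine rdf:type wine:Wine ;
          wine:hasYear ?year .
}
GROUP BY ?year
ORDER BY DESC(?wines)
```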

Query with filter aggregation

This aggregation query uses the HAVING clause to filter the groups after grouping. It retrieves the red wines made from more than one type of grape, together with their grape count.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
PREFIX onto: <http://www.ontotext.com/>
SELECT ?wine (COUNT(?grape) AS ?grapeCount)
FROM onto:explain
WHERE {
    ?wine rdf:type wine:RedWine ;
          wine:madeFromGrape ?grape .
}
GROUP BY ?wine
HAVING (COUNT(?grape) > 1)

The returned explain plan will be:

_images/explain-plan-wine-query-2.png
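The same result can also be expressed by aggregating in a subquery and filtering outside it, which you may want to profile for comparison. This is a sketch of an equivalent formulation, not a recommended replacement; the plan may show the subquery as its own optimization group.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
PREFIX onto: <http://www.ontotext.com/>
# Equivalent to the HAVING query: count the grapes per wine in a subquery,
# then keep only the wines with more than one grape.
SELECT ?wine ?grapeCount
FROM onto:explain
WHERE {
    {
        SELECT ?wine (COUNT(?grape) AS ?grapeCount)
        WHERE {
            ?wine rdf:type wine:RedWine ;
                  wine:madeFromGrape ?grape .
        }
        GROUP BY ?wine
    }
    FILTER (?grapeCount > 1)
}
```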

Query with filter function

This is a typical SPARQL query with a FILTER clause. It retrieves the wines made from the Pinot Noir grape, together with their sugar content, year and grape label.

PREFIX onto: <http://www.ontotext.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?wine ?sugar ?year ?grapeLabel
FROM onto:explain
WHERE {
    ?wine rdf:type wine:Wine ;
          wine:hasSugar ?sugar ;
          wine:hasYear ?year ;
          wine:madeFromGrape ?grape .
    ?grape rdfs:label ?grapeLabel .
    FILTER (?grapeLabel = "Pinot Noir")
}

And the output will be:

_images/explain-plan-wine-query-3.png
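A variant worth comparing in the explain plan supplies the label with VALUES instead of FILTER, so the constant is available when ?grape rdfs:label ?grapeLabel is matched. For plain string labels both forms return the same rows (a language-tagged label such as "Pinot Noir"@en matches neither). GraphDB's optimizer may already handle the FILTER this way, so treat this as something to measure rather than a guaranteed improvement.

```sparql
PREFIX onto: <http://www.ontotext.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Variant for comparison: the grape label is bound with VALUES instead of FILTER,
# which may let the engine look up the label constant directly.
SELECT ?wine ?sugar ?year ?grapeLabel
FROM onto:explain
WHERE {
    VALUES ?grapeLabel { "Pinot Noir" }
    ?grape rdfs:label ?grapeLabel .
    ?wine rdf:type wine:Wine ;
          wine:hasSugar ?sugar ;
          wine:hasYear ?year ;
          wine:madeFromGrape ?grape .
}
```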