SHACL Validation
What is SHACL validation?
The W3C standard Shapes Constraint Language (SHACL) is a valuable tool for efficient data consistency checking, and GraphDB supports SHACL validation via RDF4J’s ShaclSail. It is useful in data integration efforts and for checking data compliance, e.g., every GeoName URI must start with http://geonames.com/, or age must be above 18 years.
The language validates RDF graphs against a set of conditions. These conditions are provided as shapes and other constructs expressed in the form of an RDF graph. In SHACL, RDF graphs that are used in this manner are called shapes graphs, and the RDF graphs that are validated against a shapes graph are called data graphs.
A shape is an IRI or a blank node s that fulfills at least one of the following conditions in the shapes graph:
s is a SHACL instance of sh:NodeShape or sh:PropertyShape.
s is subject of a triple that has sh:targetClass, sh:targetNode, sh:targetObjectsOf, or sh:targetSubjectsOf as predicate.
s is subject of a triple that has a parameter as predicate.
s is a value of a shape-expecting, non-list-taking parameter such as sh:node, or a member of a SHACL list that is a value of a shape-expecting and list-taking parameter such as sh:or.
Every SHACL repository contains the ShaclSail reserved graph http://rdf4j.org/schema/rdf4j#SHACLShapeGraph, where the shapes data is inserted.
It is also possible to specify your own custom graph via the sh:shapesGraph property, as described further down on this page.
Usage
Creating and configuring a SHACL repository
A repository with SHACL validation must be created from scratch, i.e., Create new. You cannot modify an already existing repository by enabling the validation afterwards.
Create a repository and enable the Support SHACL validation option. Several additional checkboxes are opened:
Cache select nodes: The ShaclSail retrieves a lot of its relevant data by running SPARQL SELECT queries against the underlying Sail and against the changes in the transaction. Caching is usually good for performance, but it is recommended to disable this cache while validating large amounts of data, as this uses less memory. Default value is true.
Log the executed validation plans: Logs (INFO) the executed validation plans as GraphViz DOT. It is recommended to disable Run parallel validation. Default value is false.
Run parallel validation: Runs validation in parallel. May cause deadlock, especially when using NativeStore. Default value is true.
Log the execution time per shape: Logs (INFO) the execution time per shape. It is recommended to disable Run parallel validation and Cache select nodes. Default value is false.
DASH data shapes extensions: Activates DASH Data Shapes extensions. DASH Data Shapes Vocabulary is a collection of reusable extensions to SHACL for a wide range of use cases. Currently, this enables support for dash:hasValueIn, dash:AllObjectsTarget, and dash:AllSubjectsTarget.
Log validation violations: Logs (INFO) a list of violations and the triples that caused the violations (BETA). It is recommended to disable Run parallel validation. Default value is false.
Log every execution step of the SHACL validation: Logs (INFO) every execution step of the SHACL validation. This is fairly costly and should not be used in production. It is recommended to disable Run parallel validation. Default value is false.
RDF4J SHACL extensions: Activates RDF4J’s SHACL extensions (RSX) that provide additional functionality. RSX currently contains rsx:targetShape, which allows a shape to be the target for your constraints. For more information about the RSX features, see the RSX section of the RDF4J documentation.
Named graphs for SHACL shapes: Sets the named graphs where SHACL shapes can be stored. Comma-delimited list.
Some of these are used for logging and validation - you can find more about it further down in this page.
Loading shapes and data graphs
You can load shapes using all three key methods for loading data into GraphDB: through the Workbench, with an INSERT query in the SPARQL editor, and through the REST API.
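The REST API route can be sketched with the JDK HTTP client. This is a minimal sketch under stated assumptions, not a documented GraphDB procedure: it assumes the repository exposes the standard RDF4J REST endpoint /repositories/{id}/statements, that its context parameter takes the graph IRI wrapped in angle brackets, and that a POST with Content-Type text/turtle adds the statements to that graph. The class name ShapeGraphUpload, the server URL, and the repository ID used in the note below are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: uploads Turtle shapes into the ShaclSail reserved graph over HTTP.
final class ShapeGraphUpload {
    // Reserved graph named in this document.
    static final String SHAPE_GRAPH = "http://rdf4j.org/schema/rdf4j#SHACLShapeGraph";

    // Builds a POST request that adds the given Turtle to the shapes graph.
    // Assumption: the server follows the RDF4J REST protocol for /statements.
    static HttpRequest buildUploadRequest(String serverUrl, String repositoryId, String turtle) {
        // The context parameter is the graph IRI in angle brackets, URL-encoded.
        String context = URLEncoder.encode("<" + SHAPE_GRAPH + ">", StandardCharsets.UTF_8);
        URI uri = URI.create(serverUrl + "/repositories/" + repositoryId
                + "/statements?context=" + context);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "text/turtle")
                .POST(HttpRequest.BodyPublishers.ofString(turtle))
                .build();
    }

    // Sends the request and returns the HTTP status code reported by the server.
    static int send(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

A call such as ShapeGraphUpload.send(HttpClient.newHttpClient(), ShapeGraphUpload.buildUploadRequest("http://localhost:7200", "shacl-repo", shapesAsTurtle)) would then submit the shapes; the exact status code returned on success or on a validation failure is not assumed here.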
Here is how to do it through the Workbench:
In the Workbench import view, insert the following shape:

    prefix ex: <http://example.com/ns#>
    prefix sh: <http://www.w3.org/ns/shacl#>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>

    ex:PersonShape
        a sh:NodeShape ;
        sh:targetClass ex:Person ;
        sh:property [
            sh:path ex:age ;
            sh:datatype xsd:integer ;
        ] .

It indicates that entities of the class Person have a property “age” of the type xsd:integer.
Click Import. In the dialog that opens, set the target graph to the ShaclSail reserved graph http://rdf4j.org/schema/rdf4j#SHACLShapeGraph (or a custom named graph specified with the sh:shapesGraph property).
After the shape has been imported, let’s test it with some data:
Again from the Workbench import view, insert correct data (i.e., age is an integer):

    prefix ex: <http://example.com/ns#>
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix sh: <http://www.w3.org/ns/shacl#>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>

    ex:Alice
        rdf:type ex:Person ;
        ex:age 12 .

Leave the Import settings as they are, and click Import. You will see that the data has been imported successfully, as it is compliant with the shape you just inserted.
Now import incorrect data (i.e., age is not an integer; in Turtle, the literal 12.1 is an xsd:decimal):

    prefix ex: <http://example.com/ns#>
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix sh: <http://www.w3.org/ns/shacl#>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>

    ex:Alice
        rdf:type ex:Person ;
        ex:age 12.1 .

The import will fail, returning a detailed error message with all validation violations in both the Workbench and the command line.
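Why 12 passes and 12.1 fails comes down to the datatype that Turtle assigns to an unquoted numeric literal. The following is a small illustration of that mapping, written from the Turtle grammar for INTEGER, DECIMAL, and DOUBLE tokens; it is not how ShaclSail itself performs the check, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: maps an unquoted Turtle numeric token to its XSD datatype.
final class TurtleNumericLiteral {
    // Token shapes follow the Turtle grammar productions INTEGER, DECIMAL, and DOUBLE.
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");
    private static final Pattern DECIMAL = Pattern.compile("[+-]?[0-9]*\\.[0-9]+");
    private static final Pattern DOUBLE = Pattern.compile(
            "[+-]?([0-9]+\\.[0-9]*|\\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+");

    // Returns the prefixed datatype name for the token, or throws if it is not numeric.
    static String datatypeOf(String token) {
        if (INTEGER.matcher(token).matches()) return "xsd:integer";
        if (DECIMAL.matcher(token).matches()) return "xsd:decimal";
        if (DOUBLE.matcher(token).matches()) return "xsd:double";
        throw new IllegalArgumentException("Not a Turtle numeric literal: " + token);
    }

    // Mirrors the sh:datatype xsd:integer constraint of ex:PersonShape for a single value.
    static boolean satisfiesIntegerDatatype(String token) {
        return "xsd:integer".equals(datatypeOf(token));
    }
}
```

Under this mapping, the value 12 satisfies the constraint, while 12.1 does not because its datatype differs from the one the shape requires.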
Deleting shapes and data graphs
There are two ways to delete a SHACL shape: from the GraphDB Workbench and with the RDF4J API.
From the Workbench
Go to the SPARQL Editor in the Workbench.
Clear the RDF4J graph for storing shapes by running the following update query:
CLEAR GRAPH <http://rdf4j.org/schema/rdf4j#SHACLShapeGraph>
Note
Keep in mind that the Clean Repository option in the Workbench does not delete the shape graph: it removes all data from the repository, but not the SHACL shapes.
With the RDF4J API
Use the following code snippet:
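
    import org.eclipse.rdf4j.model.vocabulary.RDF4J;
    import org.eclipse.rdf4j.repository.RepositoryConnection;
    import org.eclipse.rdf4j.repository.http.HTTPRepository;

    // Replace the server URL and repository ID with your own values.
    HTTPRepository repository = new HTTPRepository("http://address:port/", "repositoryname");
    try (RepositoryConnection connection = repository.getConnection()) {
        connection.begin();
        // Remove all statements from the reserved shapes graph.
        connection.clear(RDF4J.SHACL_SHAPE_GRAPH);
        connection.commit();
    }
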
Updating shapes and data graphs
To successfully update a shape graph, proceed as follows:
Go to the SPARQL Editor in the Workbench.
Clear the RDF4J graph for storing shapes by running the following update query:
CLEAR GRAPH <http://rdf4j.org/schema/rdf4j#SHACLShapeGraph>
Load the updated shape graph following the instructions in Loading shapes and data graphs.
Note
As shape graphs are stored separately from data, importing a new shape graph with the Enable replacement of existing data option checked in the Import settings dialog does not replace the existing shapes. This is why the above steps must be followed.
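The clearing step can also be scripted over HTTP instead of being run from the SPARQL Editor. The sketch below only builds the request; it assumes the repository accepts SPARQL updates as a form-encoded update parameter posted to the RDF4J REST endpoint /repositories/{id}/statements. The class name ShapeGraphClear and any server URL or repository ID passed to it are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the HTTP request that clears the ShaclSail reserved graph.
final class ShapeGraphClear {
    static final String SHAPE_GRAPH = "http://rdf4j.org/schema/rdf4j#SHACLShapeGraph";

    // The same update query that the steps above run in the SPARQL Editor.
    static String clearUpdate() {
        return "CLEAR GRAPH <" + SHAPE_GRAPH + ">";
    }

    // Encodes the update as an application/x-www-form-urlencoded body.
    static String formBody(String sparqlUpdate) {
        return "update=" + URLEncoder.encode(sparqlUpdate, StandardCharsets.UTF_8);
    }

    // Assumption: SPARQL updates are accepted at /repositories/{id}/statements.
    static HttpRequest buildClearRequest(String serverUrl, String repositoryId) {
        URI uri = URI.create(serverUrl + "/repositories/" + repositoryId + "/statements");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(clearUpdate())))
                .build();
    }
}
```

Sending this request with java.net.http.HttpClient and then loading the updated shapes reproduces the clear-then-reload sequence described above.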
Viewing shapes and data graphs
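Currently, shape graphs cannot be queried with SPARQL inside GraphDB, as they are not part of the data. You can view the graph by using the RDF4J client to connect to the GraphDB repository. The following code snippet will return all statements inside the shape graph:

    import java.util.stream.Collectors;

    import org.eclipse.rdf4j.model.Model;
    import org.eclipse.rdf4j.model.impl.LinkedHashModel;
    import org.eclipse.rdf4j.model.vocabulary.RDF4J;
    import org.eclipse.rdf4j.repository.RepositoryConnection;
    import org.eclipse.rdf4j.repository.http.HTTPRepository;

    // Replace the server URL and repository ID with your own values.
    HTTPRepository repository = new HTTPRepository("http://address:port/", "repositoryname");
    try (RepositoryConnection connection = repository.getConnection()) {
        // Collect every statement stored in the reserved shapes graph.
        Model statementsCollector = new LinkedHashModel(
                connection.getStatements(null, null, null, RDF4J.SHACL_SHAPE_GRAPH)
                        .stream()
                        .collect(Collectors.toList()));
    }
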
Validation logging and report
ShaclSail validates the data changes on commit(). In case of a violation, it will throw an exception that contains a validation report where you can find details about the noncompliance of your data. The exception will be shown in the Workbench if it was caused by an update executed in the same Workbench window.
In addition to that, you may also enable ShaclSail logging to get additional validation information in the log files. To enable logging, check one of the three logging options when creating the SHACL repository:
Log the executed validation plans
Log validation violations
Log every execution step of the SHACL validation
All three will log as INFO and appear in the main-[yyyy-mm-dd].log file in the logs directory of your GraphDB installation.
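Locating the relevant entries can be sketched as follows. The file name pattern and the INFO level come from the description above; everything else is an assumption made for illustration: the class name ShaclLogFinder is hypothetical, and matching on the substring "shacl" is only a guess at how ShaclSail entries can be recognized, since the log line format is not specified here.

```java
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for finding ShaclSail output in the GraphDB main log.
final class ShaclLogFinder {
    // Resolves logs/main-yyyy-mm-dd.log under the given GraphDB installation directory.
    static Path mainLog(Path graphDbHome, LocalDate date) {
        String fileName = "main-" + date.format(DateTimeFormatter.ISO_LOCAL_DATE) + ".log";
        return graphDbHome.resolve("logs").resolve(fileName);
    }

    // Keeps INFO lines that mention SHACL (assumed marker, case-insensitive).
    static List<String> shaclInfoLines(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("INFO"))
                .filter(line -> line.toLowerCase().contains("shacl"))
                .collect(Collectors.toList());
    }
}
```

Reading the file with java.nio.file.Files.readAllLines(ShaclLogFinder.mainLog(home, LocalDate.now())) and passing the result to shaclInfoLines narrows the log down to the candidate entries.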
Supported SHACL features
The supported SHACL features are:
Feature | Description |
---|---|
sh:targetClass | Specifies a target class. Each value of sh:targetClass in a shape is an IRI. |
sh:targetNode | Specifies a node target. Each value of sh:targetNode in a shape is either an IRI or a literal. |
sh:targetSubjectsOf | Specifies a subjects-of target in a shape. The values are IRIs. |
sh:targetObjectsOf | Specifies an objects-of target in a shape. The values are IRIs. |
sh:path | Points at the IRI of the property that is being restricted. Alternatively, it may point at a path expression, which would allow you to constrain values that may be several “hops” away from the starting point. |
sh:inversePath | An inverse path is a blank node that is the subject of exactly one triple in a graph. This triple has sh:inversePath as predicate, and its object is a well-formed SHACL property path. |
sh:property | Specifies that each value node has a given property shape. |
sh:or | Specifies the condition that each value node conforms to at least one of the provided shapes. |
sh:and | Specifies the condition that each value node conforms to all provided shapes. This is comparable to conjunction and the logical “and” operator. |
sh:not | Specifies the condition that each value node cannot conform to a given shape. This is comparable to negation and the logical “not” operator. |
sh:minCount | Specifies the minimum number of value nodes that satisfy the condition. If the minimum cardinality value is 0 then this constraint is always satisfied and so may be omitted. |
sh:maxCount | Specifies the maximum number of value nodes that satisfy the condition. |
sh:minLength | Specifies the minimum string length of each value node that satisfies the condition. This can be applied to any literals and IRIs, but not to blank nodes. |
sh:maxLength | Specifies the maximum string length of each value node that satisfies the condition. This can be applied to any literals and IRIs, but not to blank nodes. |
sh:pattern | Specifies a regular expression that each value node matches to satisfy the condition. |
sh:flags | An optional string of flags, interpreted as in SPARQL 1.1 REGEX. The values of sh:flags in a shape are literals with datatype xsd:string. |
sh:nodeKind | Specifies a condition to be satisfied by the RDF node kind of each value node. |
sh:languageIn | Specifies that the allowed language tags for each value node are limited by a given list of language tags. |
sh:datatype | Specifies a condition to be satisfied with regards to the datatype of each value node. |
sh:class | Specifies that each value node is a SHACL instance of a given type. |
sh:in | Specifies the condition that each value node is a member of a provided SHACL list. |
sh:uniqueLang | Can be set to true to specify that no pair of value nodes may use the same language tag. |
sh:minInclusive | Specifies the minimum inclusive value. The values of sh:minInclusive in a shape are literals. |
sh:maxInclusive | Specifies the maximum inclusive value. The values of sh:maxInclusive in a shape are literals. |
sh:minExclusive | Specifies the minimum exclusive value. The values of sh:minExclusive in a shape are literals. |
sh:maxExclusive | Specifies the maximum exclusive value. The values of sh:maxExclusive in a shape are literals. |
sh:deactivated | A shape that has the value true for the property sh:deactivated is called deactivated. |
sh:hasValue | Specifies the condition that at least one value node is equal to the given RDF term. |
sh:shapesGraph | Sets the named graphs where SHACL shapes can be stored. Comma-delimited list. |
dash:hasValueIn | Can be used to state that at least one value node must be a member of a provided SHACL list. This constraint component only makes sense for property shapes. It takes a list argument similar to sh:in. |
sh:target | For use with DASH targets. |
rsx:targetShape | Part of RDF4J’s SHACL extensions (RSX) and allows a shape to be the target for your constraints. For more information about the RSX features, see the RSX section. |
Implicit sh:targetClass is supported for nodes that are both an rdfs:Class and either a sh:PropertyShape or a sh:NodeShape. Validation for all nodes that are equivalent to owl:Thing in an environment with a reasoner can be enabled by calling setUndefinedTargetValidatesAllSubjects(true).
sh:or is limited to statement-based restrictions such as sh:datatype, or aggregate-based restrictions such as sh:minCount, but not both at the same time.
Warning
The description of sh:path above assumes that all kinds of SHACL property paths are supported, which is planned for a later version.
Currently, sh:path is limited to single predicate paths or a single inverse path. Sequence paths, alternative paths, and the like are not supported.