SHACL Validation

What is SHACL validation?

W3C standard Shapes Constraint Language (SHACL) validation is a valuable tool for efficient data consistency checking, and is supported by GraphDB via RDF4J’s ShaclSail. It is useful for data integration as well as for checking data compliance, e.g., every GeoNames URI must start with http://geonames.com/, or age must be above 18 years.

The language validates RDF graphs against a set of conditions. These conditions are provided as shapes and other constructs expressed in the form of an RDF graph. In SHACL, RDF graphs that are used in this manner are called shapes graphs, and the RDF graphs that are validated against a shapes graph are called data graphs.

A shape is an IRI or a blank node s that fulfills at least one of the following conditions in the shapes graph:

  • s is a SHACL instance of sh:NodeShape or sh:PropertyShape.

  • s is the subject of a triple that has sh:targetClass, sh:targetNode, sh:targetObjectsOf, or sh:targetSubjectsOf as predicate.

  • s is the subject of a triple that has a parameter as predicate.

  • s is a value of a shape-expecting, non-list-taking parameter such as sh:node, or a member of a SHACL list that is a value of a shape-expecting and list-taking parameter such as sh:or.
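These conditions can be checked mechanically. The following Java sketch is deliberately simplified. Terms are plain strings, and a SHACL instance is approximated by a direct rdf:type triple with no rdfs:subClassOf reasoning. The third condition is skipped because it needs the full list of SHACL parameters, and only the non-list-taking part of the fourth condition is covered. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified shape detector: terms are plain strings, "SHACL instance"
// means a direct rdf:type triple, and only conditions 1, 2 and the non-list part of 4 are checked.
final class ShapeDetector {
    static final String SH = "http://www.w3.org/ns/shacl#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    static final Set<String> SHAPE_CLASSES = Set.of(SH + "NodeShape", SH + "PropertyShape");
    static final Set<String> TARGET_PREDICATES = Set.of(
            SH + "targetClass", SH + "targetNode", SH + "targetObjectsOf", SH + "targetSubjectsOf");
    // Shape-expecting, non-list-taking parameters from the SHACL specification
    static final Set<String> SHAPE_EXPECTING = Set.of(
            SH + "node", SH + "property", SH + "not", SH + "qualifiedValueShape");

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    static boolean isShape(String node, List<Triple> shapesGraph) {
        for (Triple t : shapesGraph) {
            // Condition 1: node is typed as sh:NodeShape or sh:PropertyShape
            if (t.s.equals(node) && t.p.equals(RDF_TYPE) && SHAPE_CLASSES.contains(t.o)) return true;
            // Condition 2: node is the subject of a target triple
            if (t.s.equals(node) && TARGET_PREDICATES.contains(t.p)) return true;
            // Condition 4 (non-list part): node is the value of e.g. sh:node or sh:property
            if (t.o.equals(node) && SHAPE_EXPECTING.contains(t.p)) return true;
        }
        return false;
    }
}
```

For example, the blank node that is the value of sh:property in a node shape counts as a shape. The class named by sh:targetClass does not.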

Every SHACL repository contains the ShaclSail reserved graph http://rdf4j.org/schema/rdf4j#SHACLShapeGraph, where the shapes are stored.

Usage

Creating and configuring a SHACL repository

A repository with SHACL validation must be created from scratch, i.e., with Create new. You cannot enable validation on an already existing repository afterwards.

Create a repository and enable the Support SHACL validation option. Several additional checkboxes appear:

  • Cache select nodes: The ShaclSail retrieves much of its relevant data by running SPARQL SELECT queries against the underlying Sail and against the changes in the transaction. Caching these results is usually good for performance. When validating large amounts of data, however, it is recommended to disable the cache because this reduces memory consumption. Default value is true.

  • Log the executed validation plans: Logs (INFO) the executed validation plans as GraphViz DOT. It is recommended to disable Run parallel validation. Default value is false.

  • Run parallel validation: Runs validation in parallel. This may cause deadlocks, especially when using NativeStore. Default value is true.

  • Log the execution time per shape: Logs (INFO) the execution time per shape. It is recommended to disable Run parallel validation and Cache select nodes. Default value is false.

  • Validate subjects when target is undefined: If no target is defined for a NodeShape, that NodeShape will be ignored. Enabling this will make such NodeShapes wildcard shapes and validate all subjects. Equivalent to setting sh:targetClass to owl:Thing or rdfs:Resource in an environment with a reasoner. Default value is false.

  • Log validation violations: Logs (INFO) a list of violations and the triples that caused the violations (BETA). It is recommended to disable Run parallel validation. Default value is false.

  • Log every execution step of the SHACL validation: Logs (INFO) every execution step of the SHACL validation. This is fairly costly and should not be used in production. It is recommended to disable Run parallel validation. Default value is false.

  • RDF4J SHACL extensions: Activates RDF4J’s SHACL extensions (RSX), which provide additional functionality. RSX currently contains rsx:targetShape, which allows a shape to be the target for your constraints. For more information about the RSX features, see the RSX section of the RDF4J documentation.

  • DASH data shapes extensions: Activates the DASH Data Shapes extensions. The DASH Data Shapes Vocabulary is a collection of reusable extensions to SHACL for a wide range of use cases. Currently, this enables support for dash:hasValueIn, dash:AllObjectsTarget, and dash:AllSubjectsTarget.

[Figure: enabling the Support SHACL validation option and its additional settings]

Some of these options control logging and validation. They are described in more detail in Validation logging and report.
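The following hypothetical Java class shows how the defaults and recommendations above fit together. It records the checkbox options with their documented defaults and lists any recommendations that a given combination does not follow. The class and field names are invented for this sketch and are not GraphDB or RDF4J configuration keys. The RSX and DASH options are left out because they carry no such recommendations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the SHACL repository options; names are illustrative only.
// Defaults follow the option descriptions above.
final class ShaclRepositoryOptions {
    boolean cacheSelectNodes = true;
    boolean logValidationPlans = false;
    boolean parallelValidation = true;
    boolean logExecutionTimePerShape = false;
    boolean validateSubjectsWhenTargetUndefined = false;
    boolean logValidationViolations = false;
    boolean logEveryExecutionStep = false;

    // Returns the documented recommendations that the current settings do not follow
    List<String> warnings() {
        List<String> w = new ArrayList<>();
        boolean anyLogging = logValidationPlans || logExecutionTimePerShape
                || logValidationViolations || logEveryExecutionStep;
        if (anyLogging && parallelValidation) {
            w.add("Disable Run parallel validation when this logging is enabled");
        }
        if (logExecutionTimePerShape && cacheSelectNodes) {
            w.add("Disable Cache select nodes when logging execution time per shape");
        }
        if (logEveryExecutionStep) {
            w.add("Logging every execution step is costly; avoid it in production");
        }
        return w;
    }
}
```

With all defaults, the list is empty. Enabling only Log the execution time per shape yields two warnings, one for parallel validation and one for the select-node cache.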

Loading shapes and data graphs

You can load shapes using any of the three main methods for loading data into GraphDB: through the Workbench, with an INSERT query in the SPARQL editor, or through the REST API.
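For the REST API route, the following sketch builds a request that posts a Turtle snippet to the reserved shapes graph but does not send it. It assumes GraphDB's RDF4J-compatible endpoint POST /repositories/{id}/statements, with a context parameter that holds the graph IRI in N-Triples form (<...>). The server URL, given without a trailing slash, and the repository ID are placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch: build (not send) a request that adds Turtle shapes to the ShaclSail reserved graph,
// assuming the RDF4J REST protocol endpoint /repositories/{id}/statements.
final class ShapeLoadRequest {
    static final String SHAPE_GRAPH = "http://rdf4j.org/schema/rdf4j#SHACLShapeGraph";

    static URI statementsUri(String serverUrl, String repositoryId) {
        // The context parameter takes the graph IRI in N-Triples form, URL-encoded
        String context = URLEncoder.encode("<" + SHAPE_GRAPH + ">", StandardCharsets.UTF_8);
        return URI.create(serverUrl + "/repositories/" + repositoryId + "/statements?context=" + context);
    }

    static HttpRequest build(String serverUrl, String repositoryId, String turtle) {
        // POST adds statements; the body is the Turtle shapes snippet
        return HttpRequest.newBuilder(statementsUri(serverUrl, repositoryId))
                .header("Content-Type", "text/turtle")
                .POST(HttpRequest.BodyPublishers.ofString(turtle))
                .build();
    }
}
```

To send the request, you could use java.net.http.HttpClient, e.g., HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).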

Here is how to do it through the Workbench:

  1. Go to Import ‣ RDF ‣ User data ‣ Import RDF text snippet, and insert the following shape:

    prefix ex: <http://example.com/ns#>
    prefix sh: <http://www.w3.org/ns/shacl#>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>
    
    ex:PersonShape
        a sh:NodeShape ;
        sh:targetClass ex:Person ;
        sh:property [
            sh:path ex:age ;
            sh:datatype xsd:integer ;
        ] .
    

    This shape states that every value of the property ex:age on an instance of the class ex:Person must have the datatype xsd:integer.

    Click Import. In the dialog that opens, select Target graphs ‣ Named graph. Insert the ShaclSail reserved graph http://rdf4j.org/schema/rdf4j#SHACLShapeGraph as shown below:

    [Figure: Import settings with Target graphs ‣ Named graph set to http://rdf4j.org/schema/rdf4j#SHACLShapeGraph]
  2. After the shape has been imported, let’s test it with some data:

    1. Again from Import ‣ RDF ‣ User data ‣ Import RDF text snippet, insert correct data (i.e., age is an integer):

      prefix ex: <http://example.com/ns#>
      prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      prefix sh: <http://www.w3.org/ns/shacl#>
      prefix xsd: <http://www.w3.org/2001/XMLSchema#>
      
      ex:Alice
        rdf:type ex:Person ;
        ex:age 12 ;
      .
      

      Leave the Import settings as they are, and click Import. You will see that the data has been imported successfully, as it is compliant with the shape you just inserted.

    2. Now import incorrect data (i.e., age is a decimal, not an integer):

      prefix ex: <http://example.com/ns#>
      prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      prefix sh: <http://www.w3.org/ns/shacl#>
      prefix xsd: <http://www.w3.org/2001/XMLSchema#>
      
      ex:Alice
        rdf:type ex:Person ;
        ex:age 12.1 ;
      .
      

      The import will fail, returning a detailed error message with all validation violations in both the Workbench and the command line.
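The second import fails because of how Turtle types unquoted numbers. 12 is an xsd:integer and 12.1 is an xsd:decimal. Only a form with an exponent, such as 12.1e0, is an xsd:double. The following Java sketch mirrors the Turtle numeric grammar and a simplified sh:datatype comparison. It is an illustration only, not RDF4J's parser or validator, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Sketch of Turtle's datatype rules for unquoted numeric literals plus a simplified
// sh:datatype check (datatype IRI equality only).
final class TurtleNumber {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");
    private static final Pattern DECIMAL = Pattern.compile("[+-]?[0-9]*\\.[0-9]+");
    private static final Pattern DOUBLE = Pattern.compile(
            "[+-]?([0-9]+\\.[0-9]*[eE][+-]?[0-9]+|\\.[0-9]+[eE][+-]?[0-9]+|[0-9]+[eE][+-]?[0-9]+)");

    // Returns the datatype IRI that Turtle assigns to an unquoted numeric token
    static String datatypeOf(String token) {
        if (INTEGER.matcher(token).matches()) return XSD + "integer";
        if (DECIMAL.matcher(token).matches()) return XSD + "decimal";
        if (DOUBLE.matcher(token).matches()) return XSD + "double";
        throw new IllegalArgumentException("Not a Turtle numeric literal: " + token);
    }

    // Simplified sh:datatype: the value's datatype must equal the required datatype IRI
    static boolean conformsToDatatype(String token, String requiredDatatype) {
        return datatypeOf(token).equals(requiredDatatype);
    }
}
```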

Deleting shapes and data graphs

There are two ways to delete a SHACL shape: from the GraphDB Workbench and with the RDF4J API.

From the Workbench

  1. Go to the SPARQL Editor in the Workbench.

  2. Clear the RDF4J graph for storing shapes by running the following update query:

    CLEAR GRAPH <http://rdf4j.org/schema/rdf4j#SHACLShapeGraph>
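The same update can also be sent outside the Workbench. The following minimal sketch assumes the RDF4J REST protocol that GraphDB exposes. Under that protocol, an update is posted to /repositories/{id}/statements as a form-encoded update parameter. The server URL and repository ID are placeholders, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch: build (not send) a form-encoded SPARQL update request that clears the shapes graph
final class ClearShapesRequest {
    static final String CLEAR_SHAPES = "CLEAR GRAPH <http://rdf4j.org/schema/rdf4j#SHACLShapeGraph>";

    // Form body: update=<URL-encoded SPARQL update>
    static String formBody(String sparqlUpdate) {
        return "update=" + URLEncoder.encode(sparqlUpdate, StandardCharsets.UTF_8);
    }

    static HttpRequest build(String serverUrl, String repositoryId) {
        URI uri = URI.create(serverUrl + "/repositories/" + repositoryId + "/statements");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(CLEAR_SHAPES)))
                .build();
    }
}
```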
    

Note

Keep in mind that the Clean Repository option in the Explore ‣ Graphs overview tab does not delete the shape graph. It removes all data from the repository, but not the SHACL shapes.

With the RDF4J API

Use the following code snippet:

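import org.eclipse.rdf4j.model.vocabulary.RDF4J;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

// Replace address:port and repositoryname with your GraphDB server URL and repository ID
HTTPRepository repository = new HTTPRepository("http://address:port/", "repositoryname");
try (RepositoryConnection connection = repository.getConnection()) {
    connection.begin();
    // Remove every statement from the ShaclSail reserved shapes graph
    connection.clear(RDF4J.SHACL_SHAPE_GRAPH);
    connection.commit();
}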

Updating shapes and data graphs

To successfully update a shape graph, proceed as follows:

  1. Go to the SPARQL Editor in the Workbench.

  2. Clear the RDF4J graph for storing shapes by running the following update query:

    CLEAR GRAPH <http://rdf4j.org/schema/rdf4j#SHACLShapeGraph>
    
  3. Load the updated shape graph following the instructions in Loading shapes and data graphs.

Note

Because shape graphs are stored separately from data, importing a new shape graph with the Enable replacement of existing data option selected in the Import settings dialog does not work. This is why the steps above must be followed.
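Steps 2 and 3 amount to two SPARQL updates executed in order. The following sketch only composes them as strings. It assumes the new shapes are loaded with an INSERT DATA update, which is one of the loading methods listed earlier. The prefix declarations and shape triples passed in are placeholders, and the class name is hypothetical.

```java
import java.util.List;

// Sketch: the two updates of the replacement procedure, in the order they must run
final class ReplaceShapes {
    static final String SHAPE_GRAPH = "http://rdf4j.org/schema/rdf4j#SHACLShapeGraph";

    static List<String> updates(String prefixDeclarations, String shapeTriples) {
        // Step 2: remove the old shapes from the reserved graph
        String clear = "CLEAR GRAPH <" + SHAPE_GRAPH + ">";
        // Step 3: insert the updated shapes into the same reserved graph
        String insert = prefixDeclarations
                + "\nINSERT DATA { GRAPH <" + SHAPE_GRAPH + "> {\n" + shapeTriples + "\n} }";
        return List.of(clear, insert);
    }
}
```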

Viewing shapes and data graphs

Currently, shape graphs cannot be accessed with SPARQL inside GraphDB, as they are not part of the data. You can view the graph by using the RDF4J client to connect to the GraphDB repository. The following code snippet will return all statements inside the shape graph:

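import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.vocabulary.RDF4J;
import org.eclipse.rdf4j.query.QueryResults;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

HTTPRepository repository = new HTTPRepository("http://address:port/", "repositoryname");
try (RepositoryConnection connection = repository.getConnection()) {
    // Copy every statement in the ShaclSail reserved shapes graph into an in-memory Model
    Model statementsCollector = QueryResults.asModel(
        connection.getStatements(null, null, null, RDF4J.SHACL_SHAPE_GRAPH));
}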

Validation logging and report

ShaclSail validates the data changes on commit(). In case of a violation, it will throw an exception that contains a validation report where you can find details about the noncompliance of your data. The exception will be shown in the Workbench if it was caused by an update executed in the same Workbench window.
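The following hypothetical Java model illustrates commit-time validation. It is not RDF4J code, and the class names are invented. The sketch assumes that changes are buffered and checked only when commit() is called. If any change violates the check, commit() throws an exception that carries one message per violation, a stand-in for the validation report, and none of the batch is applied. It does not model how a real ShaclSail transaction is rolled back afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Thrown by commit() with one message per violation (stand-in for a validation report)
class ShaclViolationException extends RuntimeException {
    final List<String> report;
    ShaclViolationException(List<String> report) {
        super("Validation failed: " + report);
        this.report = List.copyOf(report);
    }
}

// Hypothetical model of commit-time validation; not RDF4J's implementation
final class ValidatingTransaction<T> {
    private final List<T> store;
    private final List<T> pending = new ArrayList<>();
    // Returns a violation message for a non-conforming change, or null if it conforms
    private final Function<T, String> check;

    ValidatingTransaction(List<T> store, Function<T, String> check) {
        this.store = store;
        this.check = check;
    }

    void add(T change) { pending.add(change); }

    void commit() {
        List<String> report = new ArrayList<>();
        for (T change : pending) {
            String violation = check.apply(change);
            if (violation != null) report.add(violation);
        }
        // Apply the batch only if every change conforms; discard it otherwise
        if (report.isEmpty()) store.addAll(pending);
        pending.clear();
        if (!report.isEmpty()) throw new ShaclViolationException(report);
    }
}
```

In the usage below, a valid value (30) added in the same transaction as an invalid one (12) is not stored either.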

You can also enable ShaclSail logging to get additional validation information in the log files. To do so, check any of the following logging options when creating the SHACL repository:

  • Log the executed validation plans

  • Log validation violations

  • Log every execution step of the SHACL validation

All three will log as INFO and appear in the main-[yyyy-mm-dd].log file in the logs directory of your GraphDB installation.
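You can derive the log file name for a particular day from the date. In the following sketch, the logs directory passed in is a placeholder for the one in your GraphDB installation.

```java
import java.nio.file.Path;
import java.time.LocalDate;

// Sketch: resolve the main-[yyyy-mm-dd].log file for a given day in a logs directory
final class ShaclLogFile {
    static Path forDate(Path logsDir, LocalDate date) {
        // LocalDate.toString() is ISO-8601, i.e. yyyy-MM-dd
        return logsDir.resolve("main-" + date + ".log");
    }
}
```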

Supported SHACL features

The supported SHACL features are:

  • sh:targetClass: Specifies a target class. Each value of sh:targetClass in a shape is an IRI.

  • sh:targetNode: Specifies a node target. Each value of sh:targetNode in a shape is either an IRI or a literal.

  • sh:targetSubjectsOf: Specifies a subjects-of target in a shape. The values are IRIs.

  • sh:targetObjectsOf: Specifies an objects-of target in a shape. The values are IRIs.

  • sh:path: Points at the IRI of the property that is being restricted. Alternatively, it may point at a path expression, which allows you to constrain values that may be several “hops” away from the starting point (see the path limitations below).

  • sh:inversePath: An inverse path is a blank node that is the subject of exactly one triple in a graph. This triple has sh:inversePath as predicate, and the object is a well-formed SHACL property path.

  • sh:property: Specifies that each value node has a given property shape.

  • sh:or: Specifies the condition that each value node conforms to at least one of the provided shapes.

  • sh:and: Specifies the condition that each value node conforms to all provided shapes. This is comparable to conjunction and the logical “and” operator.

  • sh:not: Specifies the condition that each value node cannot conform to a given shape. This is comparable to negation and the logical “not” operator.

  • sh:minCount: Specifies the minimum number of value nodes that satisfy the condition. If the minimum cardinality value is 0, this constraint is always satisfied and may be omitted.

  • sh:maxCount: Specifies the maximum number of value nodes that satisfy the condition.

  • sh:minLength: Specifies the minimum string length of each value node that satisfies the condition. This can be applied to any literals and IRIs, but not to blank nodes.

  • sh:maxLength: Specifies the maximum string length of each value node that satisfies the condition. This can be applied to any literals and IRIs, but not to blank nodes.

  • sh:pattern: Specifies a regular expression that each value node must match to satisfy the condition.

  • sh:flags: An optional string of flags, interpreted as in SPARQL 1.1 REGEX. The values of sh:flags in a shape are literals with datatype xsd:string.

  • sh:nodeKind: Specifies a condition to be satisfied by the RDF node kind of each value node.

  • sh:languageIn: Specifies that the allowed language tags for each value node are limited by a given list of language tags.

  • sh:datatype: Specifies a condition to be satisfied with regard to the datatype of each value node.

  • sh:class: Specifies that each value node is a SHACL instance of a given type.

  • sh:in: Specifies the condition that each value node is a member of a provided SHACL list.

  • sh:uniqueLang: Can be set to true to specify that no pair of value nodes may use the same language tag.

  • sh:minInclusive: Specifies the minimum inclusive value. The values of sh:minInclusive in a shape are literals. A shape has at most one value for sh:minInclusive.

  • sh:maxInclusive: Specifies the maximum inclusive value. The values of sh:maxInclusive in a shape are literals. A shape has at most one value for sh:maxInclusive.

  • sh:minExclusive: Specifies the minimum exclusive value. The values of sh:minExclusive in a shape are literals. A shape has at most one value for sh:minExclusive.

  • sh:maxExclusive: Specifies the maximum exclusive value. The values of sh:maxExclusive in a shape are literals. A shape has at most one value for sh:maxExclusive.

  • sh:deactivated: A shape that has the value true for the property sh:deactivated is called deactivated. The value of sh:deactivated in a shape must be either true or false.

  • sh:hasValue: Specifies the condition that at least one value node is equal to the given RDF term.

  • dash:hasValueIn: Can be used to state that at least one value node must be a member of a provided SHACL list. This constraint component only makes sense for property shapes. It takes a list argument similar to sh:in, but is “open” like sh:hasValue, since it allows values outside of the list.

  • sh:target: For use with DASH targets.

  • rsx:targetShape: Part of RDF4J’s SHACL extensions (RSX); allows a shape to be the target for your constraints. For more information about the RSX features, see the RSX section of the RDF4J documentation.
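The following Java sketch implements simplified versions of several of these constraint components over plain Java values. It uses the examples from the introduction: GeoNames URIs and ages above 18. It illustrates the semantics described above, not how RDF4J evaluates shapes. The class name is hypothetical, and the mapping of SPARQL REGEX flags to java.util.regex flags is approximate.

```java
import java.util.List;
import java.util.regex.Pattern;

// Simplified sketches of some constraint components; value nodes are plain Java values,
// so this is not a SHACL engine.
final class ConstraintSketch {
    // sh:minCount / sh:maxCount: bounds on the number of value nodes
    static boolean minCount(List<?> values, int min) { return values.size() >= min; }
    static boolean maxCount(List<?> values, int max) { return values.size() <= max; }

    // sh:minLength / sh:maxLength: string length of every value node
    static boolean minLength(List<String> values, int min) {
        return values.stream().allMatch(v -> v.length() >= min);
    }
    static boolean maxLength(List<String> values, int max) {
        return values.stream().allMatch(v -> v.length() <= max);
    }

    // sh:pattern with sh:flags; x maps to COMMENTS, which only approximates SPARQL's x flag
    static boolean pattern(List<String> values, String regex, String flags) {
        int f = 0;
        for (char c : flags.toCharArray()) {
            switch (c) {
                case 'i': f |= Pattern.CASE_INSENSITIVE; break;
                case 'm': f |= Pattern.MULTILINE; break;
                case 's': f |= Pattern.DOTALL; break;
                case 'x': f |= Pattern.COMMENTS; break;
                default: throw new IllegalArgumentException("Unsupported flag: " + c);
            }
        }
        Pattern p = Pattern.compile(regex, f);
        // REGEX succeeds if the pattern matches anywhere, hence find() rather than matches()
        return values.stream().allMatch(v -> p.matcher(v).find());
    }

    // sh:minExclusive on integers, e.g., "age must be above 18"
    static boolean minExclusive(List<Integer> values, int bound) {
        return values.stream().allMatch(v -> v > bound);
    }

    // sh:in: every value node is a member of the list
    static boolean in(List<?> values, List<?> allowed) { return allowed.containsAll(values); }

    // sh:hasValue: at least one value node equals the given term
    static boolean hasValue(List<?> values, Object term) { return values.contains(term); }

    // dash:hasValueIn: at least one value node is in the list; other values are allowed
    static boolean hasValueIn(List<?> values, List<?> allowed) {
        return values.stream().anyMatch(allowed::contains);
    }
}
```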

Implicit sh:targetClass is supported for nodes that are both an rdfs:Class and either a sh:PropertyShape or a sh:NodeShape. For NodeShapes without a defined target, validation of all subjects (equivalent to targeting owl:Thing in an environment with a reasoner) can be enabled by setting setUndefinedTargetValidatesAllSubjects(true). This corresponds to the Validate subjects when target is undefined option.

sh:path is limited to single predicate paths, e.g., ex:age. Sequence paths, alternative paths, inverse paths and the like are not supported.

sh:or is limited to statement-based restrictions, such as sh:datatype, or aggregate-based restrictions, such as sh:minCount, but not both at the same time.
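The following sketch illustrates what sh:or, sh:and, and sh:not mean. It models a shape as a Java Predicate over a single value node and combines predicates with these operators. The type checks in the usage stand in for statement-based restrictions such as sh:datatype. The sketch ignores the RDF4J limitations listed above, and the class name is hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch: shapes as predicates over one value node, combined by the logical components
final class LogicalShapes {
    // sh:or: the value node conforms to at least one of the shapes
    @SafeVarargs
    static <T> Predicate<T> or(Predicate<T>... shapes) {
        return v -> {
            for (Predicate<T> s : shapes) {
                if (s.test(v)) return true;
            }
            return false;
        };
    }

    // sh:and: the value node conforms to all of the shapes
    @SafeVarargs
    static <T> Predicate<T> and(Predicate<T>... shapes) {
        return v -> {
            for (Predicate<T> s : shapes) {
                if (!s.test(v)) return false;
            }
            return true;
        };
    }

    // sh:not: the value node does not conform to the shape
    static <T> Predicate<T> not(Predicate<T> shape) { return shape.negate(); }

    // The constraint is satisfied when every value node conforms
    static <T> boolean validate(List<T> valueNodes, Predicate<T> shape) {
        return valueNodes.stream().allMatch(shape);
    }
}
```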