SHACL validation

What is SHACL validation?

The Shapes Constraint Language (SHACL) is a W3C standard for checking data consistency efficiently, and GraphDB supports it via RDF4J’s ShaclSail. It is useful for data integration and for checking data compliance, e.g., every GeoName URI must start with http://geonames.com/, or age must be above 18 years.

The language validates RDF graphs against a set of conditions. These conditions are provided as shapes and other constructs expressed in the form of an RDF graph. In SHACL, RDF graphs that are used in this manner are called shapes graphs, and the RDF graphs that are validated against a shapes graph are called data graphs.

A shape is an IRI or a blank node s that fulfills at least one of the following conditions in the shapes graph:

  • s is a SHACL instance of sh:NodeShape or sh:PropertyShape.

  • s is subject of a triple that has sh:targetClass, sh:targetNode, sh:targetObjectsOf or sh:targetSubjectsOf as predicate.

  • s is subject of a triple that has a parameter as predicate.

  • s is a value of a shape-expecting, non-list-taking parameter such as sh:node, or a member of a SHACL list that is a value of a shape-expecting and list-taking parameter such as sh:or.
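As a rough illustration of these four conditions, the snippet below sketches one shape per condition. All names in the ex: namespace are hypothetical, and the snippet only illustrates the SHACL definition of a shape; it does not imply that every construct is processed by the ShaclSail.

```turtle
prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

# Condition 1: explicitly typed as a node shape.
ex:TypedShape a sh:NodeShape .

# Condition 2: subject of a triple with a target predicate.
ex:TargetedShape sh:targetClass ex:Person .

# Condition 3: subject of a triple with a parameter (here sh:datatype) as predicate.
ex:ParameterShape sh:datatype xsd:integer .

# Condition 4: the two blank nodes are shapes because they are members
# of the SHACL list that is the value of the list-taking parameter sh:or.
ex:AgeShape
    sh:path ex:age ;
    sh:or ( [ sh:datatype xsd:integer ] [ sh:datatype xsd:decimal ] ) .
```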

Every SHACL repository contains the ShaclSail reserved graph http://rdf4j.org/schema/rdf4j#SHACLShapeGraph, into which all shapes data is inserted.

Usage

Creating and configuring a SHACL repository

A repository with SHACL validation must be created from scratch, i.e., with Create new. You cannot enable validation afterwards on an already existing repository.

Create a repository and enable the Support SHACL validation option. Several additional checkboxes appear:

  • Cache select nodes - The ShaclSail retrieves much of its relevant data by running SPARQL SELECT queries against the underlying Sail and against the changes in the transaction, and this option caches the results. Caching is usually good for performance, but it is recommended to disable it when validating large amounts of data, as this reduces memory consumption. Default value is true.

  • Log the executed validation plans - Logs (INFO) the executed validation plans as GraphViz DOT. It is recommended to disable Run parallel validation. Default value is false.

  • Run parallel validation - Runs validation in parallel. May cause deadlock, especially when using NativeStore. Default value is true.

  • Log the execution time per shape - Logs (INFO) the execution time per shape. It is recommended to disable Run parallel validation and Cache select nodes. Default value is false.

  • Validate subjects when target is undefined - If no target is defined for a NodeShape, that NodeShape will be ignored. Enabling this will make such NodeShapes wildcard shapes and validate all subjects. Equivalent to setting sh:targetClass to owl:Thing or rdfs:Resource in an environment with a reasoner. Default value is false.

  • Log validation violations - Logs (INFO) a list of violations and the triples that caused the violations (BETA). It is recommended to disable Run parallel validation. Default value is false.

  • Log every execution step of the SHACL validation - Logs (INFO) every execution step of the SHACL validation. This is fairly costly and should not be used in production. It is recommended to disable Run parallel validation. Default value is false.

_images/enable_shacl.png

Some of these options control logging and validation. You can find more about them further down on this page.
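To make the Validate subjects when target is undefined option more concrete, here is a minimal sketch of a node shape with no target declaration. The shape name ex:AnyAgeShape is hypothetical. With the option at its default value of false, such a shape would be ignored; with the option enabled, it would be applied to all subjects.

```turtle
prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

# No sh:targetClass or other target declaration on this node shape.
ex:AnyAgeShape
    a sh:NodeShape ;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:integer ;
    ] .
```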

Loading shapes and data graphs

You can load shapes using all three key methods for loading data into GraphDB: through the Workbench, with an INSERT query in the SPARQL editor, and through the REST API.

Here is how to do it through the Workbench:

  1. Go to Import → RDF → User data → Import RDF text snippet, and insert the following shape:

    prefix ex: <http://example.com/ns#>
    prefix sh: <http://www.w3.org/ns/shacl#>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>
    
    ex:PersonShape
        a sh:NodeShape  ;
        sh:targetClass ex:Person ;
        sh:property [
            sh:path ex:age ;
            sh:datatype xsd:integer ;
        ] .
    

    This shape requires that, for every instance of the class ex:Person, each value of the property ex:age has the datatype xsd:integer.

    Click Import. In the dialog that opens, select Target graphs → Named graph. Insert the ShaclSail reserved graph http://rdf4j.org/schema/rdf4j#SHACLShapeGraph as shown below:

    _images/shaclsail_reserved_graph.png
  2. After the shape has been imported, let’s test it with some data:

    1. Again from Import → RDF → User data → Import RDF text snippet, insert correct data (i.e., age is an integer):

      prefix ex: <http://example.com/ns#>
      prefix sh: <http://www.w3.org/ns/shacl#>
      prefix xsd: <http://www.w3.org/2001/XMLSchema#>
      
      ex:Alice
        a ex:Person ;
        ex:age 12 ;
      .
      

      Leave the Import settings as they are, and click Import. You will see that the data has been imported successfully, as it is compliant with the shape you just inserted.

    2. Now import incorrect data (i.e., age is a decimal rather than an integer):

      prefix ex: <http://example.com/ns#>
      prefix sh: <http://www.w3.org/ns/shacl#>
      prefix xsd: <http://www.w3.org/2001/XMLSchema#>
      
      
      ex:Alice
        a ex:Person ;
        ex:age 12.1 ;
      .
      

      The import will fail, returning a detailed error message with all validation violations in both the Workbench and the command line.

Validation logging and report

ShaclSail validates the data changes on commit(). In case of a violation, it will throw an exception that contains a validation report where you can find details about the noncompliance of your data. The exception will be shown in the Workbench if it was caused by an update executed in the same Workbench window.
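For orientation, a validation report for a datatype violation on ex:Alice could look roughly like the sketch below. It is hand-written from the W3C SHACL validation report vocabulary and is not captured from a GraphDB run, so the exact properties and their order in a real report may differ.

```turtle
prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>

# Illustrative only: not the output of an actual validation.
[] a sh:ValidationReport ;
    sh:conforms false ;
    sh:result [
        a sh:ValidationResult ;
        # The node that failed validation.
        sh:focusNode ex:Alice ;
        # The property whose value was checked.
        sh:resultPath ex:age ;
        sh:resultSeverity sh:Violation ;
        # The constraint component behind sh:datatype.
        sh:sourceConstraintComponent sh:DatatypeConstraintComponent
    ] .
```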

You may also enable ShaclSail logging to get additional validation information in the log files. To enable logging, check at least one of the following three logging options when creating the SHACL repository:

  • Log the executed validation plans

  • Log validation violations

  • Log every execution step of the SHACL validation

All three will log as INFO and appear in the main-[yyyy-mm-dd].log file in the logs directory of your GraphDB installation.

Supported SHACL features

The supported SHACL features are:

  • sh:targetClass - specifies a target class. Each value of sh:targetClass in a shape is an IRI.

  • sh:property - specifies that each value node has a given property shape.

  • sh:or - specifies the condition that each value node conforms to at least one of the provided shapes.

  • sh:minCount - specifies the minimum number of value nodes that satisfy the condition. If the minimum cardinality value is 0 then this constraint is always satisfied and so may be omitted.

  • sh:maxCount - specifies the maximum number of value nodes that satisfy the condition.

  • sh:minLength - specifies the minimum string length of each value node that satisfies the condition. This can be applied to any literals and IRIs, but not to blank nodes.

  • sh:maxLength - specifies the maximum string length of each value node that satisfies the condition. This can be applied to any literals and IRIs, but not to blank nodes.

  • sh:pattern - specifies a regular expression that each value node matches to satisfy the condition.

  • sh:flags - an optional string of flags, interpreted as in SPARQL 1.1 REGEX. The values of sh:flags in a shape are literals with datatype xsd:string.

  • sh:nodeKind - specifies a condition to be satisfied by the RDF node kind of each value node.

  • sh:languageIn - specifies that the allowed language tags for each value node are limited by a given list of language tags.

  • sh:datatype - specifies a condition to be satisfied with regards to the datatype of each value node.

  • sh:class - specifies that each value node is a SHACL instance of a given type.
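The following sketch combines several of the listed features in one shape. The class and property names (ex:Employee, ex:name, ex:code, ex:label, ex:worksFor, ex:Company) and the pattern are hypothetical and chosen only for illustration.

```turtle
prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:EmployeeShape
    a sh:NodeShape ;
    sh:targetClass ex:Employee ;
    # Exactly one name: a string of 1 to 100 characters.
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:minLength 1 ;
        sh:maxLength 100 ;
    ] ;
    # Codes such as "ab-1234"; the "i" flag makes the match case-insensitive.
    sh:property [
        sh:path ex:code ;
        sh:pattern "^[a-z]{2}-[0-9]{4}$" ;
        sh:flags "i" ;
    ] ;
    # Labels are limited to English or German language tags.
    sh:property [
        sh:path ex:label ;
        sh:languageIn ( "en" "de" ) ;
    ] ;
    # The employer must be an IRI that is an instance of ex:Company.
    sh:property [
        sh:path ex:worksFor ;
        sh:nodeKind sh:IRI ;
        sh:class ex:Company ;
    ] .
```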

Implicit sh:targetClass is supported for nodes that are instances of rdfs:Class and also of either sh:PropertyShape or sh:NodeShape. Validation for all nodes that are equivalent to owl:Thing in an environment with a reasoner can be enabled by calling setUndefinedTargetValidatesAllSubjects(true) on the ShaclSail.
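A minimal sketch of an implicit class target, assuming the same ex: namespace as in the earlier examples: ex:Person is declared as both a class and a node shape, so its instances are targeted without an explicit sh:targetClass.

```turtle
prefix ex: <http://example.com/ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

# ex:Person is at once the class and the shape that validates its instances.
ex:Person
    a rdfs:Class, sh:NodeShape ;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:integer ;
    ] .
```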

sh:path is limited to single predicate paths, e.g., ex:age. Sequence paths, alternative paths, inverse paths and the like are not supported.
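The difference can be sketched as follows. The shape name ex:AgeShape and the properties in the commented-out paths are hypothetical; the unsupported forms are shown only as comments so that the snippet itself stays within the supported subset.

```turtle
prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

# Supported: a single predicate path.
ex:AgeShape
    a sh:PropertyShape ;
    sh:targetClass ex:Person ;
    sh:path ex:age ;
    sh:datatype xsd:integer .

# Not supported according to the limitation above:
#   sh:path ( ex:address ex:city )                          # sequence path
#   sh:path [ sh:alternativePath ( ex:phone ex:mobile ) ]   # alternative path
#   sh:path [ sh:inversePath ex:parent ]                    # inverse path
```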

sh:or is limited to statement-based restrictions such as sh:datatype, or aggregate-based restrictions such as sh:minCount, but not both at the same time.
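As a hedged example of an sh:or that stays within this limit, the sketch below uses only statement-based restrictions: each value of ex:age must be either an xsd:integer or an xsd:decimal. Adding an aggregate-based restriction such as sh:minCount as a further alternative in the same sh:or list would fall outside what is described as supported.

```turtle
prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:age ;
        # Both alternatives are statement-based (sh:datatype).
        sh:or (
            [ sh:datatype xsd:integer ]
            [ sh:datatype xsd:decimal ]
        ) ;
    ] .
```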