SHACL Validation

What is SHACL validation?

Validation with the W3C Shapes Constraint Language (SHACL) standard is a valuable tool for efficient data consistency checking, and is supported by GraphDB via RDF4J’s ShaclSail storage and interface layer. It is useful for data integration as well as for checking data compliance, for example, that every GeoName URI matches https://sws.geonames.org/\d+/, or that age values are above 18 years.
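As a rough illustration, the two compliance rules mentioned above could be expressed as a shape along the following lines. This is only a sketch: the ex: namespace, the class ex:Person, and the properties ex:age and ex:birthPlace are hypothetical names chosen for the example, and the regular expression has been anchored and escaped for Turtle.

```turtle
@prefix ex: <http://example.com/ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# Hypothetical shape: all names in the ex: namespace are assumptions.
ex:AdultPersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    # Every age value must be greater than 18.
    sh:property [
        sh:path ex:age ;
        sh:minExclusive 18 ;
    ] ;
    # Every birthplace must be an IRI of the form https://sws.geonames.org/<digits>/
    sh:property [
        sh:path ex:birthPlace ;
        sh:nodeKind sh:IRI ;
        sh:pattern "^https://sws\\.geonames\\.org/\\d+/$" ;
    ] .
```

The backslashes are doubled because Turtle string literals use the backslash as an escape character.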

The language validates RDF graphs against a set of conditions. These conditions are provided as shapes and other constructs expressed in the form of an RDF graph. In SHACL, RDF graphs that are used in this manner are called shapes graphs, and the RDF graphs that are validated against a shapes graph are called data graphs.

A shape is an IRI or a blank node s that fulfills at least one of the following conditions in the shapes graph:

  • s is a SHACL instance of sh:NodeShape or sh:PropertyShape.

  • s is subject of a triple that has sh:targetClass, sh:targetNode, sh:targetObjectsOf, or sh:targetSubjectsOf as predicate.

  • s is subject of a triple that has a parameter as predicate.

  • s is a value of a shape-expecting, non-list-taking parameter such as sh:node, or a member of a SHACL list that is a value of a shape-expecting and list-taking parameter such as sh:or.
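The four conditions above can be illustrated with a small shapes graph. This is a sketch of the definition only; the ex: names are hypothetical, and it makes no claim about how a particular implementation treats shapes that are not explicitly typed.

```turtle
@prefix ex: <http://example.com/ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# 1. Explicitly typed as a node shape.
ex:TypedShape a sh:NodeShape .

# 2. Subject of a target declaration, without an explicit type.
ex:TargetedShape sh:targetClass ex:Person .

# 3. Subject of a triple whose predicate is a parameter (here sh:nodeKind).
ex:ParameterShape sh:nodeKind sh:IRI .

# 4. Value of sh:node, and members of the list given to sh:or.
ex:TypedShape
    sh:node ex:ReferencedShape ;
    sh:or ( ex:AlternativeA ex:AlternativeB ) .
```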

Every SHACL repository contains the ShaclSail reserved graph http://rdf4j.org/schema/rdf4j#SHACLShapeGraph, where the shapes are inserted by default.

It is also possible to specify your own custom graph via the sh:shapesGraph property, as described below.

Usage

Creating and configuring a SHACL repository

SHACL validation must be enabled when the repository is created (Create new). It cannot be enabled afterwards on an already existing repository.

Create a repository and enable the Support SHACL validation option. Several additional options appear:

  • Cache select nodes: The ShaclSail retrieves much of its relevant data by running SPARQL SELECT queries against the underlying Sail and against the changes in the transaction. Caching the results of these queries is usually good for performance, but it is recommended to disable the cache when validating large amounts of data, as this consumes less memory. Default value is true.

  • Log the executed validation plans: Logs (INFO) the executed validation plans as GraphViz DOT. When this option is enabled, it is recommended to disable Run parallel validation. Default value is false.

  • Run parallel validation: Runs validation in parallel. May cause deadlock, especially when using NativeStore. Default value is true.

  • Log the execution time per shape: Logs (INFO) the execution time per shape. When this option is enabled, it is recommended to disable Run parallel validation and Cache select nodes. Default value is false.

  • DASH data shapes extensions: Activates DASH Data Shapes extensions. The DASH Data Shapes Vocabulary is a collection of reusable extensions to SHACL for a wide range of use cases. Currently, this enables support for dash:hasValueIn, dash:AllObjectsTarget, and dash:AllSubjectsTarget.

  • Log validation violations: Logs (INFO) a list of violations and the triples that caused the violations (BETA). When this option is enabled, it is recommended to disable Run parallel validation. Default value is false.

  • Log every execution step of the SHACL validation: Logs (INFO) every execution step of the SHACL validation. This is fairly costly and should not be used in production. When this option is enabled, it is recommended to disable Run parallel validation. Default value is false.

  • RDF4J SHACL extensions: Activates RDF4J’s SHACL extensions (RSX) that provide additional functionality. RSX currently contains rsx:targetShape which will allow a Shape to be the target for your constraints. For more information about the RSX features, see the RSX section of RDF4J documentation.

  • Named graphs for SHACL shapes: Sets the named graphs where SHACL shapes can be stored. Comma-delimited list. You can store shapes in one graph and untrusted data in another without the untrusted data overriding the constraints provided by the shapes.

    When using named graphs for SHACL shapes each data graph is evaluated separately by default. That is, if you have <http://example.org> sh:shapesGraph ex:shaclGraph and <http://example2.org> sh:shapesGraph ex:shaclGraph, the data in the two example graphs will not be joined before being validated. It is not possible to split a single object across multiple graphs in order to validate it with SHACL.

_images/enable_shacl.png

Some of these options control the logging of the validation. They are described further down on this page, under Validation logging and report.

Loading shapes and data graphs

You can load shapes using all three key methods for loading data into GraphDB: through the Workbench, with an INSERT query in the SPARQL editor, and through the REST API.

Here is how to do it through the Workbench:

  1. Go to Import ‣ User data ‣ Import RDF text snippet, and insert the following shape:

    prefix ex: <http://example.com/ns#>
    prefix sh: <http://www.w3.org/ns/shacl#>
    prefix xsd: <http://www.w3.org/2001/XMLSchema#>
    
    ex:PersonShape
        a sh:NodeShape  ;
        sh:targetClass ex:Person ;
        sh:property [
            sh:path ex:age ;
            sh:datatype xsd:integer ;
    ] .
    

    It states that, for entities of the class ex:Person, every value of the property ex:age must be of type xsd:integer.

    Click Import. In the dialog that opens, select Target graphs ‣ Named graph. Insert the ShaclSail reserved graph http://rdf4j.org/schema/rdf4j#SHACLShapeGraph (or a custom named graph specified with the sh:shapesGraph property) as shown below:

    _images/shaclsail_reserved_graph.png

    Warning

    There are two reserved graphs when working with SHACL Shapes – the default (unnamed) graph and the RDF4J SHACL Shape Graph (http://rdf4j.org/schema/rdf4j#SHACLShapeGraph). If you put the SHACL shapes in either of those two graphs, they will always be used for validation when using the SHACL Validator API.

    If you want to specify a custom graph to keep the shapes in (for example <http://example.com>) you need to add the link between the data and shapes. In the example below, your data is in ex:dataGraph and the shapes graph is <http://example.com>:

    PREFIX ex: <http://example.com/ns#>
    PREFIX sh: <http://www.w3.org/ns/shacl#>
    INSERT DATA
    {
        GRAPH <http://example.com> {
            ex:dataGraph sh:shapesGraph <http://example.com>  .
        }
    }
    
  2. After the shape has been imported, let’s test it with some data:

    1. Again from Import ‣ User data ‣ Import RDF text snippet, insert correct data (i.e., age is an integer):

      prefix ex: <http://example.com/ns#>
      prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      prefix xsd: <http://www.w3.org/2001/XMLSchema#>
      
      ex:Alice
        rdf:type ex:Person ;
        ex:age 12 ;
      .
      

      Leave the Import settings as they are, and click Import. You will see that the data has been imported successfully, as it is compliant with the shape you just inserted.

    2. Now import incorrect data (i.e., age is a decimal instead of an integer):

      prefix ex: <http://example.com/ns#>
      prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      prefix xsd: <http://www.w3.org/2001/XMLSchema#>
      
      ex:Alice
        rdf:type ex:Person ;
        ex:age 12.1 ;
      .
      

      The import will fail, returning a detailed error message with all validation violations in both the Workbench and the command line.
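Note that the shape used in this walkthrough only constrains the datatype of ex:age values, so a person without any ex:age still conforms. If the property should also be mandatory and single-valued, a variant along these lines could be used. It is a sketch built on the sh:minCount and sh:maxCount features listed under Supported SHACL features, and is not part of the original example.

```turtle
prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:integer ;
        sh:minCount 1 ;   # at least one age value
        sh:maxCount 1 ;   # at most one age value
    ] .
```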

Deleting shapes and data graphs

There are two ways to delete a SHACL shape: from the GraphDB Workbench and with the RDF4J API.

From the Workbench

  1. Go to the SPARQL Editor in the Workbench.

  2. Clear the RDF4J graph for storing shapes by running the following update query:

    CLEAR GRAPH <http://rdf4j.org/schema/rdf4j#SHACLShapeGraph>
    

Note

Keep in mind that the Clean Repository option in the Explore ‣ Graphs overview tab does not delete the shape graph: it removes all data from the repository, but not the SHACL shapes.

With the RDF4J API

Use the following code snippet:

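import org.eclipse.rdf4j.model.vocabulary.RDF4J;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

// Replace the URL and repository ID with those of your GraphDB instance.
HTTPRepository repository = new HTTPRepository("http://address:port/", "repositoryname");
try (RepositoryConnection connection = repository.getConnection()) {
    connection.begin();
    // Remove all statements from the reserved SHACL shapes graph.
    connection.clear(RDF4J.SHACL_SHAPE_GRAPH);
    connection.commit();
}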

Bulk validation on an existing repository with external sources

You can also use the REST API to run SHACL validation on an existing repository regardless of the initial repository configuration by providing SHACL Shapes as a file, as a string of text, or with a pointer to shapes stored in an existing repository.

Bulk validation with a file

POST /rest/repositories/{repositoryID}/validate/file

Example code snippet (shapes.ttl refers to a file local to where curl is called):

curl -X POST --header 'Accept: text/turtle'\
    'http://localhost:7200/rest/repositories/dataRepo/validate/file'\
    -F 'file=@shapes.ttl;type=text/turtle'

Bulk validation with text

POST /rest/repositories/{repositoryID}/validate/text

Example code snippet:

curl -X POST --header 'Content-Type: text/turtle' --header 'Accept: text/turtle'\
    'http://localhost:7200/rest/repositories/dataRepo/validate/text'\
    --data-raw '@prefix ex: <http://example.com/ns#> .
                @prefix sh: <http://www.w3.org/ns/shacl#> .
                @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
                ex:PersonShape
                    a sh:NodeShape  ;
                    sh:targetClass ex:Person ;
                    sh:property [
                        sh:path ex:age ;
                        sh:datatype xsd:integer ;
                ] .'

Bulk validation with another repository

POST /rest/repositories/{dataRepositoryID}/validate/repository/{shapesRepositoryID}

Example code snippet:

curl -X POST --header 'Accept: text/turtle'\
    'http://localhost:7200/rest/repositories/dataRepo/validate/repository/shapesRepo'

For this example, import the following data as an RDF text snippet in the dataRepo:

prefix ex: <http://example.com/ns#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:Alice
  rdf:type ex:Person ;
  ex:age 12.1 .

ex:Bob
  rdf:type ex:Person ;
  ex:age 12 .

Use the following shape for the shapesRepo against which the dataRepo will be validated:

prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:PersonShape
    a sh:NodeShape  ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:integer
    ] .

Bulk validation with a named graph in another repository

When validating against another existing repository (referred to as shapes repository), the default graph and the SHACL default graph of that repository will be used to validate all named graphs in the repository you are validating. If you want to validate your data using named graphs from the shapes repository, you need to define the mapping between data graphs to validate from the data repository, and shapes graphs from the shapes repository to validate against.

Here is example data for the dataRepo in such a scenario:

prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:doNotValidate {
  ex:Alice a ex:Person;
    ex:age 12.1 .
}

{
  ex:Bob a ex:Person;
    ex:age 12 .

  ex:Aerith a ex:Person;
    ex:age 12.1 .
}

ex:validate {
  ex:Steve a ex:Person;
    ex:age 12.5 .
}

Use the following shape for the shapesRepo against which the dataRepo will be validated:

prefix ex: <http://example.com/ns#>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:shapesGraph {
  ex:PersonShape a sh:NodeShape;
    sh:targetClass ex:Person;
    sh:property _:node2 .

  _:node2 sh:path ex:age;
    sh:datatype xsd:integer .
}

{
  ex:validate sh:shapesGraph ex:shapesGraph .
}

In this example, the data graph ex:validate will be validated against the shapes graph ex:shapesGraph. The mapping is defined in the default graph of the shapes repository. A violation will be found as age can take only an integer as a value. However, neither the named graph ex:doNotValidate containing suspicious data, nor the default graph which contains both normal and suspicious data will be validated, as these are not mentioned in the mapping provided in the shapes repository.

Union of data graphs when validating the results

Warning

This feature is experimental and can hurt performance.

You can validate the union of several data graphs, so that statements about the same resource that are stored in different graphs are validated together.

Here is an example, where:

  • ex:Alice rdf:type ex:Person is in graph ex:dataGraph

  • ex:Alice ex:age 12.1 is in graph ex:dataGraph2

PREFIX ex: <http://example.com/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT DATA {
    GRAPH ex:dataGraph{
        ex:Alice rdf:type ex:Person.
    }
    GRAPH ex:dataGraph2{
        ex:Alice ex:age 12.1
    }
}

To validate the union of the two graphs, insert the following shapes into the shapes repository as TriG:

@prefix ex: <http://example.com/ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf4j: <http://rdf4j.org/schema/rdf4j#> .
@prefix rsx: <http://rdf4j.org/shacl-extensions#> .
<http://example.com> {
  ex:PersonShape a sh:NodeShape;
    sh:targetClass ex:Person;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:integer ;
      ] .
    [
      a rsx:DataAndShapesGraphLink;
      rsx:shapesGraph <http://example.com>;
      rsx:dataGraph ex:dataGraph, ex:dataGraph2;
    ]
}

When you run the validation against the endpoint /rest/repositories/{repositoryID}/validate/repository/{shapesRepositoryID}, the validation report lists both graphs in rsx:dataGraph:

@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rsx: <http://rdf4j.org/shacl-extensions#> .
@prefix dash: <http://datashapes.org/dash#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf4j: <http://rdf4j.org/schema/rdf4j#> .

[] a sh:ValidationReport;
  sh:conforms false;
  rdf4j:truncated false;
  sh:result [ a sh:ValidationResult;
      sh:focusNode <http://example.com/ns#Alice>;
      rsx:dataGraph <http://example.com/ns#dataGraph>, <http://example.com/ns#dataGraph2>;
      rsx:shapesGraph <http://example.com>;
      sh:value 12.1;
      sh:resultPath <http://example.com/ns#age>;
      sh:sourceConstraintComponent sh:DatatypeConstraintComponent;
      sh:resultSeverity sh:Violation;
      sh:sourceShape [ a sh:PropertyShape;
          sh:path <http://example.com/ns#age>;
          sh:datatype xsd:integer
        ]
    ] .

Updating shapes and data graphs

To successfully update a shape graph, proceed as follows:

  1. Go to the SPARQL Editor in the Workbench.

  2. Clear the RDF4J graph for storing shapes by running the following update query:

    CLEAR GRAPH <http://rdf4j.org/schema/rdf4j#SHACLShapeGraph>
    
  3. Load the updated shape graph following the instructions in Loading shapes and data graphs.

Note

As shape graphs are stored separately from data, importing a new shape graph with the Enable replacement of existing data option checked in the Import settings dialog does not work. This is why the above steps must be followed.

Viewing shapes and data graphs

Currently, shape graphs cannot be accessed with SPARQL inside GraphDB, as they are not part of the data. You can view the graph by using the RDF4J client to connect to the GraphDB repository. The following code snippet will return all statements inside the shape graph:

HTTPRepository repository = new HTTPRepository("http://address:port/", "repositoryname");
try (RepositoryConnection connection = repository.getConnection()) {
    Model statementsCollector = new LinkedHashModel(connection.getStatements(null, null, null, RDF4J.SHACL_SHAPE_GRAPH)
        .stream()
        .collect(Collectors.toList()));
}

SPARQL capabilities in SHACL shapes

GraphDB supports SPARQL-based constraint components as well as SPARQL-based targets.

SHACL-SPARQL has to be written judiciously for performance reasons. As a SPARQLConstraint is executed for each SHACL target, you want to keep the number of targets as small as possible. In practice, this means that the actual constraints can be located within a sh:SPARQLTarget, and the sh:SPARQLConstraint is mostly used for populating the answers.

The following example has a SPARQLTarget which finds one violation, and a SPARQLConstraint check to clear up the message and value, for a total of two SPARQL requests. If using sh:targetClass instead of a SPARQL target, a SPARQL request would be executed for each instance of the schema:Person class, impacting performance severely on a dataset of realistic sizes.

@prefix dash:   <http://datashapes.org/dash#> .
@prefix schema: <http://schema.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .

schema:
  sh:declare [
    sh:prefix "schema" ;
    sh:namespace "http://schema.org/" ;
  ] .

schema:PersonShape
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget;
        sh:prefixes schema: ;
        sh:select """
            select ?this {
                ?this a schema:Person ;
                      schema:age ?age .
                FILTER (?age < 18)}"""];
  sh:sparql [
    a sh:SPARQLConstraint;
    sh:prefixes schema: ;
    sh:message "This person is too young to vote!";
    sh:select """
      select $this ?value {
        $this schema:age ?value .
      }"""
    ].

The following update inserts sample data against which the shape can be tested:

PREFIX ex:     <http://example.org/ns#>
PREFIX schema: <http://schema.org/>
INSERT DATA {
    ex:Aerith
        a schema:Person ;
        schema:age 2 ;
        schema:eligibleToVote false .
    ex:Bob
        a schema:Person ;
        schema:age 21 ;
        schema:eligibleToVote true .
    ex:Alice
        a schema:Person ;
        schema:age 16 ;
        schema:eligibleToVote true .
}

Note

At the moment, sh:message does not populate values from sh:SPARQLConstraint.

Warning

  • sh:SPARQLConstraint requires both $this and ?value to be returned.

Validation logging and report

ShaclSail validates the data changes on commit(). In case of a violation, it will throw an exception that contains a validation report where you can find details about the noncompliance of your data. The exception will be shown in the Workbench if it was caused by an update executed in the same Workbench window.

In addition to that, you may also enable ShaclSail logging to get additional validation information in the log files. To enable logging, check one or more of the logging options listed under Creating and configuring a SHACL repository when creating the SHACL repository.

All of them log at INFO level and appear in the main-[yyyy-mm-dd].log file in the logs directory of your GraphDB installation.

Supported SHACL features

The supported SHACL features are:

  • sh:targetClass: Specifies a target class. Each value of sh:targetClass in a shape is an IRI.

  • sh:targetNode: Specifies a node target. Each value of sh:targetNode in a shape is either an IRI or a literal.

  • sh:targetSubjectsOf: Specifies a subjects-of target in a shape. The values are IRIs.

  • sh:targetObjectsOf: Specifies an objects-of target in a shape. The values are IRIs.

  • sh:path: Points at the IRI of the property that is being restricted. Alternatively, it may point at a path expression, which would allow you to constrain values that may be several “hops” away from the starting point.

  • sh:inversePath: An inverse path is a blank node that is the subject of exactly one triple in a graph. This triple has sh:inversePath as predicate, and the object is a well-formed SHACL property path.

  • sh:property: Specifies that each value node has a given property shape.

  • sh:or: Specifies the condition that each value node conforms to at least one of the provided shapes.

  • sh:and: Specifies the condition that each value node conforms to all provided shapes. This is comparable to conjunction and the logical “and” operator.

  • sh:not: Specifies the condition that each value node cannot conform to a given shape. This is comparable to negation and the logical “not” operator.

  • sh:minCount: Specifies the minimum number of value nodes that satisfy the condition. If the minimum cardinality value is 0, then this constraint is always satisfied and so may be omitted.

  • sh:maxCount: Specifies the maximum number of value nodes that satisfy the condition.

  • sh:minLength: Specifies the minimum string length of each value node that satisfies the condition. This can be applied to any literals and IRIs, but not to blank nodes.

  • sh:maxLength: Specifies the maximum string length of each value node that satisfies the condition. This can be applied to any literals and IRIs, but not to blank nodes.

  • sh:pattern: Specifies a regular expression that each value node matches to satisfy the condition.

  • sh:flags: An optional string of flags, interpreted as in SPARQL 1.1 REGEX. The values of sh:flags in a shape are literals with datatype xsd:string.

  • sh:nodeKind: Specifies a condition to be satisfied by the RDF node kind of each value node.

  • sh:languageIn: Specifies that the allowed language tags for each value node are limited by a given list of language tags.

  • sh:datatype: Specifies a condition to be satisfied with regard to the datatype of each value node.

  • sh:class: Specifies that each value node is a SHACL instance of a given type.

  • sh:in: Specifies the condition that each value node is a member of a provided SHACL list.

  • sh:message: Specifies a human-readable message that can be associated with a shape. The sh:resultMessage property of a validation result provides the message for a particular violation, and it is recommended that it be based on the value of the sh:message property of the shape. At the moment, sh:message does not populate its content with variables from the report.

  • sh:severity: Specifies the severity of a shape. Each value of sh:severity in a shape is an IRI. SHACL describes three severity levels: sh:Info, sh:Warning, and sh:Violation.

  • sh:uniqueLang: Can be set to true to specify that no pair of value nodes may use the same language tag.

  • sh:minInclusive: Specifies the minimum inclusive value. The values of sh:minInclusive in a shape are literals. A shape has at most one value for sh:minInclusive.

  • sh:maxInclusive: Specifies the maximum inclusive value. The values of sh:maxInclusive in a shape are literals. A shape has at most one value for sh:maxInclusive.

  • sh:minExclusive: Specifies the minimum exclusive value. The values of sh:minExclusive in a shape are literals. A shape has at most one value for sh:minExclusive.

  • sh:maxExclusive: Specifies the maximum exclusive value. The values of sh:maxExclusive in a shape are literals. A shape has at most one value for sh:maxExclusive.

  • sh:deactivated: A shape that has the value true for the property sh:deactivated is called deactivated. The value of sh:deactivated in a shape must be either true or false.

  • sh:hasValue: Specifies the condition that at least one value node is equal to the given RDF term.

  • sh:shapesGraph: Links a data graph to the named graph that holds the shapes used to validate it.

  • dash:hasValueIn: Can be used to state that at least one value node must be a member of a provided SHACL list. This constraint component only makes sense for property shapes. It takes a list argument similar to sh:in but is “open” like sh:hasValue since it allows values outside of the list.

  • sh:target: For use with DASH targets and SPARQL-based targets.

  • rsx:targetShape: Part of RDF4J’s SHACL extensions (RSX). Allows a shape to be the target for your constraints. For more information about the RSX features, see the RSX section.

Implicit sh:targetClass is supported for nodes that are rdfs:Class and either of sh:PropertyShape or sh:NodeShape. Validation for all nodes that are equivalent to owl:Thing in an environment with a reasoner can be enabled by setting setUndefinedTargetValidatesAllSubjects(true).
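An implicit class target can be sketched as follows: the same node is declared both as a class and as a node shape, so no sh:targetClass statement is needed. The ex: names are hypothetical.

```turtle
@prefix ex:   <http://example.com/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# ex:Person is both a class and a node shape, so its instances are
# targeted implicitly, without an explicit sh:targetClass statement.
ex:Person
    a rdfs:Class, sh:NodeShape ;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:integer ;
    ] .
```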

sh:or is limited to statement based restrictions such as sh:datatype, or aggregate based restrictions such as sh:minCount, but not both at the same time.
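For instance, a shape that stays within this limitation might combine only statement-based restrictions inside sh:or. The following is a sketch with hypothetical ex: names, accepting either integer or decimal ages:

```turtle
@prefix ex:  <http://example.com/ns#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:PersonAgeShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:age ;
        # Only statement-based restrictions (sh:datatype) are used inside sh:or.
        sh:or (
            [ sh:datatype xsd:integer ]
            [ sh:datatype xsd:decimal ]
        ) ;
    ] .
```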

Warning

The above description of sh:path assumes that all path types are supported, which will be implemented in a later version.

Currently, sh:path is limited to single predicate paths, single inverse paths, sequence paths, and alternative paths. Support for sequence paths and alternative paths was implemented in RDF4J 4.3. The remaining paths (zero-or-one, zero-or-more, and one-or-more) are not yet supported.
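The path types listed as supported can be written as follows. This is a sketch using standard SHACL path syntax. The ex: names are hypothetical, and whether sequence and alternative paths are available depends on the RDF4J version in use, as noted above.

```turtle
@prefix ex:  <http://example.com/ns#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:PathExamplesShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    # Single predicate path.
    sh:property [ sh:path ex:age ; sh:datatype xsd:integer ] ;
    # Single inverse path: subjects that point to the person via ex:employs.
    sh:property [ sh:path [ sh:inversePath ex:employs ] ; sh:class ex:Organization ] ;
    # Sequence path: ex:address followed by ex:city.
    sh:property [ sh:path ( ex:address ex:city ) ; sh:minCount 1 ] ;
    # Alternative path: either ex:name or ex:label.
    sh:property [ sh:path [ sh:alternativePath ( ex:name ex:label ) ] ; sh:minCount 1 ] .
```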