RDF Rank

What is RDF Rank

RDF Rank is an algorithm that identifies the more important or more popular entities in the repository by examining their interconnectedness. The popularity of entities can then be used to order query results, much like Internet search engines such as Google order search results using PageRank.

The RDF Rank component computes a numerical weighting for all nodes in the entire RDF graph stored in the repository, including URIs, blank nodes, literals, and RDF-star (formerly RDF*) embedded triples. The weights are floating-point numbers between 0 and 1 that can be interpreted as a measure of a node’s relevance or popularity.


Because the values range from 0 to 1, the weights can be used to sort a result set. Lexicographic order works even if the rank literals are interpreted as plain strings.
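The claim that string order matches numeric order can be checked with a short Python sketch. It assumes the rank literals are fixed-point decimals such as "0.43" or "1.00"; the sample values are made up for illustration.

```python
# Made-up rank literals, treated as plain strings.
ranks = ["0.07", "1.00", "0.5", "0.43", "0.12", "0.099"]

# Sort once as plain strings and once as numbers.
as_strings = sorted(ranks, reverse=True)
as_numbers = sorted(ranks, key=float, reverse=True)

print(as_strings)
# Every literal starts with "0." or "1", so comparing character by
# character follows the decimal digits and both orderings agree.
print(as_strings == as_numbers)
```

The agreement relies on that fixed-point format: a literal written in scientific notation, such as "1.0E-4", would sort above "0.5" as a string even though it is numerically smaller.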

The following example SPARQL query uses RDF Rank to sort results by popularity:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
PREFIX opencyc-en: <http://sw.opencyc.org/2008/06/10/concept/en/>
SELECT * WHERE {
  ?Person a opencyc-en:Entertainer .
  ?Person rank:hasRDFRank ?rank .
}
ORDER BY DESC(?rank) LIMIT 100

As the example shows, RDF Rank weights are exposed through a special system predicate. GraphDB handles triple patterns with the predicate http://www.ontotext.com/owlim/RDFRank#hasRDFRank specially: the object of the pattern is bound to a literal containing the RDF Rank of the subject.

rank:hasRDFRank returns the rank with a precision of 0.01. You can also retrieve the rank with a precision of 0.001, 0.0001, or 0.00001 using rank:hasRDFRank3, rank:hasRDFRank4, and rank:hasRDFRank5, respectively.
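To see why the higher-precision predicates can matter, the following sketch formats a few made-up raw scores at each precision and counts how many distinct values remain. It assumes the literal is the raw score rounded half-up to the given number of decimals; whether GraphDB rounds or truncates is not stated on this page.

```python
from decimal import Decimal, ROUND_HALF_UP

# Made-up raw scores for three entities; not real GraphDB output.
raw_scores = {"ex:a": "0.41237", "ex:b": "0.41482", "ex:c": "0.41205"}

# Decimal places per predicate, following the precisions listed above.
PRECISION = {"hasRDFRank": 2, "hasRDFRank3": 3, "hasRDFRank4": 4, "hasRDFRank5": 5}


def format_rank(raw: str, places: int) -> str:
    """Round a raw score half-up to `places` decimals (assumed rounding mode)."""
    quantum = Decimal(1).scaleb(-places)  # e.g. 1E-2 for places=2
    return str(Decimal(raw).quantize(quantum, rounding=ROUND_HALF_UP))


for predicate, places in PRECISION.items():
    values = {entity: format_rank(raw, places) for entity, raw in raw_scores.items()}
    print(predicate, values, "distinct:", len(set(values.values())))
```

With these sample scores all three entities tie at 0.01 precision and only become fully distinguishable from four decimals onward, which is when ordering by the more precise predicates pays off.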

To use this mechanism, the RDF ranks for the whole repository must be computed in advance. This is done by committing a series of SPARQL updates that use a special vocabulary to parameterize the weighting algorithm, followed by an update that triggers the computation itself.
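A minimal Python sketch of sending that sequence of updates over HTTP, reusing the parameter and compute updates shown on this page. It assumes the RDF4J-style REST endpoint `POST /repositories/{id}/statements` accepting a body with `Content-Type: application/sparql-update`; the server URL and the repository name `myrepo` are hypothetical placeholders. The requests are only built here, not sent.

```python
import urllib.request

# Hypothetical server address and repository name; adjust for your setup.
GRAPHDB_URL = "http://localhost:7200"
REPOSITORY = "myrepo"

RANK_PREFIX = "PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>\n"


def build_update_request(update: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL update request, assuming the
    RDF4J-style /repositories/{id}/statements endpoint."""
    url = f"{GRAPHDB_URL}/repositories/{REPOSITORY}/statements"
    return urllib.request.Request(
        url,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


# Parameter updates first, then the update that triggers the computation.
updates = [
    RANK_PREFIX + 'INSERT DATA { rank:maxIterations rank:setParam "16" . }',
    RANK_PREFIX + 'INSERT DATA { rank:epsilon rank:setParam "0.05" . }',
    RANK_PREFIX + "INSERT DATA { _:b1 rank:compute _:b2 . }",
]
requests_to_send = [build_update_request(u) for u in updates]

# Against a running server, each request could be sent in order:
# for req in requests_to_send:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status)
```

Sending the updates one by one, in order, keeps the parameter changes committed before the compute update runs.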

Parameters

RDF Rank can be fully controlled from Setup ‣ RDF Rank in the Workbench. Its parameters can also be set with SPARQL updates, as in the examples below.

Parameter: Maximum iterations
Predicate: http://www.ontotext.com/owlim/RDFRank#maxIterations
Description: Sets the maximum number of iterations of the algorithm over all entities in the repository.
Default: 20
Example:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { rank:maxIterations rank:setParam "16" . }

Parameter: Epsilon
Predicate: http://www.ontotext.com/owlim/RDFRank#epsilon
Description: Terminates the weighting algorithm early when the total change of all RDF Rank scores has fallen below this value.
Default: 0.01
Example:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { rank:epsilon rank:setParam "0.05" . }
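This page does not describe GraphDB's exact weighting algorithm, so the following is only the textbook PageRank power iteration, used to illustrate how the two parameters interact: the loop stops after `max_iterations` passes, or earlier as soon as the total change of all scores falls below `epsilon`. The damping factor of 0.85 and the final division by the maximum score (to keep values between 0 and 1) are assumptions of this sketch, not documented GraphDB behavior.

```python
def pagerank(edges, max_iterations=20, epsilon=0.01, damping=0.85):
    """Textbook PageRank by power iteration (illustration only).

    edges: (source, target) pairs, e.g. subject -> object links.
    Returns (scores divided by the maximum score, iterations run).
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}, 0
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    iterations = 0
    for iterations in range(1, max_iterations + 1):  # cap, like rank:maxIterations
        # Teleport share plus the mass of dangling nodes, spread evenly.
        dangling = sum(scores[v] for v in nodes if not out_links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its score along its outgoing links.
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new[dst] += damping * scores[src] / len(targets)
        total_change = sum(abs(new[v] - scores[v]) for v in nodes)
        scores = new
        if total_change < epsilon:  # early stop, like rank:epsilon
            break

    top = max(scores.values())
    return {v: s / top for v, s in scores.items()}, iterations


# Made-up graph: three entities link to ex:bob, which has no outgoing links.
edges = [("ex:alice", "ex:bob"), ("ex:carol", "ex:bob"), ("ex:dave", "ex:bob")]
ranks, used = pagerank(edges, max_iterations=20, epsilon=0.01)
print(used, ranks)
```

On this small graph the epsilon check ends the loop well before the 20-iteration cap; lowering `max_iterations` to 3 stops it at the cap instead, with less settled scores.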

Full computation

To trigger the computation of the RDF Rank values for all resources, use the following update:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { _:b1 rank:compute _:b2 . }

You can also compute the RDF Rank values in the background. This operation is asynchronous, which means that the plugin manager is not blocked while it runs and you can work with other plugins while the RDF Rank is being computed.

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { _:b1 rank:computeAsync _:b2 . }
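Because the asynchronous update returns before the ranks are ready, a client may want to wait for the computation to finish before querying the rank predicates. A minimal polling sketch, assuming a caller-supplied `get_status()` function (hypothetical) that runs the rank:status query described in the status section of this page and returns the status name as a string. It also assumes the status switches to COMPUTING as soon as the update has been committed.

```python
import time
from typing import Callable


def wait_for_ranks(get_status: Callable[[], str],
                   poll_seconds: float = 2.0,
                   timeout_seconds: float = 600.0) -> str:
    """Poll until the RDF Rank status is no longer COMPUTING.

    get_status is a hypothetical callable that runs the rank:status query.
    Returns the final status (e.g. "COMPUTED" or "ERROR") and raises
    TimeoutError if the computation is still running at the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status != "COMPUTING":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("RDF Rank computation still running")
        time.sleep(poll_seconds)


# Usage with a fake status source standing in for the real query.
fake_statuses = iter(["COMPUTING", "COMPUTING", "COMPUTED"])
final = wait_for_ranks(lambda: next(fake_statuses), poll_seconds=0.01)
print(final)
```

Returning the final status rather than assuming success lets the caller distinguish COMPUTED from ERROR or CANCELED.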

Warning

Using a SPARQL update to perform an asynchronous computation in a cluster will put the cluster out of sync. RDF Rank computations in a cluster should be performed synchronously.

Alternatively, in the Workbench, go to Setup ‣ RDF Rank and click the Compute Full button.


Note

When the Workbench button is used on a standalone repository (not in a cluster), the RDF Rank is computed asynchronously. When it is used on a master repository (in a cluster), the rank is computed synchronously.

Incremental updates

Computing RDF Rank values for all resources can be relatively expensive. When new resources have been added to the repository since the last full computation, you can either recompute the values for all resources (see above) or compute values only for the new resources (an incremental update).

The following control update:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA {_:b1 rank:computeIncremental "true"}

computes RDF Rank values for the resources that do not have an associated value, i.e., the ones that have been added to the repository since the last full RDF Rank computation.
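Conceptually, an incremental update targets the set difference between the resources in the repository and the resources that already have a rank. The sketch below shows only that selection step with made-up resource names; the lightweight algorithm used to rank the selected resources is not described on this page, so it is not modeled.

```python
# Made-up ranks left over from the last full computation.
existing_ranks = {"ex:alice": "0.54", "ex:bob": "1.00"}

# Resources currently in the repository, two of them added afterwards.
all_resources = {"ex:alice", "ex:bob", "ex:carol", "ex:dave"}

# Only resources without an associated rank are selected.
to_rank = sorted(all_resources - set(existing_ranks))
print(to_rank)
```

Resources that already have a rank keep their old value, which is one reason a periodic full recomputation can still be worthwhile.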

Just like full computations, incremental updates can also be performed asynchronously:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA {_:b1 rank:computeIncrementalAsync "true"}

Warning

Using a SPARQL update to perform an asynchronous computation in a cluster will put the cluster out of sync. RDF Rank computations in a cluster should be performed synchronously.

Note

The incremental computation uses a different, lightweight algorithm that is faster but less accurate than the full ranking algorithm. As a result, ranks assigned by the full and the lightweight algorithms will differ slightly.

Exporting RDF Rank values

The computed weights can be exported to an external file using an update of this form:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { _:b1 rank:export "/home/user1/rdf_ranks.txt" . }

If the export fails, the update throws an exception and an error message is recorded in the log file.

Checking the RDF Rank status

The RDF Rank plugin can be in one of the following statuses:

CANCELED: The rank computation has been canceled.
COMPUTED: The ranks are computed and up to date.
COMPUTING: A computation task is currently in progress.
EMPTY: There are no computed ranks.
ERROR: An exception was thrown during the computation.
OUTDATED: The ranks are outdated and need to be recomputed.
CONFIG_CHANGED: Filtering is enabled and its configuration has changed since the last full computation.

You can get the current status of the plugin by running the following query:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT ?o WHERE { ?s rank:status ?o }
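A client usually receives the status query result in the standard SPARQL 1.1 Query Results JSON format. The sketch below parses such a response into the statuses listed above. It assumes the status comes back as a literal whose value is the status name; the `RankStatus` class, the sample response, and the recompute policy are hypothetical.

```python
import json
from enum import Enum


class RankStatus(Enum):
    """Statuses listed above; this Python class itself is hypothetical."""
    CANCELED = "CANCELED"
    COMPUTED = "COMPUTED"
    COMPUTING = "COMPUTING"
    EMPTY = "EMPTY"
    ERROR = "ERROR"
    OUTDATED = "OUTDATED"
    CONFIG_CHANGED = "CONFIG_CHANGED"


def parse_status(results_json: str) -> RankStatus:
    """Extract ?o from a SPARQL JSON result of the rank:status query."""
    bindings = json.loads(results_json)["results"]["bindings"]
    if not bindings:
        raise ValueError("status query returned no rows")
    return RankStatus(bindings[0]["o"]["value"])


# Made-up response body (Accept: application/sparql-results+json).
sample = ('{"head": {"vars": ["o"]}, "results": {"bindings": '
          '[{"o": {"type": "literal", "value": "OUTDATED"}}]}}')
status = parse_status(sample)

# Example policy (a choice, not a rule): recompute when ranks are missing or stale.
needs_full_compute = status in {RankStatus.EMPTY, RankStatus.OUTDATED,
                                RankStatus.CONFIG_CHANGED}
print(status.name, needs_full_compute)
```

Mapping the string to an enum makes an unexpected status value fail loudly with a `ValueError` instead of being silently ignored.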

Rank filtering

By default, the RDF Rank is calculated over the whole repository. This is useful when you want to find the most interconnected and important entities in general.

However, sometimes you are interested only in entities in certain graphs or entities related to a particular predicate. For such cases, RDF Rank has a filtered mode, which restricts the statements in the repository that are taken into account when calculating the rank.

You can enable the filtered mode with the following update:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { rank:filtering rank:setParam true }

Statements can be filtered by predicate, by graph, or by type (explicit or implicit, i.e., inferred). You can define both inclusion and exclusion rules.

To include only statements with a particular predicate or in a particular named graph, add the predicate or graph IRI to the includedPredicates or includedGraphs list, respectively. Empty lists are treated as wildcards. The lists are controlled with SPARQL as follows:

Get the content of a list:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT ?s WHERE { ?s rank:includedPredicates ?o }

Add an IRI to a list:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { <http:predicate> rank:includedPredicates "add" }

Remove an IRI from a list:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { <http:predicate> rank:includedPredicates "remove" }

Statements can also be filtered out rather than included. For this, there are two additional lists: excludedPredicates and excludedGraphs. These lists take precedence over their inclusion counterparts: for instance, if the same predicate appears in both the inclusion and the exclusion lists, it is treated as excluded. The exclusion lists are controlled in exactly the same way as the inclusion ones.

Two parameters, includeExplicit and includeImplicit, provide a convenient way to include or exclude all explicit or all implicit statements. Both are set to true by default. When set to true, they are disregarded, i.e., they do not take part in the filtering. When set to false, they act as exclusion rules, which means they take precedence over the inclusion lists.

You can check the value of these parameters with:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
ASK { _:b1 rank:includeExplicit _:b2 . }

You can set the value of the parameters with:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { rank:includeExplicit rank:setParam true }
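The inclusion and exclusion rules described in this section can be summarized as a single check per statement. The following Python sketch encodes only those documented rules and is not GraphDB's implementation; the `Statement` and `RankFilter` types are hypothetical. It also assumes that when both inclusion lists are non-empty, a statement must match both, which this page does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    graph: str
    explicit: bool  # False for implicit (inferred) statements


@dataclass
class RankFilter:
    included_predicates: set = field(default_factory=set)
    included_graphs: set = field(default_factory=set)
    excluded_predicates: set = field(default_factory=set)
    excluded_graphs: set = field(default_factory=set)
    include_explicit: bool = True
    include_implicit: bool = True

    def accepts(self, st: Statement) -> bool:
        # Exclusion lists take precedence over the inclusion lists.
        if st.predicate in self.excluded_predicates or st.graph in self.excluded_graphs:
            return False
        # includeExplicit/includeImplicit only act as exclusions when false.
        if st.explicit and not self.include_explicit:
            return False
        if not st.explicit and not self.include_implicit:
            return False
        # Empty inclusion lists are wildcards (assumed: non-empty lists are ANDed).
        if self.included_predicates and st.predicate not in self.included_predicates:
            return False
        if self.included_graphs and st.graph not in self.included_graphs:
            return False
        return True


# Example: rank only graph ex:g1, but ignore ex:knows links (made-up IRIs).
rank_filter = RankFilter(included_graphs={"ex:g1"}, excluded_predicates={"ex:knows"})
samples = [
    Statement("ex:a", "ex:likes", "ex:b", "ex:g1", explicit=True),  # kept
    Statement("ex:a", "ex:knows", "ex:b", "ex:g1", explicit=True),  # excluded predicate
    Statement("ex:a", "ex:likes", "ex:b", "ex:g2", explicit=True),  # graph not included
]
print([rank_filter.accepts(s) for s in samples])
```

Checking the exclusion rules first mirrors the precedence described above: a statement matching any exclusion rule is dropped, whatever the inclusion lists say.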