Integrating GraphDB with MongoDB
Introduction
The MongoDB connector is a GraphDB plugin that allows users to query MongoDB databases using SPARQL and to execute heterogeneous joins.
This section describes how to configure GraphDB and MongoDB to work together.
Why do you need MongoDB Integration?
MongoDB is a document-based database with one of the largest developer and user communities. It is part of the MEAN technology stack and offers scalability and performance well beyond the throughput supported by GraphDB. We often see use cases with extreme scalability requirements and a simple data model (i.e., a tree representation of a document and its metadata).
What does MongoDB Integration do?
MongoDB is a NoSQL JSON document store and does not natively support joins, SPARQL or RDF-enabled linked data.
The integration between GraphDB and MongoDB is done by a plugin that sends a request to MongoDB and then transforms the result (which is expected to be a valid JSON-LD document) into an RDF model.
Interfacing with MongoDB
The steps for using MongoDB with GraphDB are:
- Installing MongoDB;
- Preparing and loading JSON-LD documents in MongoDB;
- Configuring GraphDB with MongoDB connection settings by creating an index.
Installing MongoDB
Setting up and maintaining a MongoDB database is a separate task and must be accomplished outside of GraphDB. See the MongoDB website for details.
Note
In the rest of this document, we assume you have the MongoDB server installed and running on a computer you can access.
Note
The GraphDB integration plugin uses MongoDB Java driver version 3.8. More information about compatibility between the MongoDB Java driver and MongoDB server versions is available on the MongoDB website.
MongoDB documents
In order to be converted to RDF models, the documents in MongoDB must be valid JSON-LD.
JSON-LD documents are hierarchical, which allows more complex queries over embedded (nested) documents.
Each document can be in a separate context (named graph). That way, the relation between statements in GraphDB and documents in MongoDB is preserved when parts of the documents are extracted and imported into GraphDB in order to make inferred statements. Importing parts of documents is an option for future development.
Below is a sample MongoDB document from the LDBC SPB benchmark:
{
"_id": { "$oid": "5c0fb7f329298f15dc37bb81"},
"@graph":
[{
"@id": "http://www.bbc.co.uk/things/1#id",
"@type": "cwork:NewsItem",
"bbc:primaryContentOf":
[{
"@id": "bbcd:3#id",
"bbc:webDocumentType": {
"@id": "bbc:HighWeb"
}
},
{
"@id": "bbcd:4#id",
"bbc:webDocumentType": {
"@id": "bbc:Mobile"
}
}],
"cwork:about":
[{
"@id": "dbpedia:AccessAir"
},
{
"@id": "dbpedia:Battle_of_Bristoe_Station"
},
{
"@id": "dbpedia:Nicolas_Bricaire_de_la_Dixmerie"
},
{
"@id": "dbpedia:Bernard_Roberts"
},
{
"@id": "dbpedia:Bartolomé_de_Medina"
},
{
"@id": "dbpedia:Don_Bonker"
},
{
"@id": "dbpedia:Cornel_Nistorescu"
},
{
"@id": "dbpedia:Clete_Roberts"
},
{
"@id": "dbpedia:Mark_Palansky"
},
{
"@id": "dbpedia:Paul_Green_(taekwondo)"
},
{
"@id": "dbpedia:Mostafa_Abdel_Satar"
},
{
"@id": "dbpedia:Tommy_O'Connell_(hurler)"
},
{
"@id": "dbpedia:Ahmed_Ali_Salaad"
}],
"cwork:altText": "thumbnail atlText for CW http://www.bbc.co.uk/context/1#id",
"cwork:audience": {
"@id": "cwork:NationalAudience"
},
"cwork:category": {
"@id": "http://www.bbc.co.uk/category/Company"
},
"cwork:dateCreated": {
"@type": "xsd:dateTime",
"@value": "2011-02-15T07:13:29.495+02:00"
},
"cwork:dateModified": {
"@type": "xsd:dateTime",
"@value": "2012-02-14T12:43:13.165+02:00"
},
"cwork:description": " constipate meant breaking felt glitzier democrat's huskily breeding solicit gargling.",
"cwork:liveCoverage": {
"@type": "xsd:boolean",
"@value": "false"
},
"cwork:mentions": {
"@id": "geonames:2862704/"
},
"cwork:primaryFormat":
[{
"@id": "cwork:TextualFormat"
},
{
"@id": "cwork:InteractiveFormat"
}],
"cwork:shortTitle": " closest subsystem merit rebuking disengagement cerebrums caravans conduction disbelieved might.",
"cwork:thumbnail": {
"@id": "bbct:1361611547"
},
"cwork:title": "Beckhoff greatly agitators constructed racquets industry restrain spews pitifully undertone stultification."
}],
"@id": "bbcc:1#id",
"@context": {
"bbcevent": "http://www.bbc.co.uk/ontologies/event/",
"geo-pos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
"bbc": "http://www.bbc.co.uk/ontologies/bbc/",
"time": "http://www.w3.org/2006/time#",
"event": "http://purl.org/NET/c4dm/event.owl#",
"music-ont": "http://purl.org/ontology/mo/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"foaf": "http://xmlns.com/foaf/0.1/",
"provenance": "http://www.bbc.co.uk/ontologies/provenance/",
"owl": "http://www.w3.org/2002/07/owl#",
"cms": "http://www.bbc.co.uk/ontologies/cms/",
"news": "http://www.bbc.co.uk/ontologies/news/",
"cnews": "http://www.bbc.co.uk/ontologies/news/cnews/",
"cconcepts": "http://www.bbc.co.uk/ontologies/coreconcepts/",
"dbp-prop": "http://dbpedia.org/property/",
"geonames": "http://sws.geonames.org/",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"domain": "http://www.bbc.co.uk/ontologies/domain/",
"dbpedia": "http://dbpedia.org/resource/",
"geo-ont": "http://www.geonames.org/ontology#",
"bbc-pont": "http://purl.org/ontology/po/",
"tagging": "http://www.bbc.co.uk/ontologies/tagging/",
"sport": "http://www.bbc.co.uk/ontologies/sport/",
"skosCore": "http://www.w3.org/2004/02/skos/core#",
"dbp-ont": "http://dbpedia.org/ontology/",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"core": "http://www.bbc.co.uk/ontologies/coreconcepts/",
"curric": "http://www.bbc.co.uk/ontologies/curriculum/",
"skos": "http://www.w3.org/2004/02/skos/core#",
"cwork": "http://www.bbc.co.uk/ontologies/creativework/",
"fb": "http://rdf.freebase.com/ns/",
"ot": "http://www.ontotext.com/",
"ldbcspb": "http://www.ldbcouncil.org/spb#",
"bbcd": "http://www.bbc.co.uk/document/",
"bbcc": "http://www.bbc.co.uk/context/",
"bbct": "http://www.bbc.co.uk/thumbnail/"
}
}
- _id is a MongoDB internal key.
- The @graph node represents the RDF context (named graph) in the JSON-LD document.
- A date of @type xsd:dateTime can have a @date key with an ISODate(...) value. This key is not part of the JSON-LD standard and is ignored when the document is parsed into an RDF model. The dates are extended this way for faster searching and sorting: ISODate is MongoDB's internal way of storing dates and is optimized for searching. This step makes querying and sorting by the date field easier, but it is optional.
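The optional @date field can be added while the documents are prepared for import. The sketch below is a hypothetical Python helper (add_date_fields is not part of GraphDB or MongoDB). It assumes the documents are loaded with mongoimport, which reads MongoDB Extended JSON, so a {"$date": ...} value is stored as a native MongoDB date (the ISODate(...) value mentioned above). It only recognizes the compact xsd:dateTime type used in the sample and normalizes the timestamp to UTC.

```python
from datetime import datetime, timezone

# Assumption: dates are typed with the compact "xsd:dateTime" form, as in the sample.
XSD_DATETIME = "xsd:dateTime"


def to_extended_json_date(value: str) -> dict:
    """Convert an ISO 8601 timestamp into an Extended JSON date in UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    utc = parsed.astimezone(timezone.utc)
    return {"$date": utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")}


def add_date_fields(node):
    """Return a copy of node in which every xsd:dateTime value also has a @date key."""
    if isinstance(node, list):
        return [add_date_fields(item) for item in node]
    if not isinstance(node, dict):
        return node
    result = {key: add_date_fields(value) for key, value in node.items()}
    if result.get("@type") == XSD_DATETIME and "@value" in result:
        result["@date"] = to_extended_json_date(result["@value"])
    return result
```

The @value is kept unchanged, so the RDF produced from the document is the same with or without the extra key.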
Note
The keys in MongoDB cannot contain “.” or start with “$”. Although the JSON-LD standard allows such keys, MongoDB does not. Therefore, either use namespaces (see the sample above) or encode the “.” and “$” characters. Only the JSON keys are subject to decoding.
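Keys containing full IRIs (for example http://schema.org/name) are the usual case that needs encoding. The following Python sketch encodes only the keys of a JSON-LD payload and leaves the values untouched. The percent-encodings %2E and %24 are an assumption made for illustration; check which encoding the plugin actually decodes before relying on it. Apply it before adding MongoDB-specific keys such as {"$oid": ...} or {"$date": ...}, which must keep their "$".

```python
# Assumption: "." and "$" are percent-encoded as %2E and %24.
# Verify the encoding the plugin decodes before using this in practice.
KEY_ENCODINGS = {".": "%2E", "$": "%24"}


def encode_key(key: str) -> str:
    """Encode the characters that MongoDB rejects in field names."""
    for char, code in KEY_ENCODINGS.items():
        key = key.replace(char, code)
    return key


def encode_keys(node):
    """Recursively encode JSON keys; values such as IRIs and literals stay as-is."""
    if isinstance(node, dict):
        return {encode_key(key): encode_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [encode_keys(item) for item in node]
    return node
```

For example, encode_key("http://schema.org/name") returns "http://schema%2Eorg/name".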
Loading sample data
Import the provided cwork1000.json file, which contains 1000 creativeWork documents, into the “creativeWorks” collection of the “ldbc” MongoDB database:
mongoimport --db ldbc --collection creativeWorks --file cwork1000.json
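By default, mongoimport reads one JSON document per line (a single JSON array requires the --jsonArray option). If you prepare your own documents instead of using the provided file, a minimal Python sketch such as the hypothetical write_documents helper below writes them in that line-delimited form; the output file name is up to you.

```python
import json
from pathlib import Path


def write_documents(documents, path) -> int:
    """Write each document as one JSON line, as mongoimport expects by default."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as out:
        for document in documents:
            # json.dumps escapes newlines inside strings, so each document stays
            # on one line; ensure_ascii=False keeps names such as "Bartolomé" readable.
            out.write(json.dumps(document, ensure_ascii=False, separators=(",", ":")))
            out.write("\n")
            count += 1
    return count
```

The resulting file is then passed to mongoimport with --file in the same way as cwork1000.json.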
Creating an index
To configure GraphDB with MongoDB connection settings, you need to set:
- The server where MongoDB is running;
- The port on which MongoDB is listening;
- The name of the database you are using;
- The name of the MongoDB collection you are using;
- The credentials (optional unless you are using authentication): the username and password that let you connect to the database.
Below is a sample query that creates a MongoDB index:
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
INSERT DATA {
inst:spb1000 :service "mongodb://localhost:27017" ;
:database "ldbc" ;
:collection "creativeWorks" .
}
Supported predicates:
- :service - MongoDB connection string;
- :database - MongoDB database;
- :collection - MongoDB collection;
- :user - (optional) MongoDB user for the connection;
- :password - (optional) the user’s password;
- :authDb - (optional) the database where the user is authenticated.
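When indexes are created from application code, it is safer to generate the update than to concatenate strings by hand. The Python sketch below is hypothetical (build_index_update is not a GraphDB API). It emits the same INSERT DATA pattern as the sample query, escapes the string literals, and enforces the rule that :user, :password and :authDb are used together. It assumes the index name is a simple alphanumeric local name.

```python
PREFIXES = (
    "PREFIX : <http://www.ontotext.com/connectors/mongodb#>\n"
    "PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>\n"
)


def sparql_string(value: str) -> str:
    """Return value as a double-quoted SPARQL literal with special characters escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def build_index_update(name, service, database, collection,
                       user=None, password=None, auth_db=None) -> str:
    """Build the INSERT DATA update that creates a MongoDB index."""
    credentials = (user, password, auth_db)
    if any(credentials) and not all(credentials):
        # :user, :password and :authDb must be set together.
        raise ValueError("user, password and auth_db must all be set")
    pairs = [(":service", service), (":database", database), (":collection", collection)]
    if all(credentials):
        pairs += [(":user", user), (":password", password), (":authDb", auth_db)]
    body = " ;\n    ".join(f"{pred} {sparql_string(val)}" for pred, val in pairs)
    return f"{PREFIXES}INSERT DATA {{\n  inst:{name} {body} .\n}}\n"
```

For example, build_index_update("spb1000", "mongodb://localhost:27017", "ldbc", "creativeWorks") reproduces the sample query above.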
Deleting an index
An index is deleted with the following query:
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
INSERT DATA {
inst:spb1000 :drop _:b .
}
Authentication
All types of authentication can be achieved by setting the credentials in the connection string. However, as it is not good practice to store passwords in plain text, the :user, :password and :authDb predicates are introduced. If one of these predicates is used, the other two must be set as well.
These predicates set credentials for SCRAM and LDAP authentication, and the password is stored on disk encrypted with a symmetric algorithm.
For x.509 and Kerberos authentication, use the connection string, since no passwords are stored.
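For x.509 and Kerberos, the mechanism is selected with the standard authMechanism option of the MongoDB connection string (MONGODB-X509 or GSSAPI). The Python sketch below, with the hypothetical helper build_connection_string, builds such a password-less string. Special characters in a principal, such as the "@" in a Kerberos principal, must be percent-encoded; extra URI options (for example ssl=true, which x.509 needs) are passed through as keyword arguments. Host names and principals here are placeholders.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_string(host, port, mechanism, principal=None, **options):
    """Build a password-less MongoDB connection string for x.509 or Kerberos.

    mechanism is an authMechanism value such as "MONGODB-X509" or "GSSAPI";
    principal is an optional user name (no password is ever included).
    """
    # Percent-encode the principal so characters such as "@" do not break the URI.
    userinfo = f"{quote_plus(principal)}@" if principal else ""
    query = urlencode({"authMechanism": mechanism, **options})
    return f"mongodb://{userinfo}{host}:{port}/?{query}"
```

The resulting string is used as the value of :service, and :user, :password and :authDb are left out.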
Querying MongoDB
Below is a sample query that returns the dateModified value for documents with a specific audience:
PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
SELECT ?creativeWork ?modified WHERE {
?search a inst:spb1000 ;
:find '{"@graph.cwork:audience.@id" : "cwork:NationalAudience"}' ;
:entity ?entity .
GRAPH inst:spb1000 {
?creativeWork cwork:dateModified ?modified .
}
}

In a query, use the exact values as they appear in the documents. For example, if full URIs are used instead of “cwork:NationalAudience” or “@graph.cwork:audience.@id”, there will be no matching results.
The :find
argument is a valid BSON
document.
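Because the :find value is embedded in a single-quoted SPARQL literal, characters such as the apostrophe in dbpedia:Tommy_O'Connell_(hurler) from the sample document must be escaped. The hypothetical Python helpers below serialize the filter with json.dumps and escape it for SPARQL; they produce only the search triple patterns, and the caller still adds the prefixes and the GRAPH block.

```python
import json


def sparql_single_quoted(text: str) -> str:
    """Quote text as a single-quoted SPARQL literal."""
    escaped = (text.replace("\\", "\\\\")
                   .replace("'", "\\'")
                   .replace("\n", "\\n"))
    return f"'{escaped}'"


def find_pattern(index: str, query: dict, entity_var: str = "?entity") -> str:
    """Build the search triple patterns for a :find call on the given index."""
    # Use the values exactly as stored (compact IRIs such as cwork:...).
    bson = json.dumps(query, ensure_ascii=False)
    return (f"?search a {index} ;\n"
            f"    :find {sparql_single_quoted(bson)} ;\n"
            f"    :entity {entity_var} .")
```

For example, find_pattern("inst:spb1000", {"@graph.cwork:about.@id": "dbpedia:Tommy_O'Connell_(hurler)"}) yields a :find literal in which the apostrophe is written as \'.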
Note
The results are returned in a named graph to indicate which variables the plugin should bind. This is a limitation of the plugin API: placing the variables to be bound by the plugin in a named graph allows GraphDB to determine whether a given variable should be bound using MongoDB or not.
Supported predicates:
- :find - accepts a single BSON document and sets the query string. The value is used to call db.collection.find();
- :project - accepts a single BSON document. The value is used to select the projection for the results returned by :find. For more information, see “Project Fields to Return from Query” in the MongoDB documentation;
- :aggregate - accepts an array of BSON documents and calls db.collection.aggregate(). This is the most flexible way to make a MongoDB query, as the find() method is just a single stage of the aggregation pipeline. The :aggregate predicate takes precedence over :find and :project. This means that if both :aggregate and :find are used, :find is ignored;
- :graph - accepts an IRI. Specifies the IRI of the named graph in which the bound variables should be. Its default value is the name of the index itself;
- :entity - (REQUIRED) returns the IRI of the MongoDB document. If the JSON-LD has a context, the value of @graph.@id is used. In case of multiple values, the first one is chosen and a warning is logged. If the JSON-LD has no context, the value of the @id node is used. Even if the value of this predicate is not used, it must be present in the query to inform the plugin that the graph part of the current iteration is complete;
- :hint - specifies the index to be used when executing the query (calls cursor.hint()).
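The rule that :entity follows can be illustrated in a few lines of Python. This is a sketch of the documented behaviour, not the plugin's code: entity_iri is hypothetical, it does not expand compact IRIs, and returning None when no @id is found is an assumption.

```python
import logging


def entity_iri(document: dict):
    """Pick the IRI that :entity would return, following the rules described above."""
    if "@context" not in document:
        # No context: the top-level @id node is used.
        return document.get("@id")
    graph = document.get("@graph", [])
    if isinstance(graph, dict):
        graph = [graph]
    ids = [node["@id"] for node in graph if isinstance(node, dict) and "@id" in node]
    if len(ids) > 1:
        # Multiple @graph.@id values: the first one wins and a warning is logged.
        logging.warning("Multiple @graph.@id values, using the first: %s", ids[0])
    return ids[0] if ids else None
```

For the sample document, this returns http://www.bbc.co.uk/things/1#id rather than the top-level bbcc:1#id, which identifies the context.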
Multiple index calls in the same query
Multiple MongoDB calls are supported in the same query. There are two approaches:
- Place each index call in a separate subselect (Example 1);
- Give each index call a different named graph. If you query different indexes, this comes out of the box. If not, use the :graph predicate (Example 2).
Example 1:
PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
SELECT ?creativeWork ?modified WHERE {
{
SELECT ?creativeWork ?modified {
?search a inst:spb1000 ;
:find '{"@graph.@id" : "http://www.bbc.co.uk/things/1#id"}' ;
:entity ?creativeWork .
GRAPH inst:spb1000 {
?creativeWork cwork:dateModified ?modified .
}
}
}
UNION
{
SELECT ?creativeWork ?modified WHERE {
?search a inst:spb1000 ;
:find '{"@graph.@id" : "http://www.bbc.co.uk/things/2#id"}' ;
:entity ?entity .
GRAPH inst:spb1000 {
?creativeWork cwork:dateModified ?modified .
}
}
}
}
Example 2:
PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
SELECT ?creativeWork ?modified WHERE {
{
?search a inst:spb1000 ;
:graph :search1 ;
:find '{"@graph.@id" : "http://www.bbc.co.uk/things/1#id"}' ;
:entity ?creativeWork .
GRAPH :search1 {
?creativeWork cwork:dateModified ?modified .
}
}
UNION
{
?search a inst:spb1000 ;
:graph :search2 ;
:find '{"@graph.@id" : "http://www.bbc.co.uk/things/2#id"}' ;
:entity ?entity .
GRAPH :search2 {
?creativeWork cwork:dateModified ?modified .
}
}
}
Both examples return the same result.
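When the number of calls is not fixed, the Example 2 pattern can be generated. The Python sketch below is hypothetical: union_query reuses the prefixes, the inst:spb1000 index and the cwork:dateModified pattern from Example 2, and gives every branch its own :graph (:search1, :search2, ...) so the calls do not interfere. It assumes the document IDs contain no quotes or backslashes, so it does no escaping.

```python
PREFIXES = """PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
"""

# One UNION branch per index call; doubled braces are literal braces in str.format.
BRANCH = """  {{
    ?search a inst:spb1000 ;
      :graph :search{n} ;
      :find '{{"@graph.@id" : "{doc_id}"}}' ;
      :entity ?entity .
    GRAPH :search{n} {{
      ?creativeWork cwork:dateModified ?modified .
    }}
  }}"""


def union_query(doc_ids) -> str:
    """Combine one index call per document ID, each in its own named graph."""
    # Assumption: IDs contain no quotes or backslashes, so no escaping is done.
    branches = [BRANCH.format(n=n, doc_id=doc_id)
                for n, doc_id in enumerate(doc_ids, start=1)]
    body = "\n  UNION\n".join(branches)
    return PREFIXES + "SELECT ?creativeWork ?modified WHERE {\n" + body + "\n}\n"
```

Called with the two IDs from the examples, it produces a query equivalent to Example 2.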

Using aggregation functions
MongoDB has a number of aggregation functions, such as min, max, size, etc. These functions are called using the :aggregate predicate. The retrieved results have to be converted into an RDF model.
The example below shows how to retrieve the RDF context of a MongoDB document.
PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?s ?o {
?search a inst:spb1000 ;
:aggregate '''[{"$match": {"@graph.@id": "http://www.bbc.co.uk/things/1#id"}},
{"$addFields": {"@graph.cwork:graph.@id": "$@id"}}]''' ;
:entity ?entity .
GRAPH inst:spb1000 {
?s cwork:graph ?o .
}
}
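The $addFields stage copies the top-level @id of the document (referenced as $@id) into a new nested cwork:graph node inside @graph. It changes only the JSON-LD returned by the aggregation; the document stored in MongoDB is not modified.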
The newly added node is then parsed into the following RDF statement:
<http://www.bbc.co.uk/things/1#id> cwork:graph <http://www.bbc.co.uk/context/1#id>
We then retrieve the context of the document using the cwork:graph predicate.
This approach is very flexible but error-prone.
Let’s examine the following query:
PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?g1 ?g2 {
?search a inst:spb1000 ;
:aggregate '''[{"$match": {"@graph.@id": "http://www.bbc.co.uk/things/1#id"}},
{"$addFields": {"@graph.inst:graph.@id": "$@id"}}]''' ;
:entity ?entity .
GRAPH inst:spb1000 {
OPTIONAL {
?s inst:graph ?g1 .
}
?s <inst:graph> ?g2 .
}
}
It looks very similar to the first query, except that instead of @graph.cwork:graph.@id, we write the value to @graph.inst:graph.@id, and as a result ?g1 does not get bound.
This happens because the JSON-LD stored in MongoDB defines the cwork prefix in its context but not the inst prefix. The key inst:graph is therefore not expanded and is read as the absolute IRI <inst:graph>, which is why ?g2 gets bound instead.
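The difference between the two queries comes down to how JSON-LD expands a key with a colon: if the prefix is defined in @context, it is replaced by its namespace; otherwise, the key is kept as an absolute IRI. The Python sketch below simplifies that rule (it ignores blank nodes, @vocab and other JSON-LD features) and mimics the $addFields stage on an in-memory document. Both expand and add_graph_field are hypothetical helpers for illustration only.

```python
def expand(term: str, context: dict) -> str:
    """Simplified JSON-LD expansion of a compact IRI such as cwork:graph."""
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    # Unknown prefix (or already absolute): keep the term as-is, e.g. <inst:graph>.
    return term


def add_graph_field(document: dict, key: str) -> dict:
    """Mimic {"$addFields": {"@graph.<key>.@id": "$@id"}} on one document."""
    result = dict(document)
    # Each element of @graph gets a new node that points to the top-level @id.
    result["@graph"] = [{**node, key: {"@id": document["@id"]}}
                        for node in document["@graph"]]
    return result
```

With the sample context, expand("cwork:graph", context) gives http://www.bbc.co.uk/ontologies/creativework/graph and the added node's bbcc:1#id expands to http://www.bbc.co.uk/context/1#id, while expand("inst:graph", context) stays inst:graph.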
Custom fields
Example:
PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
SELECT ?size ?halfSize {
?search a inst:spb1000 ;
:aggregate '''[{"$match": {"@graph.@type": "cwork:NewsItem"}},
{"$count": "size"},
{"$project": {"custom.size": "$size", "custom.halfSize": {"$divide": ["$size", 2]}}}]''' ;
:entity ?entity .
GRAPH inst:spb1000 {
?s inst:size ?size ;
inst:halfSize ?halfSize .
}
}

The values are projected as child elements of a custom node. After the JSON-LD is retrieved from MongoDB, a pre-processing step takes all child elements of custom and creates statements with predicates in the <http://www.ontotext.com/connectors/mongodb/instance#> namespace.
Note
The returned values are always string literals.
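The pre-processing of the custom node can be pictured as the small Python sketch below. It is an illustration of the behaviour described above, not the plugin's implementation: custom_statements is hypothetical, and the exact text produced for numbers (here Python's str(), so a $divide result such as 5.0 becomes "5.0") is an assumption.

```python
INST = "http://www.ontotext.com/connectors/mongodb/instance#"


def custom_statements(result: dict):
    """Turn the children of a 'custom' node into (predicate IRI, literal) pairs."""
    custom = result.get("custom", {})
    # Every value becomes a string literal, as stated in the note above.
    return [(INST + name, str(value)) for name, value in custom.items()]
```

Since the values arrive as strings, numeric comparisons in SPARQL need a cast, for example xsd:integer(?size).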