Vol-3194/paper10


Wolfgang Fahl

Paper
description  
id  Vol-3194/paper10
wikidataid  Q117344920
title  Conceptually-grounded Mapping Patterns for Virtual Knowledge Graphs
pdfUrl  https://ceur-ws.org/Vol-3194/paper10.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/CalvaneseGLM0S22
volume  Vol-3194
session

Conceptually-grounded Mapping Patterns
for Virtual Knowledge Graphs
Diego Calvanese (1,2), Avigdor Gal (3), Davide Lanti (1), Marco Montali (1), Alessandro Mosca (1),
and Roee Shraga (4)
(1) Free University of Bozen-Bolzano, Bolzano, Italy
(2) Umeå University, Umeå, Sweden
(3) Technion – Israel Institute of Technology, Haifa, Israel
(4) Khoury College of Computer Science, Northeastern University, Boston, Massachusetts


Abstract
Knowledge Graphs (KGs) have been gaining momentum recently in both academia and industry, due to
the flexibility of their data model, allowing one to access and integrate collections of data of different
forms. Virtual Knowledge Graphs (VKGs), a variant of KGs originating from the field of Ontology-based
Data Access (OBDA), are a promising paradigm for integrating and accessing legacy data sources. The
main idea of VKGs is that the KG remains virtual: the end-user interacts with a KG, but queries are
reformulated on-the-fly as queries over the data source(s). To enable the paradigm, one needs to define
declarative mappings specifying the link between the data sources and the elements in the VKG. In this
work, we try to investigate common patterns that arise when specifying such mappings, building on
well-established methodologies from the area of conceptual modeling and database design.

Keywords
Virtual Knowledge Graphs, Ontology-based Data Access, Mapping patterns, Data Integration




1. Introduction
Data integration and access to legacy data sources are key challenges for contemporary organi-
zations. In the whole spectrum of data integration and access solutions, the approach based on
Virtual Knowledge Graphs (VKGs) is gaining momentum, especially when the underlying data
sources to be integrated come in the form of relational databases (DBs) [1]. VKGs replace the
rigid structure of tables with the flexibility of a graph that incorporates domain knowledge and
is kept virtual, eliminating redundancies. A VKG specification consists of three main compo-
nents: (i) data sources (in the context of this paper, constituted by relational DBs), where the
actual data are stored; (ii) a domain ontology, capturing the relevant concepts, relations, and
constraints of the domain of interest; and (iii) a set of mappings, linking the data sources to the
ontology. A critical bottleneck in this setting lies in the definition and management of mappings.
In this work, we focus on this issue by proposing a comprehensive catalog of mapping
patterns that emerge when linking data to ontologies. Our catalog is based on the (arguably
reasonable) assumption that both the ontology and the DB schema are derived from a conceptual
analysis of the domain of interest. The resulting knowledge may stay implicit, or may lead to an
explicit representation in the form of a structural conceptual model, which can be represented
using well-established notations such as UML, ORM, or E-R. On the one hand, this conceptual
model provides the basis for creating a corresponding domain ontology through a series of
semantics-preserving transformation steps. On the other hand, it can trigger the design process
that finally leads to the deployment of an actual DB. The whole view is depicted in Figure 1.

Figure 1: The database and the ontology both stem from common domain knowledge.
[Diagram labels: Domain Knowledge; Conceptual Model (E-R); DB Design; DB Schema; OWL Encoding;
DB Ontology (OWL 2 QL); VKG Mappings; Alignment Mappings; Target Ontology (OWL 2 QL)]

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
Email: calvanese@inf.unibz.it (D. Calvanese); avigal@technion.ac.il (A. Gal); lanti@unibz.it (D. Lanti);
montali@unibz.it (M. Montali); mosca@unibz.it (A. Mosca); r.shraga@northeastern.edu (R. Shraga)
ORCID: 0000-0001-5174-9693 (D. Calvanese); 0000-0002-7028-661X (A. Gal); 0000-0003-1097-2965 (D. Lanti);
0000-0002-8021-3430 (M. Montali); 0000-0003-2323-3344 (A. Mosca); 0000-0001-8803-8481 (R. Shraga)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
    Our catalog is built on well-established methodologies and patterns studied in data manage-
ment (e.g., W3C direct mappings (W3C-DM)1 and extensions), data analysis (e.g., algorithms
for discovering dependencies), and conceptual modeling (e.g., relational mapping techniques).
    The idea of mapping patterns is not new. For instance, the work in [2] is closely related to ours,
as it also introduces a catalog of mapping patterns. However, there are some key differences
with our approach. One difference is that we consider KGs (with ontologies), whereas that work
focuses on property graphs without an ontology. More importantly, in [2] and in the related
literature, patterns are not formalized or grounded in a specific conceptual representation, but
are rather informally specified and discussed in a “by-example” fashion. In contrast, each
of our patterns explicitly and unambiguously specifies the link between the conceptualization
and the DB instance, which is the one arising from applying well-known semantics-preserving
transformations studied in the area of DB design.
    We argue that this foundational grounding paves the way for a variety of VKG design scenarios,
depending on which information artifacts are available, and which ones must be produced. For
example, our patterns could be used to validate existing mappings, or to automatically generate
(i.e., bootstrap) ontology and mappings when only the DB is available. In fact, specific patterns
have been proposed also in relation to ontology and mapping bootstrapping, for which a variety of
tools and approaches have been developed in the last two decades [3, 4, 5, 6, 7]. The approaches in
the literature differ in terms of the overall purposes of bootstrapping (e.g., OBDA, data integration,
ontology learning, checking of DB schema constraints using ontology reasoning), the adopted on-
tology and mapping languages (e.g., OWL 2 profiles or RDFS as ontology languages, and R2RML
or custom languages for the specification of mappings), the different focus on direct and/or
complex mappings, and the assumed level of automation. The majority of the most recent approaches
closely follow W3C-DM, deriving ontologies that mirror the structure of the input DB.

    1 http://www.w3.org/TR/rdb-direct-mapping/

Table 1
Semantics of the DL-Liteℛ constructs that involve datatypes.

  Construct                 Syntax Element   Example       Semantics
  Top domain                ⊤_V                            Δ_V^ℐ
  Literal                   ℓ ∈ NL           “george”      ℓ^ℐ ∈ Δ_V^ℐ
  Datatype                  T_i              xsd:int       T_i^ℐ ⊆ Δ_V^ℐ
  Data property name        d ∈ ND           hasName       d^ℐ ⊆ Δ_O^ℐ × Δ_V^ℐ
  Data property domain      δ(d)             δ(hasName)    {x ∈ Δ_O^ℐ | ∃v ∈ Δ_V^ℐ : (x, v) ∈ d^ℐ}
  Data property range       ρ(d)             ρ(hasName)    {v ∈ Δ_V^ℐ | ∃o ∈ Δ_O^ℐ : (o, v) ∈ d^ℐ}
  Data property negation    ¬d               ¬hasName      (Δ_O^ℐ × Δ_V^ℐ) ∖ d^ℐ

   The remainder of the paper is structured as follows: Section 2 introduces the notation and
basic notions on VKGs, Section 3 presents (an extract of) our catalog of mapping patterns, and
Section 4 concludes the paper.


2. Preliminaries
We use bold font to denote tuples, e.g., x and y are tuples. When convenient and unambiguous,
we treat tuples as sets and use set operators on them. We assume familiarity with
standard notions and languages from DBs [8], such as SQL or E-R diagrams.
   A VKG specification is a triple ⟨𝒯 , ℳ, 𝒮⟩ where 𝒯 is an ontology (or TBox), ℳ a set of
mappings, and 𝒮 the schema of a DB (with constraints, e.g., primary and foreign keys). In VKGs,
the ontology is formulated in OWL 2 QL 2 , but for conciseness we use its Description Logic (DL)
counterpart, DL-Liteℛ [9], here slightly enriched to handle datatypes.
   We fix the following enumerable, pairwise-disjoint sets: NI of individuals, NL of literal values,
NC of class names, NP of object property names, and ND of data property names.
   An OWL 2 QL TBox 𝒯 is a finite set of inclusion axioms of the form 𝐵 ⊑ 𝐶, 𝑞 ⊑ 𝑟, 𝜌(𝑑) ⊑ 𝑓 ,
or 𝑑 ⊑ 𝑣, where 𝐵, 𝐶 are classes, 𝑞, 𝑟 are object properties, 𝑑 is a data property, 𝑓 is a datatype
expression, 𝜌(𝑑) is a data property range expression, and 𝑣 is a data property expression. These
are defined according to the following grammar, where 𝐴 ∈ NC, 𝑑 ∈ ND, 𝑝 ∈ NP, 𝛿(𝑑) is a
data property domain expression, and 𝑇1 , . . . , 𝑇𝑛 are the RDF datatypes:

   𝐵 → 𝐴 | ∃𝑟 | 𝛿(𝑑)                          𝑞 → 𝑝 | 𝑝−             𝑓 → ⊤𝐷 | 𝑇1 | · · · | 𝑇𝑛
   𝐶 → ⊤𝐶 | 𝐵 | ¬𝐵                            𝑟 → 𝑞 | ¬𝑞             𝑣 → 𝑑 | ¬𝑑

In the rules above, ⊤𝐶 denotes the “top” element for concepts and ⊤𝐷 the one for data values
(called literals in the RDF terminology). An OWL 2 QL ABox 𝒜 is a finite set of assertions of the
form 𝐴(𝑎), 𝑝(𝑎, 𝑏), or 𝑑(𝑎, ℓ), where 𝐴 ∈ NC, 𝑝 ∈ NP, 𝑑 ∈ ND, 𝑎 and 𝑏 are individuals in NI,
and ℓ ∈ NL. We call the pair 𝒦 = ⟨𝒯 , 𝒜⟩ an OWL 2 QL Knowledge Graph (KG).

    2 http://www.w3.org/TR/owl2-overview/

   Similarly to first-order logic, the semantics of DL-Liteℛ KGs is given through Tarski-style
interpretations ℐ = ⟨Δ_O^ℐ, Δ_V^ℐ, ·^ℐ⟩, where Δ_O^ℐ is a non-empty domain of objects, Δ_V^ℐ is a
non-empty domain of values, and ·^ℐ is an interpretation function. Table 1 reports the semantics for
the constructs involving datatypes. The other constructs are defined as in standard DL-Liteℛ [9].
As usual [10], we say that an interpretation ℐ satisfies a KG 𝒦, denoted by ℐ ⊨ 𝒦, if ℐ satisfies
the ABox assertions and the inclusion axioms in 𝒦.
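To make the set-based semantics concrete, the following minimal sketch checks a few inclusion axioms over a small finite interpretation. The extensions `Person`, `hasName`, and `strings` are hypothetical data invented for illustration; only the reading of δ(d) and ρ(d) as projections of the data property extension follows the semantics described above.

```python
# A finite interpretation given as plain Python sets (illustrative data only).
Person = {"pers/1234", "pers/5678"}        # extension of class Person
hasName = {("pers/1234", "george")}        # extension of data property hasName
strings = {"george", "anna"}               # extension of a datatype

def domain_of(d_ext):
    """δ(d): the objects that have at least one d-value."""
    return {o for (o, _v) in d_ext}

def range_of(d_ext):
    """ρ(d): the values that are the d-value of at least one object."""
    return {v for (_o, v) in d_ext}

# An inclusion axiom X ⊑ Y is satisfied when the extension of X is a subset of Y's.
domain_ok = domain_of(hasName) <= Person      # δ(hasName) ⊑ Person
range_ok = range_of(hasName) <= strings       # ρ(hasName) ⊑ datatype
mandatory_ok = Person <= domain_of(hasName)   # Person ⊑ δ(hasName)
```

In this interpretation the domain and range axioms hold, while the mandatory participation axiom fails because pers/5678 has no name.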

Mappings. Mappings specify how to populate classes and properties of the ontology with
individuals and values constructed from the data in the underlying DB. In other words, mappings
provide the ABox that, together with a given TBox, realizes a KG. In VKGs, the adopted
language for mappings in real-world systems is R2RML3 , but for conciseness we use here a more
convenient abstract notation inspired by the literature [11]: a mapping 𝑚 is a pair of the form
⟨𝑠: 𝑄(x), 𝑡: L(t(x))⟩, where 𝑄(x) is a SQL query with answer variables x over the DB schema 𝒮,
called source query, and L(t(x)) is a list of target atoms of the form 𝐶(t1 (x1 )), 𝑝(t1 (x1 ), t2 (x2 )),
or 𝑑(t1 (x1 ), t2 (x2 )), where 𝐶 ∈ NC, 𝑝 ∈ NP, 𝑑 ∈ ND, and t1 (x1 ) and t2 (x2 ) are terms that
we call templates. We express source queries in relational algebra, omitting answer variables
under the assumption that they coincide with the variables used in the target atoms.
   Intuitively, a template t(x) in the target atom of a mapping corresponds to an R2RML string
template4 , and is used to generate an IRI (hence, an object identifier) or an RDF literal, starting
from DB values retrieved by the source query in that mapping. For the examples, we use the
concrete syntax from the Ontop VKG system [6], in which the source query is expressed in SQL
and each target atom is expressed as an RDF triple pattern with templates. The answer variables
of the source query occurring in the target atoms are distinguished by enclosing them in curly
brackets { · · · }. The following is an example mapping expressed in such syntax:
source     SELECT ssn FROM person
target     ex:pers/{ssn} a ex:Person .

In the mapping above, the string ex: denotes a URI prefix, e.g., ex:Person is an abbreviation for
the URI http://www.example.com/Person. Such a mapping, when applied to a DB instance 𝒟 of
𝒮, populates the class ex:Person with IRIs constructed by replacing the answer variable ssn
occurring in the target atom with the corresponding values assigned to that variable by the
answers to the SQL source query evaluated over 𝒟. For instance, if the source query returns
two answers that assign to the answer variable ssn respectively the values 1234 and 5678, then
the mapping above produces the following RDF graph (expressed in the Turtle syntax5 ), stating
that individuals ex:pers/1234 and ex:pers/5678 are both instances of class ex:Person:
    ex:pers/1234 a ex:Person .
    ex:pers/5678 a ex:Person .
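The template instantiation just described can be sketched in a few lines. This is an illustration only, not Ontop's implementation: the helper names `instantiate` and `apply_mapping` are hypothetical, and the answers of the source query are simulated with a list of dictionaries.

```python
import re

# Matches placeholders such as {ssn} in a target template.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

def instantiate(template: str, answer: dict) -> str:
    """Replace each {variable} with the value assigned to it by one answer."""
    return PLACEHOLDER.sub(lambda m: str(answer[m.group(1)]), template)

def apply_mapping(target: str, answers: list) -> list:
    """Produce one instantiated target atom per answer of the source query."""
    return [instantiate(target, a) for a in answers]

# Simulated answers of "SELECT ssn FROM person".
answers = [{"ssn": 1234}, {"ssn": 5678}]
triples = apply_mapping("ex:pers/{ssn} a ex:Person .", answers)
```

Each answer yields one triple, so `triples` contains the two assertions listed above.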

   We denote by 𝒜ℳ(𝒟) the virtual ABox constructed through mappings ℳ from a DB 𝒟.
Given a VKG specification ⟨𝒯 , ℳ, 𝒮⟩ and a database instance 𝒟 of 𝒮, the KG 𝒦 = ⟨𝒯 , 𝒜ℳ(𝒟) ⟩
is called the Virtual Knowledge Graph of ⟨𝒯 , ℳ, 𝒮⟩ through 𝒟. The qualifier “virtual” in the
name derives from the fact that the virtual ABox 𝒜ℳ(𝒟) in a VKG setting is not materialized
and stored somewhere. Query answering in VKGs, in fact, is carried out through query rewriting
and query unfolding techniques [11, 6]: user queries, expressed in SPARQL 6 , get translated
on-the-fly into equivalent SQL queries, which then are directly evaluated against the DB.

    3 http://www.w3.org/TR/r2rml/
    4 https://www.w3.org/TR/r2rml/#dfn-string-template
    5 http://www.w3.org/TR/turtle/
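As a rough illustration of unfolding, the sketch below turns a request for all instances of a class into a SQL query over the source, so that IRIs are built by the database and no ABox is stored. It is a deliberately simplified assumption, not the rewriting and unfolding algorithm of [11, 6]: the `MAPPINGS` registry and the function `unfold_class_query` are hypothetical, and only single-class queries are handled.

```python
import sqlite3

# Hypothetical registry: class name -> (source SQL, IRI prefix, key column).
MAPPINGS = {"ex:Person": ("SELECT ssn FROM person", "ex:pers/", "ssn")}

def unfold_class_query(class_iri: str) -> str:
    """Unfold 'give me all instances of class_iri' into SQL over the source."""
    source_sql, iri_prefix, column = MAPPINGS[class_iri]
    # The IRI template is evaluated inside SQL via string concatenation.
    return f"SELECT '{iri_prefix}' || {column} AS x FROM ({source_sql}) AS src"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (ssn INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO person VALUES (?)", [(1234,), (5678,)])

sql = unfold_class_query("ex:Person")
answers = [row[0] for row in conn.execute(sql + " ORDER BY x")]
```

The answers are computed directly by the database engine, which is the sense in which the ABox stays virtual.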


3. Mapping Patterns
In its basic form, a mapping pattern is a quadruple ⟨𝒞, 𝒮, ℳ, 𝒯 ⟩, where 𝒞 is a conceptual
model, 𝒮 a database schema, ℳ a set of mappings, and 𝒯 an (OWL 2 QL) ontology. In such a
pattern, the pair ⟨𝒞, 𝒮⟩ puts into correspondence a conceptual representation with one of its
(many) admissible (i.e., formally sound [12, 13]) database schemata, like those prescribed by
well-established database modeling methodologies. The pair ⟨ℳ, 𝒯 ⟩, instead, is formed by the
DB ontology 𝒯 , which is the OWL 2 QL encoding7 of the conceptual model 𝒞, and the set ℳ of
mappings, providing the link between 𝒮 and 𝒯 . The term “DB ontology” refers to an ontology
whose concepts and properties reflect the constructs of the conceptual model, mirroring the
structure of the relational database, as displayed in Figure 1.
   Some of the more advanced patterns have a more complex structure, where pairs of conceptual
models and/or pairs of database schemata are used in place of 𝒞 and 𝒮, respectively (e.g., the
pattern “SHa” falls in this category). These patterns prescribe specific transformations to be
applied to an input conceptual (resp., DB) schema, in order to obtain an output conceptual
(resp., DB) schema. These output artifacts make explicit the presence of specific structures that
are revealed through the application of the pattern itself. These structures can in turn enable
further applications of patterns.

Presentation Conventions. We show the fragment of the conceptual model that is affected
by the pattern in E-R notation (adopting the original notation by Chen [14]). To compactly
represent sets of attributes, we use a small diamond in place of the small circle used for single
attributes in Chen’s notation. For cardinality constraints we follow the “look-here” convention,
that is, the cardinality constraint for a role is placed next to the entity participating in that role.
In the DB schema, we use 𝑇 (K, A) to denote a table with name 𝑇 , primary key consisting of
the attributes K, and additional attributes A. Given a set U of attributes in 𝑇 , we denote by
key𝑇 (U) the fact that U form a key for 𝑇 . Referential integrity constraints (like, e.g., foreign
keys) are denoted with arcs, pointing from the referencing attribute(s) to the referenced one(s).
For conciseness, we denote sets of the form {𝑜 | condition} as {𝑜}condition . In order to express
datatypes for data properties, we introduce two auxiliary functions: a function 𝜏 that, given
a DB attribute 𝐴, returns the DB datatype of 𝐴, and a function 𝜇 that associates, to each DB
datatype, a corresponding RDF datatype. For the definition of 𝜇, we re-use the Natural Mapping 8
correspondence provided by the R2RML recommendation. As a final note, following the convention
of E-R diagrams, we assume a default (1, 1) cardinality on attributes. For this reason, in
the DB schema we assume all attributes to be not nullable by default (using the SQL convention,
declared as “NOT NULL”). An optional attribute 𝐴 is instead denoted by adding opt(𝐴) to the
DB schema. Such notation extends in the natural way to a set A of attributes.

    6 http://www.w3.org/TR/sparql11-query
    7 Modulo the expressivity of the OWL 2 QL language.
    8 https://www.w3.org/TR/r2rml/#natural-mapping

Table 2
An extract of our catalog of mapping patterns.

Schema Entity (SE)
  Conceptual Model:  [E-R diagram: entity E with identifier K and attributes A]
  DB Schema:         T_E(K, A)
  Mappings:          s: T_E
                     t: C_E(t_E(K)), {d_A(t_E(K), A)}_{A ∈ K∪A}
  Ontology:          {δ(d_A) ⊑ C_E, ρ(d_A) ⊑ μ(τ(A)), C_E ⊑ δ(d_A)}_{A ∈ K∪A}
  In case of optional attributes, for each optional attribute A′ of E, add an opt(A′) constraint
  to the DB schema and drop the corresponding inclusion axiom C_E ⊑ δ(d_A′) from the ontology.

Schema Relationship (SR)
  Conceptual Model:  [E-R diagram: entity E (K_E, A_E) and entity F (K_F, A_F) linked by relationship R]
  DB Schema:         T_E(K_E, A_E), T_F(K_F, A_F), T_R(K_RE, K_RF)
  Mappings:          s: T_R
                     t: p_R(t_CE(K_RE), t_CF(K_RF))
  Ontology:          ∃p_R ⊑ C_E, ∃p_R^- ⊑ C_F
  • In case of cardinality (_, 1) on role R_E (resp., R_F), the primary key of T_R is restricted to
    the attributes K_RE (resp., K_RF). In case both roles have cardinality (_, 1), either choice for
    the primary key is made, and the remaining attributes form a non-primary key in the logical schema.
  • In case of cardinality (1, _) on role R_E (resp., R_F), the inclusion dependency K_E ⊆ K_RE
    (resp., K_F ⊆ K_RF) holds in the schema, and the first (resp., second) inclusion axiom in the
    ontology holds in both directions. Note that when the maximum cardinality on role R_E (resp.,
    R_F) is 1, the corresponding inclusion dependency is actually a foreign key.

Schema Relationship with Identifier Alignment (SRa)
  Conceptual Model:  [E-R diagram: entity E (K_E, A_E) and entity F (K_F, U_F, A_F) linked by relationship R]
  DB Schema:         T_E(K_E, A_E), T_F(K_F, U_F, A_F), T_R(K_RE, U_RF), key_TF(U_F)
  Mappings:          s: T_R ⋈_{U_RF = U_F} T_F
                     t: p_R(t_CE(K_RE), t_CF(K_F))
  Ontology:          ∃p_R ⊑ C_E, ∃p_R^- ⊑ C_F

Schema Hierarchy with Identifier Alignment (SHa)
  Conceptual Model:  [E-R diagrams, input and output: parent entity E (K_E, A_E) with child entity F (K_F, A_F)]
  DB Schema (input):   T_E(K_E, A_E), T_F(K_F, K_FE, A_F), key_TF(K_FE)
  DB Schema (output):  T_E(K_E, A_E), V_F(K_F, K_FE, A_F) = T_F, key_VF(K_F)
  Mappings:          s: V_F
                     t: C_F(t_CE(K_FE)), {d_A(t_CE(K_FE), A)}_{A ∈ K_F∪A_F}
  Ontology:          C_F ⊑ C_E,
                     {δ(d_A) ⊑ C_F, ρ(d_A) ⊑ μ(τ(A)), C_F ⊑ δ(d_A)}_{A ∈ K_F∪A_F}
  In this pattern, the “alignment” is meant to align the primary identifier used in the child entity
  to the primary identifier used in the parent entity. The other two possibilities for applying the
  pattern are:
  • the foreign key in the child entity is the primary key of that entity, and references a
    non-primary key of the parent entity;
  • the foreign key in the child entity is a non-primary key of that entity, and references a
    non-primary key of the parent entity.
  We depict here the most common scenario, where the foreign key points to the primary key of the
  parent entity. Observe that this pattern requires a change in the conceptual model (essentially
  keeping track of the attributes used for identifying the objects of the subclass).
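The auxiliary functions 𝜏 and 𝜇 from the presentation conventions can be pictured as two lookups. The sketch below is a partial, hand-written rendering of the R2RML natural mapping for a few common SQL types; the dictionary `schema`, the table and column names, and the fallback to xsd:string for character types are assumptions made for this illustration.

```python
# Partial rendering of the R2RML natural mapping (SQL type -> XSD datatype).
NATURAL_MAPPING = {
    "SMALLINT": "xsd:integer", "INTEGER": "xsd:integer", "BIGINT": "xsd:integer",
    "NUMERIC": "xsd:decimal", "DECIMAL": "xsd:decimal",
    "FLOAT": "xsd:double", "REAL": "xsd:double", "DOUBLE PRECISION": "xsd:double",
    "BOOLEAN": "xsd:boolean", "DATE": "xsd:date", "TIME": "xsd:time",
    "TIMESTAMP": "xsd:dateTime",
}

def tau(schema: dict, table: str, attribute: str) -> str:
    """τ: return the declared DB datatype of an attribute."""
    return schema[table][attribute]

def mu(sql_type: str) -> str:
    """μ: map a DB datatype to an RDF datatype, ignoring length/precision."""
    base = sql_type.upper().split("(")[0].strip()
    # Assumption of this sketch: anything unlisted is treated as a string.
    return NATURAL_MAPPING.get(base, "xsd:string")

# Hypothetical schema used only for this example.
schema = {"client": {"ssn": "INTEGER", "name": "VARCHAR(100)"}}
ranges = {a: mu(tau(schema, "client", a)) for a in schema["client"]}
```

The resulting `ranges` dictionary gives, for each attribute 𝐴, the datatype 𝜇(𝜏(𝐴)) used on the right-hand side of the range axioms in Table 2.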

Pattern Catalog. Table 2 shows an excerpt of our patterns, which we discuss in detail here.
Schema Entity (SE). This fundamental pattern describes the correspondence between an entity
with a primary identifier and attributes in the DB schema, and a class and data properties in the
ontology. The entity is expressed in the DB schema through a single table 𝑇𝐸 with primary key
K and other attributes A, as it is the norm in sound DB design practices. The mappings column
explains how 𝑇𝐸 is mapped into a corresponding class 𝐶𝐸 . The primary key of 𝑇𝐸 is employed
to construct the IRIs of the objects that are instances of 𝐶𝐸 , using a template t𝐸 specific for
that entity. Each relevant attribute of 𝑇𝐸 is mapped to a data property of 𝐶𝐸 , with suitable
domain and range axioms. A mandatory participation constraint is added to each data property
corresponding to a mandatory attribute.
   Example: A client registry table containing SSNs of clients, together with their name as
an additional attribute, is mapped to a Client class using the SSN to construct its objects. In
addition, the SSN and name are mapped to two corresponding data properties.
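A minimal sketch of how the SE pattern could be instantiated for the client registry example follows. The function `schema_entity`, the `ex:` naming scheme, and the simplified Ontop-like target strings are assumptions for illustration; the axioms it emits are the three kinds listed for SE in Table 2, with the mandatory participation axiom skipped for optional attributes.

```python
def schema_entity(table, key, columns, optional=()):
    """Instantiate the SE pattern for one table.

    columns maps each column name to its RDF datatype, i.e. the value of μ(τ(A)).
    """
    cls = f"ex:{table.capitalize()}"
    # IRI template built from the primary key, e.g. ex:client/{ssn}.
    subject = f"ex:{table}/" + "/".join(f"{{{k}}}" for k in key)
    source = f"SELECT {', '.join(columns)} FROM {table}"
    target = [f"{subject} a {cls} ."]
    axioms = []
    for col, datatype in columns.items():
        prop = f"ex:{col}"
        target.append(f"{subject} {prop} {{{col}}} .")
        axioms.append(f"δ({prop}) ⊑ {cls}")        # domain axiom
        axioms.append(f"ρ({prop}) ⊑ {datatype}")   # range axiom
        if col not in optional:
            axioms.append(f"{cls} ⊑ δ({prop})")    # mandatory participation
    return source, target, axioms

source, target, axioms = schema_entity(
    "client", ["ssn"], {"ssn": "xsd:integer", "name": "xsd:string"}
)
```

Passing `optional=("name",)` would drop the axiom `ex:Client ⊑ δ(ex:name)`, mirroring the note on optional attributes in Table 2.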
Schema Relationship (SR). This pattern describes the correspondence between a binary relation-
ship without attributes and an OWL 2 QL object property, for the case where such relationship
is represented in the DB as a separate (usually, “many-to-many”) table. This pattern considers
three tables 𝑇𝑅 , 𝑇𝐸 , and 𝑇𝐹 , for which the set of columns in 𝑇𝑅 is partitioned into two parts
KRE and KRF that are foreign keys to 𝑇𝐸 and 𝑇𝐹 , respectively. The identifier of 𝑇𝑅 depends
on the role cardinalities in the E-R model. The pattern captures how 𝑇𝑅 is mapped to an object
property 𝑝𝑅 , using the two parts KRE and KRF of the partition to construct respectively the
subject and the object of the triples in 𝑝𝑅 . The templates t𝐶𝐸 and t𝐶𝐹 must be those respectively
used for building instances of classes 𝐶𝐸 corresponding to 𝑇𝐸 and 𝐶𝐹 corresponding to 𝑇𝐹 .
   Example: An additional table in the client registry stores the addresses of each client, and
has a foreign key to a table with locations. The former table is mapped to an address object
property, for which the ontology asserts that the domain is the class Person and the range an
additional class Location, which corresponds to the latter table.
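The SR pattern can be tried out on a toy database. In this sketch the table and column names (`client`, `location`, `address`, `loc_id`) and the `ex:` templates are hypothetical; the point is that the subject and object IRIs of the object property reuse the templates of the two entity tables, and are built from the two foreign keys of the relationship table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client   (ssn INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE location (loc_id INTEGER PRIMARY KEY, city TEXT NOT NULL);
-- Relationship table: its columns are partitioned into two foreign keys.
CREATE TABLE address (
    ssn    INTEGER NOT NULL REFERENCES client(ssn),
    loc_id INTEGER NOT NULL REFERENCES location(loc_id),
    PRIMARY KEY (ssn, loc_id)
);
INSERT INTO client   VALUES (1234, 'george');
INSERT INTO location VALUES (1, 'Bolzano'), (2, 'Haifa');
INSERT INTO address  VALUES (1234, 1), (1234, 2);
""")

# Source query of the SR mapping: the relationship table itself.
rows = conn.execute("SELECT ssn, loc_id FROM address ORDER BY ssn, loc_id")
# Target atom: p_R(t_CE(K_RE), t_CF(K_RF)), with the entity templates reused.
triples = [f"ex:client/{ssn} ex:address ex:location/{loc} ." for ssn, loc in rows]
```

Because the templates match those used for the entity tables, every subject and object of `ex:address` is also generated as an instance of the corresponding class.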
Schema Relationship with Identifier Alignment (SRa). This pattern is similar to pattern SR, but it
comes with a modifier a, indicating that the pattern can be applied after the identifiers involved
in the relationship have been aligned. The alignment is necessary because the foreign key in 𝑇𝑅
does not refer to the primary key K𝐹 of 𝑇𝐹 , but to an alternative key U𝐹 . Since the instances of
the class 𝐶𝐹 corresponding to 𝑇𝐹 are constructed using the primary key K𝐹 of 𝑇𝐹 (cf. pattern
SE), also the pairs that populate 𝑝𝑅 should refer in their object position to that primary key,
which can only be retrieved via a join between 𝑇𝑅 and 𝑇𝐹 on the key U𝐹 .
   Example: The primary key of the table with locations is not given by the city and street,
which are used in the table that relates clients to their addresses, but is given by the latitude
and longitude of locations.
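The join required by SRa can be illustrated on the address example. The schema below is a hypothetical rendering of it: `location` has primary key (lat, lon) and alternative key (city, street), while `address` references locations through the alternative key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    lat REAL NOT NULL, lon REAL NOT NULL,
    city TEXT NOT NULL, street TEXT NOT NULL,
    PRIMARY KEY (lat, lon),
    UNIQUE (city, street)              -- alternative key U_F
);
CREATE TABLE address (
    ssn INTEGER NOT NULL, city TEXT NOT NULL, street TEXT NOT NULL,
    PRIMARY KEY (ssn, city, street),
    FOREIGN KEY (city, street) REFERENCES location(city, street)
);
INSERT INTO location VALUES (46.5, 11.25, 'Bolzano', 'Via Roma');
INSERT INTO address  VALUES (1234, 'Bolzano', 'Via Roma');
""")

# Source query: join on the alternative key to retrieve the primary key.
rows = conn.execute("""
    SELECT a.ssn, l.lat, l.lon
    FROM address AS a
    JOIN location AS l ON a.city = l.city AND a.street = l.street
""")
# The object IRI uses the primary key of location, as in pattern SE.
triples = [f"ex:client/{ssn} ex:address ex:location/{lat}/{lon} ."
           for ssn, lat, lon in rows]
```

Without the join, the object IRIs would be built from city and street and would not coincide with the IRIs that pattern SE generates for locations.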
Schema Hierarchy with Identifier Alignment (SHa). This pattern handles the case where a
hierarchy is specified and the child entity uses a primary identifier different from the one in the
parent entity. In this situation, the foreign-key constraint can come in three different variants.
In the depicted one, the foreign key in 𝑇𝐹 is over a non-primary key KFE . The objects for 𝐶𝐹
have to be built out of KFE , rather than out of the primary key of 𝑇𝐹 . For this purpose, the
pattern creates a view 𝑉𝐹 identical to 𝑇𝐹 , except that KFE is the primary key. Also the foreign
key relations are preserved. Such view might enable further applications of patterns.
   Example: An ISA relation between entities Student and Person. Students are identified by
their matriculation number, whereas persons are identified by their SSN.
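The Student/Person example can be sketched as follows. Table, view, and column names are hypothetical, and SQLite views carry no key declarations, so the re-keying of the view is only implicit here: the mapping over the view simply uses the SSN, rather than the matriculation number, to build student IRIs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (ssn INTEGER PRIMARY KEY);
CREATE TABLE student (
    matric TEXT PRIMARY KEY,                               -- K_F
    ssn    INTEGER NOT NULL UNIQUE REFERENCES person(ssn)  -- K_FE
);
-- View V_F with the same content as T_F; its objects are identified via ssn.
CREATE VIEW v_student AS SELECT matric, ssn FROM student;
INSERT INTO person  VALUES (1234), (5678);
INSERT INTO student VALUES ('M-001', 1234);
""")

# Parent class: IRIs built from the primary key of person.
persons = {f"ex:person/{ssn}" for (ssn,) in conn.execute("SELECT ssn FROM person")}
# Child class: the mapping over the view reuses the parent template on K_FE.
students = {f"ex:person/{ssn}" for (ssn,) in conn.execute("SELECT ssn FROM v_student")}
# The matriculation number survives as a data property of the student.
matric = {f"ex:person/{ssn}": m for m, ssn in conn.execute("SELECT matric, ssn FROM v_student")}
```

Since both classes share one IRI template, every student IRI is also a person IRI, which is consistent with the axiom 𝐶𝐹 ⊑ 𝐶𝐸 of the pattern.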


4. Conclusions and Future Work
In this work, we have identified and formally specified a number of mapping patterns emerging
when linking DBs to ontologies in a typical VKG setting. Our patterns are grounded in well-
established practices of DB design, and render explicit the connection between the conceptual
model, the DB schema, and the ontology. We envision that the organization in patterns can
enable a number of relevant tasks, notably mapping bootstrapping for incomplete VKGs.
   This work is only a first step, with respect to both categorization of patterns, and their actual
use. Regarding the former, we are currently extending this initial catalog with more advanced
“data-driven” patterns, which are patterns where the data component needs to be taken into
account. Regarding the latter, we are investigating solutions to specific problems that need to
be addressed when setting-up a VKG scenario, like the problem of mapping bootstrapping.


Acknowledgments
This research has been partially supported by the Wallenberg AI, Autonomous Systems and
Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, by the Italian
Basic Research (PRIN) project HOPE, by the EU H2020 project INODE (grant agreement 863410),
and by the project MENS, funded through the 4th Call for Research of the Autonomous Province
of Bolzano (IN2219).


References
 [1] G. Xiao, L. Ding, B. Cogrel, D. Calvanese, Virtual Knowledge Graphs: An overview of
     systems and use cases, Data Intelligence 1 (2019) 201–223.
 [2] J. Sequeda, O. Lassila, Designing and Building Enterprise Knowledge Graphs, Morgan &
     Claypool Publishers, 2021.
 [3] L. F. de Medeiros, F. Priyatna, Ó. Corcho, MIRROR: Automatic R2RML mapping generation
     from relational databases, in: Proc. ICWE, volume 9114 of LNCS, Springer, 2015, pp. 326–
     343.
 [4] E. Jiménez-Ruiz, E. Kharlamov, D. Zheleznyakov, I. Horrocks, C. Pinkel, M. G. Skjæveland,
     E. Thorstensen, J. Mora, BootOX: Practical mapping of RDBs to OWL 2, in: Proc. ISWC,
     volume 9367 of LNCS, Springer, 2015, pp. 113–132.
 [5] C. Pinkel, C. Binnig, E. Kharlamov, P. Haase, IncMap: Pay as you go matching of relational
     schemata to OWL ontologies., in: Proc. 8th Int. Workshop on Ontology Matching (OM),
     volume 1111 of CEUR, CEUR-WS.org, 2013, pp. 37–48.
 [6] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-
     Muro, G. Xiao, Ontop: Answering SPARQL queries over relational databases, Semantic
     Web J. 8 (2017) 471–487.
 [7] J. F. Sequeda, D. P. Miranker, Ultrawrap Mapper: A semi-automatic relational database to
     RDF (RDB2RDF) mapping tool, in: Proc. ISWC Posters & Demonstrations Track, volume
     1486 of CEUR, CEUR-WS.org, 2015.
 [8] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison Wesley, 1995.
 [9] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and
     efficient query answering in description logics: The DL-Lite family, JAR 39 (2007) 385–429.
[10] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The De-
     scription Logic Handbook: Theory, Implementation and Applications, 2nd ed., Cambridge
     University Press, 2007.
[11] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, R. Rosati, Linking data to
     ontologies, J. on Data Semantics 10 (2008) 133–173.
[12] R. Hull, Relative information capacity of simple relational database schemas, SIAM J. on
     Computing 15 (1986) 856–886.
[13] R. J. Miller, Y. E. Ioannidis, R. Ramakrishnan, Schema equivalence in heterogeneous
     systems: Bridging theory and practice, Information Systems 19 (1994) 3–31.
[14] P. P. Chen, The Entity-Relationship model: Toward a unified view of data, ACM TODS 1
     (1976) 9–36.

Conceptually-grounded Mapping Patterns for Virtual Knowledge Graphs[edit]

load PDF

Conceptually-grounded Mapping Patterns
for Virtual Knowledge Graphs
Diego Calvanese (1,2), Avigdor Gal (3), Davide Lanti (1), Marco Montali (1), Alessandro Mosca (1)
and Roee Shraga (4)
(1) Free University of Bozen-Bolzano, Bolzano, Italy
(2) Umeå University, Umeå, Sweden
(3) Technion – Israel Institute of Technology, Haifa, Israel
(4) Khoury College of Computer Science, Northeastern University, Boston, Massachusetts


Abstract
Knowledge Graphs (KGs) have been gaining momentum recently in both academia and industry, due to
the flexibility of their data model, allowing one to access and integrate collections of data of different
forms. Virtual Knowledge Graphs (VKGs), a variant of KGs originating from the field of Ontology-based
Data Access (OBDA), are a promising paradigm for integrating and accessing legacy data sources. The
main idea of VKGs is that the KG remains virtual: the end-user interacts with a KG, but queries are
reformulated on-the-fly as queries over the data source(s). To enable the paradigm, one needs to define
declarative mappings specifying the link between the data sources and the elements in the VKG. In this
work, we investigate common patterns that arise when specifying such mappings, building on
well-established methodologies from the area of conceptual modeling and database design.

Keywords
Virtual Knowledge Graphs, Ontology-based Data Access, Mapping patterns, Data Integration




1. Introduction
Data integration and access to legacy data sources are key challenges for contemporary organi-
zations. In the whole spectrum of data integration and access solutions, the approach based on
Virtual Knowledge Graphs (VKGs) is gaining momentum, especially when the underlying data
sources to be integrated come in the form of relational databases (DBs) [1]. VKGs replace the
rigid structure of tables with the flexibility of a graph that incorporates domain knowledge and
is kept virtual, eliminating redundancies. A VKG specification consists of three main compo-
nents: (i) data sources (in the context of this paper, constituted by relational DBs), where the
actual data are stored; (ii) a domain ontology, capturing the relevant concepts, relations, and
constraints of the domain of interest; and (iii) a set of mappings, linking the data sources to the
ontology. A critical bottleneck in this setting lies in the definition and management of mappings.
In this work, we focus on this issue by proposing a comprehensive catalog of mapping
patterns that emerge when linking data to ontologies. Our catalog is based on the (arguably
reasonable) assumption that both the ontology and the DB schema are derived from a conceptual
analysis of the domain of interest. The resulting knowledge may stay implicit, or may lead to an
explicit representation in the form of a structural conceptual model, which can be represented
using well-established notations such as UML, ORM, or E-R. On the one hand, this conceptual
model provides the basis for creating a corresponding domain ontology through a series of
semantics-preserving transformation steps. On the other hand, it can trigger the design process
that finally leads to the deployment of an actual DB. The whole view is depicted in Figure 1.

[Diagram; recoverable labels: Domain Knowledge, Conceptual Model (E-R), DB Design, DB Schema,
OWL Encoding, DB Ontology (OWL 2 QL), Alignment Mappings, Target Ontology (OWL 2 QL), VKG Mappings]
Figure 1: The database and the ontology both stem from common domain knowledge.

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
Email: calvanese@inf.unibz.it (D. Calvanese); avigal@technion.ac.il (A. Gal); lanti@unibz.it (D. Lanti);
montali@unibz.it (M. Montali); mosca@unibz.it (A. Mosca); r.shraga@northeastern.edu (R. Shraga)
ORCID: 0000-0001-5174-9693 (D. Calvanese); 0000-0002-7028-661X (A. Gal); 0000-0003-1097-2965 (D. Lanti);
0000-0002-8021-3430 (M. Montali); 0000-0003-2323-3344 (A. Mosca); 0000-0001-8803-8481 (R. Shraga)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

    Our catalog is built on well-established methodologies and patterns studied in data manage-
ment (e.g., W3C direct mappings (W3C-DM)1 and extensions), data analysis (e.g., algorithms
for discovering dependencies), and conceptual modeling (e.g., relational mapping techniques).
    The idea of mapping patterns is not new. For instance, the work in [2] is closely related to ours,
as it also introduces a catalog of mapping patterns. However, there are some key differences
with our approach. One difference is that we consider KGs (with ontologies), whereas that work
focuses on property graphs without an ontology. More importantly, in [2] and in the related
literature, patterns are not formalized or grounded in a specific conceptual representation, but
are rather informally specified and discussed in a “by-example” fashion. In contrast, each
of our patterns explicitly and unambiguously specifies the link between the conceptualization
and the DB instance, which is the one arising from applying well-known semantics-preserving
transformations studied in the area of DB design.
    We argue that this foundational grounding paves the way for a variety of VKG design scenarios,
depending on which information artifacts are available, and which ones must be produced. For
example, our patterns could be used to validate existing mappings, or to automatically generate
(i.e., bootstrap) ontology and mappings when only the DB is available. In fact, specific patterns
have been proposed also in relation to ontology and mapping bootstrapping, for which a variety of
tools and approaches have been developed in the last two decades [3, 4, 5, 6, 7]. The approaches in
the literature differ in terms of the overall purposes of bootstrapping (e.g., OBDA, data integration,
ontology learning, checking of DB schema constraints using ontology reasoning), the adopted on-
tology and mapping languages (e.g., OWL 2 profiles or RDFS as ontology languages, and R2RML
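or custom languages for the specification of mappings), the different focus on direct and/or complex
mappings, and the assumed level of automation. The majority of the most recent approaches
closely follow W3C-DM, deriving ontologies that mirror the structure of the input DB.

    1 http://www.w3.org/TR/rdb-direct-mapping/

Table 1
Semantics of the DL-Liteℛ constructs that involve datatypes.

  Construct                | Syntax element      | Example                     | Semantics
  Top domain               | $\top_V$            |                             | $\Delta^{\mathcal{I}}_V$
  Literal                  | $\ell \in \mathsf{N_L}$ | “george”                | $\ell^{\mathcal{I}} \in \Delta^{\mathcal{I}}_V$
  Datatype                 | $T_i$               | xsd:int                     | $T_i^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}_V$
  Data property name       | $d \in \mathsf{N_D}$ | hasName                    | $d^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}_O \times \Delta^{\mathcal{I}}_V$
  Data property domain     | $\delta(d)$         | $\delta(\mathit{hasName})$  | $\{x \in \Delta^{\mathcal{I}}_O \mid \exists v \in \Delta^{\mathcal{I}}_V : (x, v) \in d^{\mathcal{I}}\}$
  Data property range      | $\rho(d)$           | $\rho(\mathit{hasName})$    | $\{v \in \Delta^{\mathcal{I}}_V \mid \exists o \in \Delta^{\mathcal{I}}_O : (o, v) \in d^{\mathcal{I}}\}$
  Data property negation   | $\neg d$            | $\neg\mathit{hasName}$      | $(\Delta^{\mathcal{I}}_O \times \Delta^{\mathcal{I}}_V) \setminus d^{\mathcal{I}}$
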
   The remainder of the paper is structured as follows: Section 2 introduces the notation and
basic notions on VKGs, Section 3 presents (an extract of) our catalog of mapping patterns, and
Section 4 concludes the paper.


2. Preliminaries
We use bold font to denote tuples, e.g., x and y are tuples. When convenient and unambiguous,
we treat tuples as sets and use set operators on them. We assume familiarity with
standard notions and languages from DBs [8], such as SQL or E-R diagrams.
   A VKG specification is a triple ⟨𝒯 , ℳ, 𝒮⟩ where 𝒯 is an ontology (or TBox), ℳ a set of
mappings, and 𝒮 the schema of a DB (with constraints, e.g., primary and foreign keys). In VKGs,
the ontology is formulated in OWL 2 QL 2 , but for conciseness we use its Description Logic (DL)
counterpart, DL-Liteℛ [9], here slightly enriched to handle datatypes.
   We fix the following enumerable, pairwise-disjoint sets: NI of individuals, NL of literal values,
NC of class names, NP of object property names, and ND of data property names.
   An OWL 2 QL TBox 𝒯 is a finite set of inclusion axioms of the form 𝐵 ⊑ 𝐶, 𝑞 ⊑ 𝑟, 𝜌(𝑑) ⊑ 𝑓 ,
or 𝑑 ⊑ 𝑣, where 𝐵, 𝐶 are classes, 𝑞, 𝑟 are object properties, 𝑑 is a data property, 𝑓 is a datatype
expression, 𝜌(𝑑) is a data property range expression, and 𝑣 is a data property expression. These
are defined according to the following grammar, where 𝐴 ∈ NC, 𝑑 ∈ ND, 𝑝 ∈ NP, 𝛿(𝑑) is a
data property domain expression, and 𝑇1 , . . . , 𝑇𝑛 are the RDF datatypes:

   𝐵 → 𝐴 | ∃𝑟 | 𝛿(𝑑)                          𝑞 → 𝑝 | 𝑝−             𝑓 → ⊤𝐷 | 𝑇1 | · · · | 𝑇𝑛
   𝐶 → ⊤𝐶 | 𝐵 | ¬𝐵                            𝑟 → 𝑞 | ¬𝑞             𝑣 → 𝑑 | ¬𝑑

In the rules above, ⊤𝐶 denotes the “top” element for concepts and ⊤𝐷 the one for data values
(called literals in the RDF terminology). An OWL 2 QL ABox 𝒜 is a finite set of assertions of the
form 𝐴(𝑎), 𝑝(𝑎, 𝑏), or 𝑑(𝑎, ℓ), where 𝐴 ∈ NC, 𝑝 ∈ NP, 𝑑 ∈ ND, 𝑎 and 𝑏 are individuals in NI,
and ℓ ∈ NL. We call the pair 𝒦 = ⟨𝒯 , 𝒜⟩ an OWL 2 QL Knowledge Graph (KG).

    2 http://www.w3.org/TR/owl2-overview/
   Similarly to first-order logic, the semantics of DL-Liteℛ KGs is given through Tarski-style
interpretations ℐ = ⟨Δℐ𝑂 , Δℐ𝑉 , ·ℐ ⟩, where Δℐ𝑂 is a non-empty domain of objects, Δℐ𝑉 is a non-
empty domain of values, and ·ℐ is an interpretation function. Table 1 reports the semantics for
the constructs involving datatypes. The other constructs are defined as in standard DL-Liteℛ [9].
As usual [10], we say that an interpretation ℐ satisfies a KG 𝒦, denoted by ℐ |= 𝒦, if ℐ satisfies
the ABox assertions and the inclusion axioms in 𝒦.

Mappings. Mappings specify how to populate classes and properties of the ontology with
individuals and values constructed from the data in the underlying DB. In other words, mappings
provide the ABox that, together with a given TBox, realizes a KG. In VKGs, the adopted
language for mappings in real-world systems is R2RML3 , but for conciseness we use here a more
convenient abstract notation inspired by the literature [11]: a mapping 𝑚 is a pair of the form
⟨𝑠: 𝑄(x), 𝑡: L(t(x))⟩, where 𝑄(x) is a SQL query with answer variables x over the DB schema 𝒮,
called source query, and L(t(x)) is a list of target atoms of the form 𝐶(t1 (x1 )), 𝑝(t1 (x1 ), t2 (x2 )),
or 𝑑(t1 (x1 ), t2 (x2 )), where 𝐶 ∈ NC, 𝑝 ∈ NP, 𝑑 ∈ ND, and t1 (x1 ) and t2 (x2 ) are terms that
we call templates. We express source queries in relational algebra, omitting answer variables
under the assumption that they coincide with the variables used in the target atoms.
   Intuitively, a template t(x) in the target atom of a mapping corresponds to an R2RML string
template4 , and is used to generate an IRI (hence, an object identifier) or an RDF literal, starting
from DB values retrieved by the source query in that mapping. For the examples, we use the
concrete syntax from the Ontop VKG system [6], in which the source query is expressed in SQL
and each target atom is expressed as an RDF triple pattern with templates. The answer variables
of the source query occurring in the target atoms are distinguished by enclosing them in curly
brackets { · · · }. The following is an example mapping expressed in such syntax:
source     SELECT ssn FROM person
target     ex:pers/{ssn} a ex:Person .

In the mapping above, the string ex: denotes a URI prefix, e.g., ex:Person is an abbreviation for
the URI http://www.example.com/Person. Such a mapping, when applied to a DB instance 𝒟 of
𝒮, populates the class ex:Person with IRIs constructed by replacing the answer variable ssn
occurring in the target atom with the corresponding values assigned to that variable by the
answers to the SQL source query evaluated over 𝒟. For instance, if the source query returns
two answers that assign to the answer variable ssn respectively the values 1234 and 5678, then
the mapping above produces the following RDF graph (expressed in the Turtle syntax5 ), stating
that individuals ex:pers/1234 and ex:pers/5678 are both instances of class ex:Person:
    ex:pers/1234 a ex:Person .
    ex:pers/5678 a ex:Person .
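
The way a mapping populates a class can be illustrated with a minimal Python sketch. It is not part of Ontop or of the R2RML recommendation: the function name `apply_mapping` is hypothetical, the answers of the source query are hard-coded rather than fetched from a DB, and the IRI-safe encoding that R2RML string templates apply to column values is left out for brevity.

```python
import re

def apply_mapping(target_template, class_iri, rows):
    """Instantiate a class mapping over the answers of its source query."""
    triples = []
    for row in rows:
        # Replace each {column} placeholder with the value assigned by this answer.
        subject = re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), target_template)
        triples.append((subject, "a", class_iri))
    return triples

# Answers of "SELECT ssn FROM person" (hard-coded for the sketch).
answers = [{"ssn": 1234}, {"ssn": 5678}]
triples = apply_mapping("ex:pers/{ssn}", "ex:Person", answers)
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

Each answer of the source query yields one triple, so the virtual ABox grows linearly with the size of the query result.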

   We denote by 𝒜ℳ(𝒟) the virtual ABox constructed through mappings ℳ from a DB 𝒟.
Given a VKG specification ⟨𝒯 , ℳ, 𝒮⟩ and a database instance 𝒟 of 𝒮, the KG 𝒦 = ⟨𝒯 , 𝒜ℳ(𝒟) ⟩
is called the Virtual Knowledge Graph of ⟨𝒯 , ℳ, 𝒮⟩ through 𝒟. The qualifier “virtual” in the
name derives from the fact that the virtual ABox 𝒜ℳ(𝒟) in a VKG setting is not materialized
and stored somewhere. Query answering in VKGs, in fact, is carried out through query rewriting
and query unfolding techniques [11, 6]: user queries, expressed in SPARQL 6 , get translated
on-the-fly into equivalent SQL queries, which then are directly evaluated against the DB.

    3 http://www.w3.org/TR/r2rml/
    4 https://www.w3.org/TR/r2rml/#dfn-string-template
    5 http://www.w3.org/TR/turtle/
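
To give a flavor of unfolding, the following Python sketch translates the simplest possible user query, asking for all instances of one mapped class, into SQL and evaluates it on an in-memory SQLite database. It is a toy illustration under strong assumptions: the helper `unfold_class_query` is hypothetical, no rewriting with respect to the TBox is performed, and the actual algorithms in [11, 6] handle full queries and optimizations that are not reflected here.

```python
import re
import sqlite3

def unfold_class_query(source_sql, template):
    """Build a SQL query whose answers are the IRIs of the instances of one mapped class."""
    # Split the template into constant strings (even positions) and column names (odd positions).
    parts = re.split(r"\{(\w+)\}", template)
    pieces = []
    for i, part in enumerate(parts):
        if not part:
            continue
        if i % 2 == 0:
            pieces.append("'" + part.replace("'", "''") + "'")  # constant part of the IRI
        else:
            pieces.append(f"CAST({part} AS TEXT)")               # answer variable
    return f"SELECT {' || '.join(pieces)} AS x FROM ({source_sql}) AS src"

# Unfold the query "SELECT ?x WHERE { ?x a ex:Person }" using the example mapping.
sql = unfold_class_query("SELECT ssn FROM person", "ex:pers/{ssn}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (ssn INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO person VALUES (?)", [(1234,), (5678,)])
iris = [row[0] for row in conn.execute(sql + " ORDER BY x")]
print(sql)
print(iris)
```

The IRIs are assembled by the SQL engine itself, so no triple is ever stored: the ABox stays virtual.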


3. Mapping Patterns
In its basic form, a mapping pattern is a quadruple ⟨𝒞, 𝒮, ℳ, 𝒯 ⟩, where 𝒞 is a conceptual
model, 𝒮 a database schema, ℳ a set of mappings, and 𝒯 an (OWL 2 QL) ontology. In such a
pattern, the pair ⟨𝒞, 𝒮⟩ puts into correspondence a conceptual representation with one of its
(many) admissible (i.e., formally sound [12, 13]) database schemata, like those prescribed by
well-established database modeling methodologies. The pair ⟨ℳ, 𝒯 ⟩, instead, is formed by the
DB ontology 𝒯 , which is the OWL 2 QL encoding7 of the conceptual model 𝒞, and the set ℳ of
mappings, providing the link between 𝒮 and 𝒯 . The term “DB ontology” refers to an ontology
whose concepts and properties reflect the constructs of the conceptual model, mirroring the
structure of the relational database, as displayed in Figure 1.
   Some of the more advanced patterns have a more complex structure, where pairs of conceptual
models and/or pairs of database schemata are used in place of 𝒞 and 𝒮, respectively (e.g., the
pattern “SHa” falls in this category). These patterns prescribe specific transformations to be
applied to an input conceptual (resp., DB) schema, in order to obtain an output conceptual
(resp., DB) schema. These output artifacts make explicit the presence of specific structures that
are revealed through the application of the pattern itself. These structures can in turn enable
further applications of patterns.

Presentation Conventions. We show the fragment of the conceptual model that is affected
by the pattern in E-R notation (adopting the original notation by Chen [14]). To compactly
represent sets of attributes, we use a small diamond in place of the small circle used for single
attributes in Chen’s notation. For cardinality constraints we follow the “look-here” convention,
that is, the cardinality constraint for a role is placed next to the entity participating in that role.
In the DB schema, we use 𝑇 (K, A) to denote a table with name 𝑇 , primary key consisting of
the attributes K, and additional attributes A. Given a set U of attributes in 𝑇 , we denote by
key𝑇 (U) the fact that U form a key for 𝑇 . Referential integrity constraints (like, e.g., foreign
keys) are denoted with arcs, pointing from the referencing attribute(s) to the referenced one(s).
For conciseness, we denote sets of the form {𝑜 | condition} as {𝑜}condition . In order to express
datatypes for data properties, we introduce two auxiliary functions: a function 𝜏 that, given
a DB attribute 𝐴, returns the DB datatype of 𝐴, and a function 𝜇 that associates, to each DB
datatype, a corresponding RDF datatype. For the definition of 𝜇, we re-use the Natural Mapping 8
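correspondence provided by the R2RML recommendation. As a final note, following the E-R
diagram convention, we assume a default (1, 1) cardinality on attributes. For this reason, in
the DB schema we assume all attributes to be not nullable by default (using the SQL convention,
declared as “NOT NULL”). An optional attribute 𝐴 is instead denoted by adding opt(𝐴) to the
DB schema. Such notation extends in the natural way to a set A of attributes.

    6 http://www.w3.org/TR/sparql11-query
    7 Modulo the expressivity of the OWL 2 QL language.
    8 https://www.w3.org/TR/r2rml/#natural-mapping

Table 2
An extract of our catalog of mapping patterns.

Schema Entity (SE)
  Conceptual model:  entity $E$ with identifier $\mathbf{K}$ and attributes $\mathbf{A}$
  DB schema:         $T_E(\mathbf{K}, \mathbf{A})$
  Mappings:          $s$: $T_E$
                     $t$: $C_E(\mathbf{t}_E(\mathbf{K}))$, $\{d_A(\mathbf{t}_E(\mathbf{K}), A)\}_{A \in \mathbf{K} \cup \mathbf{A}}$
  Ontology:          $\{\delta(d_A) \sqsubseteq C_E,\ \rho(d_A) \sqsubseteq \mu(\tau(A)),\ C_E \sqsubseteq \delta(d_A)\}_{A \in \mathbf{K} \cup \mathbf{A}}$
  In case of optional attributes, for each optional attribute $A'$ of $E$, add an $opt(A')$ constraint to the DB schema and drop the
  corresponding inclusion axiom $C_E \sqsubseteq \delta(d_{A'})$ from the ontology.

Schema Relationship (SR)
  Conceptual model:  entities $E$ (identifier $\mathbf{K}_E$, attributes $\mathbf{A}_E$) and $F$ (identifier $\mathbf{K}_F$, attributes $\mathbf{A}_F$),
                     connected by a relationship $R$
  DB schema:         $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $T_F(\mathbf{K}_F, \mathbf{A}_F)$, $T_R(\mathbf{K}_{RE}, \mathbf{K}_{RF})$,
                     with foreign keys from $\mathbf{K}_{RE}$ to $T_E$ and from $\mathbf{K}_{RF}$ to $T_F$
  Mappings:          $s$: $T_R$
                     $t$: $p_R(\mathbf{t}_{C_E}(\mathbf{K}_{RE}), \mathbf{t}_{C_F}(\mathbf{K}_{RF}))$
  Ontology:          $\exists p_R \sqsubseteq C_E$, $\exists p_R^- \sqsubseteq C_F$
  • In case of cardinality (_, 1) on role $R_E$ (resp., $R_F$), the primary key of $T_R$ is restricted to the attributes $\mathbf{K}_{RE}$ (resp., $\mathbf{K}_{RF}$).
    In case both roles have cardinality (_, 1), either choice for the primary key is made, and the remaining attributes form a
    non-primary key in the logical schema.
  • In case of cardinality (1, _) on role $R_E$ (resp., $R_F$), the inclusion dependency $\mathbf{K}_E \subseteq \mathbf{K}_{RE}$ (resp., $\mathbf{K}_F \subseteq \mathbf{K}_{RF}$) holds in
    the schema, and the first (resp., second) inclusion axiom in the ontology holds in both directions. Note that when the
    maximum cardinality on role $R_E$ (resp., $R_F$) is 1, the corresponding inclusion dependency is actually a foreign key.

Schema Relationship with Identifier Alignment (SRa)
  Conceptual model:  entities $E$ (identifier $\mathbf{K}_E$, attributes $\mathbf{A}_E$) and $F$ (identifier $\mathbf{K}_F$, alternative identifier $\mathbf{U}_F$,
                     attributes $\mathbf{A}_F$), connected by a relationship $R$
  DB schema:         $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $T_F(\mathbf{K}_F, \mathbf{U}_F, \mathbf{A}_F)$, $T_R(\mathbf{K}_{RE}, \mathbf{U}_{RF})$, $key_{T_F}(\mathbf{U}_F)$,
                     with the foreign key in $T_R$ over $\mathbf{U}_{RF}$ referring to the alternative key $\mathbf{U}_F$ of $T_F$
  Mappings:          $s$: $T_R \bowtie_{\mathbf{U}_{RF} = \mathbf{U}_F} T_F$
                     $t$: $p_R(\mathbf{t}_{C_E}(\mathbf{K}_{RE}), \mathbf{t}_{C_F}(\mathbf{K}_F))$
  Ontology:          $\exists p_R \sqsubseteq C_E$, $\exists p_R^- \sqsubseteq C_F$

Schema Hierarchy with Identifier Alignment (SHa)
  Conceptual model:  entity $E$ (identifier $\mathbf{K}_E$, attributes $\mathbf{A}_E$) with child entity $F$ ($\mathbf{K}_F$, attributes $\mathbf{A}_F$),
                     given once as input and once as output of the pattern
  DB schema (input):   $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $T_F(\mathbf{K}_F, \mathbf{K}_{FE}, \mathbf{A}_F)$, $key_{T_F}(\mathbf{K}_{FE})$
  DB schema (output):  $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $V_F(\mathbf{K}_F, \mathbf{K}_{FE}, \mathbf{A}_F) = T_F$ with primary key $\mathbf{K}_{FE}$, $key_{V_F}(\mathbf{K}_F)$
  Mappings:          $s$: $V_F$
                     $t$: $C_F(\mathbf{t}_{C_E}(\mathbf{K}_{FE}))$, $\{d_A(\mathbf{t}_{C_E}(\mathbf{K}_{FE}), A)\}_{A \in \mathbf{K}_F \cup \mathbf{A}_F}$
  Ontology:          $C_F \sqsubseteq C_E$,
                     $\{\delta(d_A) \sqsubseteq C_F,\ \rho(d_A) \sqsubseteq \mu(\tau(A)),\ C_F \sqsubseteq \delta(d_A)\}_{A \in \mathbf{K}_F \cup \mathbf{A}_F}$
  In this pattern, the “alignment” is meant to align the primary identifier used in the child entity to the primary identifier
  used in the parent entity. The other two possibilities for applying the pattern are:
  • the foreign key in the child entity is the primary key of that entity, and references a non-primary key of the parent entity;
  • the foreign key in the child entity is a non-primary key of that entity, and references a non-primary key of the parent entity.
  We depict here the most common scenario, where the foreign key points to the primary key of the parent entity.
  Observe that this pattern requires a change in the conceptual model (essentially keeping track of the attributes used for
  identifying the objects of the subclass).
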

Pattern Catalog. Table 2 shows an excerpt of our patterns, which we discuss in detail here.
Schema Entity (SE). This fundamental pattern describes the correspondence between an entity
with a primary identifier and attributes in the DB schema, and a class and data properties in the
ontology. The entity is expressed in the DB schema through a single table 𝑇𝐸 with primary key
K and other attributes A, as is the norm in sound DB design practices. The mappings column
explains how 𝑇𝐸 is mapped into a corresponding class 𝐶𝐸 . The primary key of 𝑇𝐸 is employed
to construct the IRIs of the objects that are instances of 𝐶𝐸 , using a template t𝐸 specific for
that entity. Each relevant attribute of 𝑇𝐸 is mapped to a data property of 𝐶𝐸 , with suitable
domain and range axioms. A mandatory participation constraint is added to each data property
corresponding to a mandatory attribute.
   Example: A client registry table containing SSNs of clients, together with their name as
an additional attribute, is mapped to a Client class using the SSN to construct its objects. In
addition, the SSN and name are mapped to two corresponding data properties.
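
As a rough illustration of how pattern SE could drive bootstrapping, the Python sketch below emits an Ontop-style mapping and the associated axioms for a table 𝑇𝐸(K, A). All names are assumptions made for the sketch: the function `schema_entity`, the table `client` with columns `ssn` and `name`, and the naming scheme for classes and data properties. Range axioms 𝜌(𝑑𝐴) ⊑ 𝜇(𝜏(𝐴)) are omitted because they require the DB datatypes.

```python
def schema_entity(table, key, attrs, optional=(), prefix="ex:"):
    """Sketch of pattern SE for a table T_E(K, A); returns (mapping, axioms)."""
    cls = prefix + table.capitalize()            # class C_E (naming scheme is an assumption)
    # Template t_E: the IRI of an object is built from the primary key K.
    subject = prefix + table + "/" + "/".join("{" + k + "}" for k in key)
    columns = list(key) + list(attrs)
    target = [f"{subject} a {cls} ."]
    axioms = []
    for col in columns:
        prop = f"{prefix}{table}_{col}"          # data property d_A
        target.append(f"{subject} {prop} {{{col}}} .")
        axioms.append(f"δ({prop}) ⊑ {cls}")      # domain axiom
        if col not in optional:
            axioms.append(f"{cls} ⊑ δ({prop})")  # mandatory participation
    source = f"SELECT {', '.join(columns)} FROM {table}"
    return {"source": source, "target": target}, axioms


mapping, axioms = schema_entity("client", key=["ssn"], attrs=["name"])
print("source", mapping["source"])
for atom in mapping["target"]:
    print("target", atom)
for axiom in axioms:
    print(axiom)
```

Passing `optional={"name"}` drops the axiom stating that every client has a name, mirroring the note on optional attributes in Table 2.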
Schema Relationship (SR). This pattern describes the correspondence between a binary relation-
ship without attributes and an OWL 2 QL object property, for the case where such relationship
is represented in the DB as a separate (usually, “many-to-many”) table. This pattern considers
three tables 𝑇𝑅 , 𝑇𝐸 , and 𝑇𝐹 , for which the set of columns in 𝑇𝑅 is partitioned into two parts
K𝑅𝐸 and K𝑅𝐹 that are foreign keys to 𝑇𝐸 and 𝑇𝐹 , respectively. The identifier of 𝑇𝑅 depends
on the role cardinalities in the E-R model. The pattern captures how 𝑇𝑅 is mapped to an object
property 𝑝𝑅 , using the two parts K𝑅𝐸 and K𝑅𝐹 of the partition to construct respectively the
subject and the object of the triples in 𝑝𝑅 . The templates t𝐶𝐸 and t𝐶𝐹 must be those respectively
used for building instances of classes 𝐶𝐸 corresponding to 𝑇𝐸 and 𝐶𝐹 corresponding to 𝑇𝐹 .
   Example: An additional table in the client registry stores the addresses of each client, and
has a foreign key to a table with locations. The former table is mapped to an address object
property, for which the ontology asserts that the domain is the class Person and the range an
additional class Location, which corresponds to the latter table.
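
Pattern SR can be sketched in the same spirit. The Python code below is an illustration only: the function names, the table `address` with columns `client_ssn` and `location_id`, and the IRI templates are hypothetical. The point it makes is that the subject and object templates are reused from the entity mappings, with the foreign-key columns of 𝑇𝑅 plugged in.

```python
def fill(template, columns):
    """Turn a positional template such as 'ex:client/{}' into 'ex:client/{client_ssn}'."""
    return template.format(*("{" + c + "}" for c in columns))

def schema_relationship(rel_table, k_re, k_rf, t_ce, t_cf, prop, cls_e, cls_f):
    """Sketch of pattern SR: map T_R(K_RE, K_RF) to the object property p_R."""
    source = f"SELECT {', '.join(list(k_re) + list(k_rf))} FROM {rel_table}"
    # Subject and object reuse the templates of C_E and C_F on the foreign-key columns.
    target = f"{fill(t_ce, k_re)} {prop} {fill(t_cf, k_rf)} ."
    axioms = [f"∃{prop} ⊑ {cls_e}", f"∃{prop}⁻ ⊑ {cls_f}"]  # domain and range of p_R
    return {"source": source, "target": target}, axioms


sr_mapping, sr_axioms = schema_relationship(
    "address", ["client_ssn"], ["location_id"],
    "ex:client/{}", "ex:location/{}", "ex:address", "ex:Person", "ex:Location",
)
print("source", sr_mapping["source"])
print("target", sr_mapping["target"])
print(sr_axioms)
```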
Schema Relationship with Identifier Alignment (SRa). This pattern is similar to pattern SR, but it
comes with a modifier a, indicating that the pattern can be applied after the identifiers involved
in the relationship have been aligned. The alignment is necessary because the foreign key in 𝑇𝑅
does not refer to the primary key K𝐹 of 𝑇𝐹 , but to an alternative key U𝐹 . Since the instances of
the class 𝐶𝐹 corresponding to 𝑇𝐹 are constructed using the primary key K𝐹 of 𝑇𝐹 (cf. pattern
SE), also the pairs that populate 𝑝𝑅 should refer in their object position to that primary key,
which can only be retrieved via a join between 𝑇𝑅 and 𝑇𝐹 on the key U𝐹 .
   Example: The primary key of the table with locations is not given by the city and street,
which are used in the table that relates clients to their addresses, but is given by the latitude
and longitude of locations.
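
The join required by pattern SRa can be made concrete with a small Python sketch. The schema is invented for the illustration: a table `address(client_ssn, city, street)` and a table `location(lat, lon, city, street)` whose primary key is (`lat`, `lon`); the function names are hypothetical as well.

```python
def fill(template, columns):
    """Turn a positional template such as 'ex:location/{}/{}' into 'ex:location/{lat}/{lon}'."""
    return template.format(*("{" + c + "}" for c in columns))

def schema_relationship_aligned(rel_table, k_re, u_rf, f_table, k_f, u_f, t_ce, t_cf, prop):
    """Sketch of pattern SRa: join T_R with T_F on the alternative key U_F to reach K_F."""
    join_cond = " AND ".join(f"r.{a} = f.{b}" for a, b in zip(u_rf, u_f))
    select = [f"r.{c} AS {c}" for c in k_re] + [f"f.{c} AS {c}" for c in k_f]
    source = (f"SELECT {', '.join(select)} FROM {rel_table} r "
              f"JOIN {f_table} f ON {join_cond}")
    # The object IRI is built from the primary key K_F, as prescribed by pattern SE.
    target = f"{fill(t_ce, k_re)} {prop} {fill(t_cf, k_f)} ."
    return {"source": source, "target": target}


sra_mapping = schema_relationship_aligned(
    "address", ["client_ssn"], ["city", "street"],
    "location", ["lat", "lon"], ["city", "street"],
    "ex:client/{}", "ex:location/{}/{}", "ex:address",
)
print("source", sra_mapping["source"])
print("target", sra_mapping["target"])
```

Without the join, the object IRIs would be built from city and street and would not coincide with the IRIs of the Location instances produced by the entity mapping.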
Schema Hierarchy with Identifier Alignment (SHa). This pattern handles the case where a
hierarchy is specified and the child entity uses a primary identifier different from the one in the
parent entity. In this situation, the foreign-key constraint can come in three different variants.
In the depicted one, the foreign key in 𝑇𝐹 is over a non-primary key K𝐹𝐸 . The objects for 𝐶𝐹
have to be built out of K𝐹𝐸 , rather than out of the primary key of 𝑇𝐹 . For this purpose, the
pattern creates a view 𝑉𝐹 identical to 𝑇𝐹 , except that K𝐹𝐸 is the primary key. The foreign
key relations are also preserved. Such a view might enable further applications of patterns.
   Example: An ISA relation between entities Student and Person. Students are identified by
their matriculation number, whereas persons are identified by their SSN.
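
A possible reading of pattern SHa for this example is sketched below in Python. The names are assumptions: a table `student` with primary key `matr` and a non-primary key `ssn` referencing the parent table, a view named `v_student`, and the function `schema_hierarchy_aligned`. Since SQL views do not declare primary keys, the fact that `ssn` acts as the key of the view is metadata that a tool would have to track separately; the data property atoms of the pattern are left out.

```python
def fill(template, columns):
    """Turn a positional template such as 'ex:person/{}' into 'ex:person/{ssn}'."""
    return template.format(*("{" + c + "}" for c in columns))

def schema_hierarchy_aligned(child_table, k_fe, parent_template, child_cls, parent_cls):
    """Sketch of pattern SHa: build the objects of C_F from K_FE via a view V_F."""
    view = f"v_{child_table}"
    # V_F has the same content as T_F; K_FE is treated as its primary key.
    create_view = f"CREATE VIEW {view} AS SELECT * FROM {child_table}"
    mapping = {
        "source": f"SELECT {', '.join(k_fe)} FROM {view}",
        # The parent's template is reused, so child and parent IRIs coincide.
        "target": f"{fill(parent_template, k_fe)} a {child_cls} .",
    }
    axiom = f"{child_cls} ⊑ {parent_cls}"
    return create_view, mapping, axiom


create_view, sh_mapping, sh_axiom = schema_hierarchy_aligned(
    "student", ["ssn"], "ex:person/{}", "ex:Student", "ex:Person"
)
print(create_view)
print("source", sh_mapping["source"])
print("target", sh_mapping["target"])
print(sh_axiom)
```

Because a student is identified by the same IRI as the corresponding person, the axiom Student ⊑ Person is consistent with the data exposed by the mappings of both tables.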


4. Conclusions and Future Work
In this work, we have identified and formally specified a number of mapping patterns emerging
when linking DBs to ontologies in a typical VKG setting. Our patterns are grounded in well-
established practices of DB design, and render explicit the connection between the conceptual
model, the DB schema, and the ontology. We envision that the organization in patterns can
enable a number of relevant tasks, notably mapping bootstrapping for incomplete VKGs.
   This work is only a first step, with respect to both the categorization of patterns and their actual
use. Regarding the former, we are currently extending this initial catalog with more advanced
“data-driven” patterns, which are patterns where the data component needs to be taken into
account. Regarding the latter, we are investigating solutions to specific problems that need to
be addressed when setting up a VKG scenario, like the problem of mapping bootstrapping.


Acknowledgments
This research has been partially supported by the Wallenberg AI, Autonomous Systems and
Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, by the Italian
Basic Research (PRIN) project HOPE, by the EU H2020 project INODE (grant agreement 863410),
and by the project MENS, funded through the 4th Call for Research of the Autonomous Province
of Bolzano (IN2219).


References
 [1] G. Xiao, L. Ding, B. Cogrel, D. Calvanese, Virtual Knowledge Graphs: An overview of
     systems and use cases, Data Intelligence 1 (2019) 201–223.
 [2] J. Sequeda, O. Lassila, Designing and Building Enterprise Knowledge Graphs, Morgan &
     Claypool Publishers, 2021.
 [3] L. F. de Medeiros, F. Priyatna, Ó. Corcho, MIRROR: Automatic R2RML mapping generation
     from relational databases, in: Proc. ICWE, volume 9114 of LNCS, Springer, 2015, pp. 326–
     343.
 [4] E. Jiménez-Ruiz, E. Kharlamov, D. Zheleznyakov, I. Horrocks, C. Pinkel, M. G. Skjæveland,
     E. Thorstensen, J. Mora, BootOX: Practical mapping of RDBs to OWL 2, in: Proc. ISWC,
     volume 9367 of LNCS, Springer, 2015, pp. 113–132.
 [5] C. Pinkel, C. Binnig, E. Kharlamov, P. Haase, IncMap: Pay as you go matching of relational
     schemata to OWL ontologies, in: Proc. 8th Int. Workshop on Ontology Matching (OM),
     volume 1111 of CEUR, CEUR-WS.org, 2013, pp. 37–48.
 [6] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-
     Muro, G. Xiao, Ontop: Answering SPARQL queries over relational databases, Semantic
     Web J. 8 (2017) 471–487.
 [7] J. F. Sequeda, D. P. Miranker, Ultrawrap Mapper: A semi-automatic relational database to
     RDF (RDB2RDF) mapping tool, in: Proc. ISWC Posters & Demonstrations Track, volume
     1486 of CEUR, CEUR-WS.org, 2015.
 [8] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison Wesley, 1995.
 [9] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and
     efficient query answering in description logics: The DL-Lite family, JAR 39 (2007) 385–429.
[10] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The De-
     scription Logic Handbook: Theory, Implementation and Applications, 2nd ed., Cambridge
     University Press, 2007.
[11] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, R. Rosati, Linking data to
     ontologies, J. on Data Semantics 10 (2008) 133–173.
[12] R. Hull, Relative information capacity of simple relational database schemas, SIAM J. on
     Computing 15 (1986) 856–886.
[13] R. J. Miller, Y. E. Ioannidis, R. Ramakrishnan, Schema equivalence in heterogeneous
     systems: Bridging theory and practice, Information Systems 19 (1994) 3–31.
[14] P. P. Chen, The Entity-Relationship model: Toward a unified view of data, ACM TODS 1
     (1976) 9–36.