Difference between revisions of "Vol-3194/paper10"
− | + | =Paper= | |
{{Paper | {{Paper | ||
+ | |id=Vol-3194/paper10 | ||
+ | |storemode=property | ||
+ | |title=Conceptually-grounded Mapping Patterns for Virtual Knowledge Graphs | ||
+ | |pdfUrl=https://ceur-ws.org/Vol-3194/paper10.pdf | ||
+ | |volume=Vol-3194 | ||
+ | |authors=Diego Calvanese,Avigdor Gal,Davide Lanti,Marco Montali,Alessandro Mosca,Roee Shraga | ||
+ | |dblpUrl=https://dblp.org/rec/conf/sebd/CalvaneseGLM0S22 | ||
|wikidataid=Q117344920 | |wikidataid=Q117344920 | ||
}} | }} | ||
+ | ==Conceptually-grounded Mapping Patterns for Virtual Knowledge Graphs== | ||
+ | <pdf width="1500px">https://ceur-ws.org/Vol-3194/paper10.pdf</pdf> | ||
+ | <pre> | ||
+ | Conceptually-grounded Mapping Patterns | ||
+ | for Virtual Knowledge Graphs | ||
+ | Diego Calvanese (1,2), Avigdor Gal (3), Davide Lanti (1), Marco Montali (1), Alessandro Mosca (1) | ||
+ | and Roee Shraga (4) | ||
+ | (1) Free University of Bozen-Bolzano, Bolzano, Italy | ||
+ | (2) Umeå University, Umeå, Sweden | ||
+ | (3) Technion – Israel Institute of Technology, Haifa, Israel | ||
+ | (4) Khoury College of Computer Science, Northeastern University, Boston, Massachusetts | ||
+ | |||
+ | |||
+ | Abstract | ||
+ | Knowledge Graphs (KGs) have been gaining momentum recently in both academia and industry, due to | ||
+ | the flexibility of their data model, allowing one to access and integrate collections of data of different | ||
+ | forms. Virtual Knowledge Graphs (VKGs), a variant of KGs originating from the field of Ontology-based | ||
+ | Data Access (OBDA), are a promising paradigm for integrating and accessing legacy data sources. The | ||
+ | main idea of VKGs is that the KG remains virtual: the end-user interacts with a KG, but queries are | ||
+ | reformulated on-the-fly as queries over the data source(s). To enable the paradigm, one needs to define | ||
+ | declarative mappings specifying the link between the data sources and the elements in the VKG. In this | ||
+ | work, we investigate common patterns that arise when specifying such mappings, building on | ||
+ | well-established methodologies from the area of conceptual modeling and database design. | ||
+ | |||
+ | Keywords | ||
+ | Virtual Knowledge Graphs, Ontology-based Data Access, Mapping patterns, Data Integration | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | 1. Introduction | ||
+ | Data integration and access to legacy data sources are key challenges for contemporary organi- | ||
+ | zations. In the whole spectrum of data integration and access solutions, the approach based on | ||
+ | Virtual Knowledge Graphs (VKGs) is gaining momentum, especially when the underlying data | ||
+ | sources to be integrated come in the form of relational databases (DBs) [1]. VKGs replace the | ||
+ | rigid structure of tables with the flexibility of a graph that incorporates domain knowledge and | ||
+ | is kept virtual, eliminating redundancies. A VKG specification consists of three main compo- | ||
+ | nents: (i) data sources (in the context of this paper, constituted by relational DBs), where the | ||
+ | actual data are stored; (ii) a domain ontology, capturing the relevant concepts, relations, and | ||
+ | constraints of the domain of interest; and (iii) a set of mappings, linking the data sources to the | ||
+ | ontology. A critical bottleneck in this setting lies in the definition and management of map- | ||
+ | pings. In this work, we focus on this issue by proposing a comprehensive catalog of mapping | ||
+ | |||
+ | SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy | ||
+ | Email: calvanese@inf.unibz.it (D. Calvanese); avigal@technion.ac.il (A. Gal); lanti@unibz.it (D. Lanti); | ||
+ | montali@unibz.it (M. Montali); mosca@unibz.it (A. Mosca); r.shraga@northeastern.edu (R. Shraga) | ||
+ | ORCID: 0000-0001-5174-9693 (D. Calvanese); 0000-0002-7028-661X (A. Gal); 0000-0003-1097-2965 (D. Lanti); | ||
+ | 0000-0002-8021-3430 (M. Montali); 0000-0003-2323-3344 (A. Mosca); 0000-0001-8803-8481 (R. Shraga) | ||
+ | © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). | ||
+ | CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073 | ||
+ | [Figure 1 (diagram). Labels: Domain Knowledge; Conceptual Model (E-R); DB Schema; DB Design; OWL Encoding; Alignment; DB Ontology (OWL 2 QL); Target Ontology (OWL 2 QL); OWL 2 QL Mappings; VKG Mappings.] | ||
+ | Figure 1: The database and the ontology both stem from common domain knowledge. | ||
+ | |||
+ | |||
+ | patterns that emerge when linking data to ontologies. Our catalog is based on the (arguably | ||
+ | reasonable) assumption that both the ontology and the DB schema are derived from a conceptual | ||
+ | analysis of the domain of interest. The resulting knowledge may stay implicit, or may lead to an | ||
+ | explicit representation in the form of a structural conceptual model, which can be represented | ||
+ | using well-established notations such as UML, ORM, or E-R. On the one hand, this conceptual | ||
+ | model provides the basis for creating a corresponding domain ontology through a series of | ||
+ | semantics-preserving transformation steps. On the other hand, it can trigger the design process | ||
+ | that finally leads to the deployment of an actual DB. The whole view is depicted in Figure 1. | ||
+ | Our catalog is built on well-established methodologies and patterns studied in data management | ||
+ | (e.g., W3C direct mappings (W3C-DM) [footnote 1] and extensions), data analysis (e.g., algorithms | ||
+ | for discovering dependencies), and conceptual modeling (e.g., relational mapping techniques). | ||
+ | The idea of mapping patterns is not new. For instance, the work in [2] is closely related to ours, | ||
+ | as it also introduces a catalog of mapping patterns. However, there are some key differences | ||
+ | with our approach. One difference is that we consider KGs (with ontologies), whereas that work | ||
+ | focuses on property graphs without an ontology. More importantly, in [2] and in the related | ||
+ | literature, patterns are not formalized or grounded to a specific conceptual representation, but | ||
+ | are rather informally specified and discussed in a “by-example” fashion. In contrast, each | ||
+ | of our patterns explicitly and unambiguously specifies the link between the conceptualization | ||
+ | and the DB instance, which is the one arising from applying well-known semantics-preserving | ||
+ | transformations studied in the area of DB design. | ||
+ | We argue that this foundational grounding paves the way for a variety of VKG design scenarios, | ||
+ | depending on which information artifacts are available, and which ones must be produced. For | ||
+ | example, our patterns could be used to validate existing mappings, or to automatically generate | ||
+ | (i.e., bootstrap) ontology and mappings when only the DB is available. In fact, specific patterns | ||
+ | have been proposed also in relation to ontology and mapping bootstrapping, for which a variety of | ||
+ | tools and approaches have been developed in the last two decades [3, 4, 5, 6, 7]. The approaches in | ||
+ | the literature differ in terms of the overall purposes of bootstrapping (e.g., OBDA, data integration, | ||
+ | ontology learning, checking of DB schema constraints using ontology reasoning), the adopted on- | ||
+ | tology and mapping languages (e.g., OWL 2 profiles or RDFS as ontology languages, and R2RML | ||
+ | or custom languages for the specification of mappings), the different focus on direct and/or com- | ||
+ | plex mappings, and the assumed level of automation. The majority of the most recent approaches | ||
+ | [Footnote 1] http://www.w3.org/TR/rdb-direct-mapping/ | ||
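+ | Table 1 | ||
+ | Semantics of the DL-Liteℛ constructs that involve datatypes. | ||
+ | | ||
+ | Construct               Syntax element    Example       Semantics | ||
+ | Top domain              $\top_V$                        $\Delta^{\mathcal{I}}_V$ | ||
+ | Literal                 $\ell \in N_L$    “george”      $\ell^{\mathcal{I}} \in \Delta^{\mathcal{I}}_V$ | ||
+ | Datatype                $T_i$             xsd:int       $T_i^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}_V$ | ||
+ | Data property name      $d \in N_D$       hasName       $d^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}_O \times \Delta^{\mathcal{I}}_V$ | ||
+ | Data property domain    $\delta(d)$       δ(hasName)    $\{x \in \Delta^{\mathcal{I}}_O \mid \exists v \in \Delta^{\mathcal{I}}_V : (x, v) \in d^{\mathcal{I}}\}$ | ||
+ | Data property range     $\rho(d)$         ρ(hasName)    $\{v \in \Delta^{\mathcal{I}}_V \mid \exists o \in \Delta^{\mathcal{I}}_O : (o, v) \in d^{\mathcal{I}}\}$ | ||
+ | Data property negation  $\neg d$          ¬hasName      $(\Delta^{\mathcal{I}}_O \times \Delta^{\mathcal{I}}_V) \setminus d^{\mathcal{I}}$ | ||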
+ | |||
+ | |||
+ | closely follow W3C-DM, deriving ontologies that mirror the structure of the input DB. | ||
+ | The remainder of the paper is structured as follows: Section 2 introduces the notation and | ||
+ | basic notions on VKGs, Section 3 presents (an extract of) our catalog of mapping patterns, and | ||
+ | Section 4 concludes the paper. | ||
+ | |||
+ | |||
+ | 2. Preliminaries | ||
+ | We use bold font to denote tuples, e.g., x and y. When convenient and unambiguous, we treat | ||
+ | tuples as sets and use set operators on them. We assume familiarity with | ||
+ | standard notions and languages from DBs [8], such as SQL or E-R diagrams. | ||
+ | A VKG specification is a triple ⟨𝒯 , ℳ, 𝒮⟩ where 𝒯 is an ontology (or TBox), ℳ a set of | ||
+ | mappings, and 𝒮 the schema of a DB (with constraints, e.g., primary and foreign keys). In VKGs, | ||
+ | the ontology is formulated in OWL 2 QL [footnote 2], but for conciseness we use its Description Logic (DL) | ||
+ | counterpart, DL-Liteℛ [9], here slightly enriched to handle datatypes. | ||
+ | We fix the following enumerable, pairwise-disjoint sets: NI of individuals, NL of literal values, | ||
+ | NC of class names, NP of object property names, and ND of data property names. | ||
+ | An OWL 2 QL TBox 𝒯 is a finite set of inclusion axioms of the form 𝐵 ⊑ 𝐶, 𝑞 ⊑ 𝑟, 𝜌(𝑑) ⊑ 𝑓 , | ||
+ | or 𝑑 ⊑ 𝑣, where 𝐵, 𝐶 are classes, 𝑞, 𝑟 are object properties, 𝑑 is a data property, 𝑓 is a datatype | ||
+ | expression, 𝜌(𝑑) is a data property range expression, and 𝑣 is a data property expression. These | ||
+ | are defined according to the following grammar, where 𝐴 ∈ NC, 𝑑 ∈ ND, 𝑝 ∈ NP, 𝛿(𝑑) is a | ||
+ | data property domain expression, and 𝑇1 , . . . , 𝑇𝑛 are the RDF datatypes: | ||
+ | |||
+ | 𝐵 → 𝐴 | ∃𝑟 | 𝛿(𝑑) 𝑞 → 𝑝 | 𝑝− 𝑓 → ⊤𝐷 | 𝑇1 | · · · | 𝑇𝑛 | ||
+ | 𝐶 → ⊤𝐶 | 𝐵 | ¬𝐵 𝑟 → 𝑞 | ¬𝑞 𝑣 → 𝑑 | ¬𝑑 | ||
+ | |||
+ | In the rules above, ⊤𝐶 denotes the “top” element for concepts and ⊤𝐷 the one for data values | ||
+ | (called literals in the RDF terminology). An OWL 2 QL ABox 𝒜 is a finite set of assertions of the | ||
+ | form 𝐴(𝑎), 𝑝(𝑎, 𝑏), or 𝑑(𝑎, ℓ), where 𝐴 ∈ NC, 𝑝 ∈ NP, 𝑑 ∈ ND, 𝑎 and 𝑏 are individuals in NI, | ||
+ | and ℓ ∈ NL. We call the pair 𝒦 = ⟨𝒯 , 𝒜⟩ an OWL 2 QL Knowledge Graph (KG). | ||
+ | |||
+ | [Footnote 2] http://www.w3.org/TR/owl2-overview/ | ||
+ | Similarly to first-order logic, the semantics of DL-Liteℛ KGs is given through Tarski-style | ||
+ | interpretations ℐ = ⟨Δℐ𝑂 , Δℐ𝑉 , ·ℐ ⟩, where Δℐ𝑂 is a non-empty domain of objects, Δℐ𝑉 is a | ||
+ | non-empty domain of values, and ·ℐ is an interpretation function. Table 1 reports the semantics for | ||
+ | the constructs involving datatypes. The other constructs are defined as in standard DL-Liteℛ [9]. | ||
+ | As usual [10], we say that an interpretation ℐ satisfies a KG 𝒦, denoted by ℐ |= 𝒦, if ℐ satisfies | ||
+ | the ABox assertions and the inclusion axioms in 𝒦. | ||
+ | |||
+ | Mappings. Mappings specify how to populate classes and properties of the ontology with | ||
+ | individuals and values constructed from the data in the underlying DB. In other words, mappings | ||
+ | provide the ABox that, together with a given TBox, realizes a KG. In VKGs, the adopted | ||
+ | language for mappings in real-world systems is R2RML [footnote 3], but for conciseness we use here a more | ||
+ | convenient abstract notation inspired by the literature [11]: a mapping 𝑚 is a pair of the form | ||
+ | ⟨𝑠: 𝑄(x), 𝑡: L(t(x))⟩, where 𝑄(x) is a SQL query with answer variables x over the DB schema 𝒮, | ||
+ | called source query, and L(t(x)) is a list of target atoms of the form 𝐶(t1 (x1 )), 𝑝(t1 (x1 ), t2 (x2 )), | ||
+ | or 𝑑(t1 (x1 ), t2 (x2 )), where 𝐶 ∈ NC, 𝑝 ∈ NP, 𝑑 ∈ ND, and t1 (x1 ) and t2 (x2 ) are terms that | ||
+ | we call templates. We express source queries in relational algebra, omitting answer variables | ||
+ | under the assumption that they coincide with the variables used in the target atoms. | ||
+ | Intuitively, a template t(x) in the target atom of a mapping corresponds to an R2RML string | ||
+ | template [footnote 4], and is used to generate an IRI (hence, an object identifier) or an RDF literal, starting | ||
+ | from DB values retrieved by the source query in that mapping. For the examples, we use the | ||
+ | concrete syntax from the Ontop VKG system [6], in which the source query is expressed in SQL | ||
+ | and each target atom is expressed as an RDF triple pattern with templates. The answer variables | ||
+ | of the source query occurring in the target atoms are distinguished by enclosing them in curly | ||
+ | brackets { · · · }. The following is an example mapping expressed in such syntax: | ||
+ | source SELECT ssn FROM person | ||
+ | target ex:pers/{ssn} a ex:Person . | ||
+ | |||
+ | In the mapping above, the string ex: denotes a URI prefix, e.g., ex:Person is an abbreviation for | ||
+ | the URI http://www.example.com/Person. Such a mapping, when applied to a DB instance 𝒟 of | ||
+ | 𝒮, populates the class ex:Person with IRIs constructed by replacing the answer variable ssn | ||
+ | occurring in the target atom with the corresponding values assigned to that variable by the | ||
+ | answers to the SQL source query evaluated over 𝒟. For instance, if the source query returns | ||
+ | two answers that assign to the answer variable ssn respectively the values 1234 and 5678, then | ||
+ | the mapping above produces the following RDF graph (expressed in Turtle syntax [footnote 5]), stating | ||
+ | that individuals ex:pers/1234 and ex:pers/5678 are both instances of class ex:Person: | ||
+ | ex:pers/1234 a ex:Person . ex:pers/5678 a ex:Person . | ||
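The instantiation step described above can be sketched in a few lines of Python. This is only an illustration of how a template turns answer rows into triples, not how Ontop implements it; the helper names (expand, instantiate, apply_mapping) are hypothetical, and the parser assumes a target with exactly one triple pattern.

```python
import re

# Prefix declaration taken from the example above.
PREFIXES = {"ex": "http://www.example.com/"}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(term):
    """Expand a prefixed name such as ex:Person into a full IRI."""
    prefix, _, local = term.partition(":")
    return PREFIXES[prefix] + local


def instantiate(template, row):
    """Replace each {column} placeholder with the value from one answer row."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def apply_mapping(target, rows):
    """Produce one (subject, predicate, object) triple per answer row.

    Assumption: the target consists of a single triple pattern.
    """
    subject_tpl, predicate, obj = target.rstrip(" .").split()
    triples = []
    for row in rows:
        subject = expand(instantiate(subject_tpl, row))
        pred = RDF_TYPE if predicate == "a" else expand(predicate)
        triples.append((subject, pred, expand(obj)))
    return triples


# The two answers of the source query used in the running example.
rows = [{"ssn": 1234}, {"ssn": 5678}]
triples = apply_mapping("ex:pers/{ssn} a ex:Person .", rows)
for triple in triples:
    print(triple)
```

Each answer row yields exactly one triple, so the size of the generated graph follows the size of the source query result.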
+ | |||
+ | We denote by 𝒜ℳ(𝒟) the virtual ABox constructed through mappings ℳ from a DB 𝒟. | ||
+ | Given a VKG specification ⟨𝒯 , ℳ, 𝒮⟩ and a database instance 𝒟 of 𝒮, the KG 𝒦 = ⟨𝒯 , 𝒜ℳ(𝒟) ⟩ | ||
+ | is called the Virtual Knowledge Graph of ⟨𝒯 , ℳ, 𝒮⟩ through 𝒟. The qualifier “virtual” in the | ||
+ | [Footnote 3] http://www.w3.org/TR/r2rml/ | ||
+ | [Footnote 4] https://www.w3.org/TR/r2rml/#dfn-string-template | ||
+ | [Footnote 5] http://www.w3.org/TR/turtle/ | ||
+ | name derives from the fact that the virtual ABox 𝒜ℳ(𝒟) in a VKG setting is not materialized | ||
+ | and stored somewhere. Query answering in VKGs, in fact, is carried out through query rewriting | ||
+ | and query unfolding techniques [11, 6]: user queries, expressed in SPARQL [footnote 6], get translated | ||
+ | on-the-fly into equivalent SQL queries, which are then evaluated directly against the DB. | ||
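To make the idea of a non-materialized ABox concrete, the following toy sketch answers the single query pattern "?x a ex:Person" by unfolding it into the SQL source query of the corresponding mapping and evaluating that query on an in-memory SQLite database. It is a deliberately simplified assumption-laden illustration: there is no TBox rewriting, only one class atom is supported, and the MAPPINGS dictionary, the function names, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical mapping store: class name -> (SQL source query, IRI template).
MAPPINGS = {
    "ex:Person": ("SELECT ssn FROM person", "ex:pers/{ssn}"),
}


def unfold_class_query(class_name):
    """Unfold the pattern '?x a CLASS' into the SQL source query of its mapping."""
    sql, _template = MAPPINGS[class_name]
    return sql


def answer_class_query(conn, class_name):
    """Evaluate the unfolded SQL and build IRIs only for the returned rows."""
    sql, template = MAPPINGS[class_name]
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [template.format(**dict(zip(columns, row))) for row in cursor]


# Illustrative data; no triple is ever stored, only relational rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (ssn INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1234, "Ann"), (5678, "Bob")])

answers = answer_class_query(conn, "ex:Person")
print(unfold_class_query("ex:Person"))
print(sorted(answers))
```

The IRIs exist only in the query answers, which is the sense in which the graph stays virtual.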
+ | |||
+ | |||
+ | 3. Mapping Patterns | ||
+ | In its basic form, a mapping pattern is a quadruple ⟨𝒞, 𝒮, ℳ, 𝒯 ⟩, where 𝒞 is a conceptual | ||
+ | model, 𝒮 a database schema, ℳ a set of mappings, and 𝒯 an (OWL 2 QL) ontology. In such a | ||
+ | pattern, the pair ⟨𝒞, 𝒮⟩ puts into correspondence a conceptual representation with one of its | ||
+ | (many) admissible (i.e., formally sound [12, 13]) database schemata, like those prescribed by | ||
+ | well-established database modeling methodologies. The pair ⟨ℳ, 𝒯 ⟩, instead, is formed by the | ||
+ | DB ontology 𝒯 , which is the OWL 2 QL encoding [footnote 7] of the conceptual model 𝒞, and the set ℳ of | ||
+ | mappings, providing the link between 𝒮 and 𝒯 . The term “DB ontology” refers to an ontology | ||
+ | whose concepts and properties reflect the constructs of the conceptual model, mirroring the | ||
+ | structure of the relational database, as displayed in Figure 1. | ||
+ | Some of the more advanced patterns have a more complex structure, where pairs of conceptual | ||
+ | models and/or pairs of database schemata are used in place of 𝒞 and 𝒮, respectively (e.g., the | ||
+ | pattern “SHa” falls in this category). These patterns prescribe specific transformations to be | ||
+ | applied to an input conceptual (resp., DB) schema, in order to obtain an output conceptual | ||
+ | (resp., DB) schema. These output artifacts make explicit the presence of specific structures that | ||
+ | are revealed through the application of the pattern itself. These structures can in turn enable | ||
+ | further applications of patterns. | ||
+ | |||
+ | Presentation Conventions. We show the fragment of the conceptual model that is affected | ||
+ | by the pattern in E-R notation (adopting the original notation by Chen [14]). To compactly | ||
+ | represent sets of attributes, we use a small diamond in place of the small circle used for single | ||
+ | attributes in Chen’s notation. For cardinality constraints we follow the “look-here” convention, | ||
+ | that is, the cardinality constraint for a role is placed next to the entity participating in that role. | ||
+ | In the DB schema, we use 𝑇 (K, A) to denote a table with name 𝑇 , primary key consisting of | ||
+ | the attributes K, and additional attributes A. Given a set U of attributes in 𝑇 , we denote by | ||
+ | key𝑇 (U) the fact that U form a key for 𝑇 . Referential integrity constraints (like, e.g., foreign | ||
+ | keys) are denoted with arcs, pointing from the referencing attribute(s) to the referenced one(s). | ||
+ | For conciseness, we denote sets of the form {𝑜 | condition} as {𝑜}condition . In order to express | ||
+ | datatypes for data properties, we introduce two auxiliary functions: a function 𝜏 that, given | ||
+ | a DB attribute 𝐴, returns the DB datatype of 𝐴, and a function 𝜇 that associates, to each DB | ||
+ | datatype, a corresponding RDF datatype. For the definition of 𝜇, we reuse the Natural Mapping [footnote 8] | ||
+ | correspondence provided by the R2RML recommendation. As a final note, following the E-R- | ||
+ | diagrams convention, we assume a default (1, 1) cardinality on attributes. For such a reason, in | ||
+ | the DB schema we assume all attributes to be not nullable by default (using the SQL convention, | ||
+ | |||
+ | [Footnote 6] http://www.w3.org/TR/sparql11-query | ||
+ | [Footnote 7] Modulo the expressivity of the OWL 2 QL language. | ||
+ | [Footnote 8] https://www.w3.org/TR/r2rml/#natural-mapping | ||
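+ | Table 2 | ||
+ | An extract of our catalog of mapping patterns. Each pattern lists its conceptual model, DB schema, mappings, and ontology. | ||
+ | | ||
+ | Schema Entity (SE) | ||
+ |   Conceptual model: entity $E$ with identifier $\mathbf{K}$ and attributes $\mathbf{A}$ | ||
+ |   DB schema:        $T_E(\mathbf{K}, \mathbf{A})$ | ||
+ |   Mappings:         $s\colon T_E$ | ||
+ |                     $t\colon C_E(\mathbf{t}_E(\mathbf{K})),\ \{d_A(\mathbf{t}_E(\mathbf{K}), A)\}_{A \in \mathbf{K} \cup \mathbf{A}}$ | ||
+ |   Ontology:         $\{\delta(d_A) \sqsubseteq C_E,\ \rho(d_A) \sqsubseteq \mu(\tau(A)),\ C_E \sqsubseteq \delta(d_A)\}_{A \in \mathbf{K} \cup \mathbf{A}}$ | ||
+ |   In case of optional attributes, for each optional attribute $A'$ of $E$, add an $opt(A')$ constraint to the DB schema and drop the corresponding | ||
+ |   inclusion axiom $C_E \sqsubseteq \delta(d_{A'})$ from the ontology. | ||
+ | | ||
+ | Schema Relationship (SR) | ||
+ |   Conceptual model: entities $E$ (identifier $\mathbf{K}_E$, attributes $\mathbf{A}_E$) and $F$ (identifier $\mathbf{K}_F$, attributes $\mathbf{A}_F$), linked by relationship $R$ | ||
+ |   DB schema:        $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $T_F(\mathbf{K}_F, \mathbf{A}_F)$, $T_R(\mathbf{K}_{RE}, \mathbf{K}_{RF})$ | ||
+ |   Mappings:         $s\colon T_R$ | ||
+ |                     $t\colon p_R(\mathbf{t}_{C_E}(\mathbf{K}_{RE}), \mathbf{t}_{C_F}(\mathbf{K}_{RF}))$ | ||
+ |   Ontology:         $\exists p_R \sqsubseteq C_E$, $\exists p_R^- \sqsubseteq C_F$ | ||
+ |   • In case of cardinality (_, 1) on role $R_E$ (resp., $R_F$), the primary key of $T_R$ is restricted to the attributes $\mathbf{K}_{RE}$ (resp., $\mathbf{K}_{RF}$). In case both roles | ||
+ |     have cardinality (_, 1), either choice for the primary key can be made, and the remaining attributes form a non-primary key in the logical schema. | ||
+ |   • In case of cardinality (1, _) on role $R_E$ (resp., $R_F$), the inclusion dependency $\mathbf{K}_E \subseteq \mathbf{K}_{RE}$ (resp., $\mathbf{K}_F \subseteq \mathbf{K}_{RF}$) holds in the schema, and the | ||
+ |     first (resp., second) inclusion axiom in the ontology holds in both directions. Note that when the maximum cardinality on role $R_E$ (resp., | ||
+ |     $R_F$) is 1, the corresponding inclusion dependency is actually a foreign key. | ||
+ | | ||
+ | Schema Relationship with Identifier Alignment (SRa) | ||
+ |   Conceptual model: entities $E$ (identifier $\mathbf{K}_E$, attributes $\mathbf{A}_E$) and $F$ (identifiers $\mathbf{K}_F$ and $\mathbf{U}_F$, attributes $\mathbf{A}_F$), linked by relationship $R$ | ||
+ |   DB schema:        $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $T_F(\mathbf{K}_F, \mathbf{U}_F, \mathbf{A}_F)$, $T_R(\mathbf{K}_{RE}, \mathbf{U}_{RF})$, $key_{T_F}(\mathbf{U}_F)$ | ||
+ |   Mappings:         $s\colon T_R \bowtie_{\mathbf{U}_{RF} = \mathbf{U}_F} T_F$ | ||
+ |                     $t\colon p_R(\mathbf{t}_{C_E}(\mathbf{K}_{RE}), \mathbf{t}_{C_F}(\mathbf{K}_F))$ | ||
+ |   Ontology:         $\exists p_R \sqsubseteq C_E$, $\exists p_R^- \sqsubseteq C_F$ | ||
+ | | ||
+ | Schema Hierarchy with Identifier Alignment (SHa) | ||
+ |   Conceptual model: parent entity $E$ (identifier $\mathbf{K}_E$, attributes $\mathbf{A}_E$) and child entity $F$ (identifier $\mathbf{K}_F$, attributes $\mathbf{A}_F$) | ||
+ |   DB schema (input):  $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $T_F(\mathbf{K}_F, \mathbf{K}_{FE}, \mathbf{A}_F)$, $key_{T_F}(\mathbf{K}_{FE})$ | ||
+ |   DB schema (output): $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $V_F(\mathbf{K}_F, \mathbf{K}_{FE}, \mathbf{A}_F) = T_F$, $key_{V_F}(\mathbf{K}_F)$ | ||
+ |   Mappings:         $s\colon V_F$ | ||
+ |                     $t\colon C_F(\mathbf{t}_{C_E}(\mathbf{K}_{FE})),\ \{d_A(\mathbf{t}_{C_E}(\mathbf{K}_{FE}), A)\}_{A \in \mathbf{K}_F \cup \mathbf{A}_F}$ | ||
+ |   Ontology:         $C_F \sqsubseteq C_E$, $\{\delta(d_A) \sqsubseteq C_F,\ \rho(d_A) \sqsubseteq \mu(\tau(A)),\ C_F \sqsubseteq \delta(d_A)\}_{A \in \mathbf{K}_F \cup \mathbf{A}_F}$ | ||
+ |   In this pattern, the “alignment” is meant to align the primary identifier used in the child entity to the primary identifier used in the parent | ||
+ |   entity. The other two possibilities for applying the pattern are: | ||
+ |   • the foreign key in the child entity is the primary key of that entity, and references a non-primary key of the parent entity; | ||
+ |   • the foreign key in the child entity is a non-primary key of that entity, and references a non-primary key of the parent entity. | ||
+ |   We depict here the most common scenario, where the foreign key points to the primary key of the parent entity. | ||
+ |   Observe that this pattern requires a change in the conceptual model (essentially keeping track of the attributes used for identifying the objects | ||
+ |   of the subclass). | ||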
+ | |||
+ | |||
+ | |||
+ | declared as “NOT NULL”). An optional attribute 𝐴 is instead denoted by adding opt(𝐴) to the | ||
+ | DB schema. Such notation extends in the natural way to a set A of attributes. | ||
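The two auxiliary functions 𝜏 and 𝜇 introduced above can be sketched as simple lookups. The dictionary below covers only a subset of the R2RML natural mapping, the SCHEMA catalog is a hypothetical stand-in for real DB metadata, and falling back to xsd:string for character types is a simplifying assumption (R2RML itself produces plain literals for character strings).

```python
# Illustrative subset of the R2RML natural mapping (SQL datatype -> XSD datatype).
NATURAL_MAPPING = {
    "SMALLINT": "xsd:integer",
    "INTEGER": "xsd:integer",
    "BIGINT": "xsd:integer",
    "DECIMAL": "xsd:decimal",
    "NUMERIC": "xsd:decimal",
    "FLOAT": "xsd:double",
    "REAL": "xsd:double",
    "DOUBLE PRECISION": "xsd:double",
    "BOOLEAN": "xsd:boolean",
    "DATE": "xsd:date",
    "TIME": "xsd:time",
    "TIMESTAMP": "xsd:dateTime",
}

# Hypothetical catalog information: (table, attribute) -> SQL datatype.
SCHEMA = {
    ("client", "ssn"): "INTEGER",
    ("client", "name"): "VARCHAR",
}


def tau(table, attribute):
    """tau: return the DB datatype of an attribute."""
    return SCHEMA[(table, attribute)].upper()


def mu(sql_type):
    """mu: return the RDF datatype associated with a DB datatype.

    Assumption: anything not listed (e.g., character strings) maps to xsd:string.
    """
    return NATURAL_MAPPING.get(sql_type, "xsd:string")


def range_axiom(table, attribute):
    """Render the range axiom rho(d_A) ⊑ mu(tau(A)) as a string."""
    return f"ρ(d_{attribute}) ⊑ {mu(tau(table, attribute))}"


print(range_axiom("client", "ssn"))
print(range_axiom("client", "name"))
```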
+ | |||
+ | Pattern Catalog. Table 2 shows an excerpt of our patterns, which we discuss in detail here. | ||
+ | Schema Entity (SE). This fundamental pattern describes the correspondence between an entity | ||
+ | with a primary identifier and attributes in the DB schema, and a class and data properties in the | ||
+ | ontology. The entity is expressed in the DB schema through a single table 𝑇𝐸 with primary key | ||
+ | K and other attributes A, as is the norm in sound DB design practices. The mappings column | ||
+ | explains how 𝑇𝐸 is mapped into a corresponding class 𝐶𝐸 . The primary key of 𝑇𝐸 is employed | ||
+ | to construct the IRIs of the objects that are instances of 𝐶𝐸 , using a template t𝐸 specific for | ||
+ | that entity. Each relevant attribute of 𝑇𝐸 is mapped to a data property of 𝐶𝐸 , with suitable | ||
+ | domain and range axioms. A mandatory participation constraint is added to each data property | ||
+ | corresponding to a mandatory attribute. | ||
+ | Example: A client registry table containing SSNs of clients, together with their name as | ||
+ | an additional attribute, is mapped to a Client class using the SSN to construct its objects. In | ||
+ | addition, the SSN and name are mapped to two corresponding data properties. | ||
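As a rough sketch of how pattern SE could be instantiated mechanically, the function below takes a table name, its key, and its other attributes, and emits an Ontop-style source query, target triple patterns, and the domain and mandatory-participation axioms as strings. The naming scheme for classes, IRIs, and data properties is a hypothetical choice made for this illustration, and range axioms are left out for brevity.

```python
def schema_entity(table, key, attrs, optional=()):
    """Instantiate pattern SE for a table T_E(K, A).

    Assumptions: class, IRI template, and property names are derived from the
    table and column names; this naming scheme is not prescribed by the pattern.
    """
    cls = ":" + table.capitalize()
    # The IRI template t_E is built from the primary key attributes K.
    iri = f":{table}/" + "/".join("{" + k + "}" for k in key)
    columns = list(key) + list(attrs)

    source = f"SELECT {', '.join(columns)} FROM {table}"
    target = [f"{iri} a {cls} ."]
    axioms = []
    for a in columns:
        prop = f":{table}_{a}"
        target.append(f"{iri} {prop} {{{a}}} .")
        axioms.append(f"δ({prop}) ⊑ {cls}")       # domain axiom
        if a not in optional:
            axioms.append(f"{cls} ⊑ δ({prop})")   # mandatory participation
    return source, target, axioms


# The client registry example: SSN as key, name as additional attribute.
source, target, axioms = schema_entity("client", ["ssn"], ["name"])
print(source)
print("\n".join(target))
print("\n".join(axioms))
```

Passing optional=("name",) drops the mandatory-participation axiom for that attribute, mirroring the opt(𝐴) convention.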
+ | Schema Relationship (SR). This pattern describes the correspondence between a binary relation- | ||
+ | ship without attributes and an OWL 2 QL object property, for the case where such relationship | ||
+ | is represented in the DB as a separate (usually, “many-to-many”) table. This pattern considers | ||
+ | three tables 𝑇𝑅 , 𝑇𝐸 , and 𝑇𝐹 , for which the set of columns in 𝑇𝑅 is partitioned into two parts | ||
+ | KRE and KRF that are foreign keys to 𝑇𝐸 and 𝑇𝐹 , respectively. The identifier of 𝑇𝑅 depends | ||
+ | on the role cardinalities in the E-R model. The pattern captures how 𝑇𝑅 is mapped to an object | ||
+ | property 𝑝𝑅 , using the two parts KRE and KRF of the partition to construct respectively the | ||
+ | subject and the object of the triples in 𝑝𝑅 . The templates t𝐶𝐸 and t𝐶𝐹 must be those respectively | ||
+ | used for building instances of classes 𝐶𝐸 corresponding to 𝑇𝐸 and 𝐶𝐹 corresponding to 𝑇𝐹 . | ||
+ | Example: An additional table in the client registry stores the addresses of each client, and | ||
+ | has a foreign key to a table with locations. The former table is mapped to an address object | ||
+ | property, for which the ontology asserts that the domain is the class Person and the range an | ||
+ | additional class Location, which corresponds to the latter table. | ||
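Pattern SR can be sketched in the same spirit. The key point illustrated here is that the subject and object of the generated triple reuse the IRI templates of the two related entities. The function name, the column names client_ssn and location_id, and the positional "{}" placeholder convention are assumptions made for this example only.

```python
def schema_relationship(t_r, k_re, k_rf, tpl_e, tpl_f, prop, c_e, c_f):
    """Instantiate pattern SR for a relationship table T_R(K_RE, K_RF).

    tpl_e and tpl_f are the IRI templates already used for C_E and C_F, written
    with positional '{}' placeholders (a convention assumed for this sketch).
    """
    source = f"SELECT {', '.join(k_re + k_rf)} FROM {t_r}"
    # Fill the entity templates with the foreign key columns of T_R.
    subject = tpl_e.format(*("{" + c + "}" for c in k_re))
    obj = tpl_f.format(*("{" + c + "}" for c in k_rf))
    target = f"{subject} {prop} {obj} ."
    axioms = [f"∃{prop} ⊑ {c_e}", f"∃{prop}⁻ ⊑ {c_f}"]
    return source, target, axioms


source, target, axioms = schema_relationship(
    "address", ["client_ssn"], ["location_id"],
    ":client/{}", ":location/{}",
    ":address", ":Client", ":Location",
)
print(source)
print(target)
print(axioms)
```

Reusing the templates is what guarantees that the IRIs produced for the relationship coincide with those produced for the entities by pattern SE.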
+ | Schema Relationship with Identifier Alignment (SRa). This pattern is similar to pattern SR, but it | ||
+ | comes with a modifier a, indicating that the pattern can be applied after the identifiers involved | ||
+ | in the relationship have been aligned. The alignment is necessary because the foreign key in 𝑇𝑅 | ||
+ | does not refer to the primary key K𝐹 of 𝑇𝐹 , but to an alternative key U𝐹 . Since the instances of | ||
+ | the class 𝐶𝐹 corresponding to 𝑇𝐹 are constructed using the primary key K𝐹 of 𝑇𝐹 (cf. pattern | ||
+ | SE), also the pairs that populate 𝑝𝑅 should refer in their object position to that primary key, | ||
+ | which can only be retrieved via a join between 𝑇𝑅 and 𝑇𝐹 on the key U𝐹 . | ||
+ | Example: The primary key of the table with locations is not given by the city and street, | ||
+ | which are used in the table that relates clients to their addresses, but is given by the latitude | ||
+ | and longitude of locations. | ||
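The join required by SRa can be tried out on a small in-memory SQLite database. The table and column names, the sample rows, and the IRI templates below are illustrative assumptions; the sketch only shows that the object IRI is built from the primary key (latitude, longitude) retrieved through the join on the alternative key (city, street).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (                 -- T_F
    lat REAL, lon REAL,                 -- K_F, the primary key
    city TEXT NOT NULL,
    street TEXT NOT NULL,               -- (city, street) is the alternative key U_F
    PRIMARY KEY (lat, lon),
    UNIQUE (city, street)
);
CREATE TABLE address (                  -- T_R
    ssn INTEGER NOT NULL,               -- K_RE
    city TEXT NOT NULL,
    street TEXT NOT NULL,               -- (city, street) is U_RF
    FOREIGN KEY (city, street) REFERENCES location (city, street)
);
INSERT INTO location VALUES (46.5, 11.25, 'Bolzano', 'Piazza Domenicani');
INSERT INTO address VALUES (1234, 'Bolzano', 'Piazza Domenicani');
""")

# Source query of the SRa mapping: join T_R with T_F on the alternative key.
SOURCE = """
SELECT a.ssn, l.lat, l.lon
FROM address a
JOIN location l ON a.city = l.city AND a.street = l.street
"""

# The object IRI uses the template of the Location class, built from K_F.
triples = [
    (f":client/{ssn}", ":address", f":location/{lat}/{lon}")
    for ssn, lat, lon in conn.execute(SOURCE)
]
print(triples)
```

Without the join, the object would be identified by city and street, and would not match the IRIs generated for locations by pattern SE.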
+ | Schema Hierarchy with Identifier Alignment (SHa). This pattern handles the case where a | ||
+ | hierarchy is specified and the child entity uses a primary identifier different from the one in the | ||
+ | parent entity. In this situation, the foreign-key constraint can come in three different variants. | ||
+ | In the depicted one, the foreign key in 𝑇𝐹 is over a non-primary key KFE . The objects for 𝐶𝐹 | ||
+ | have to be built out of KFE , rather than out of the primary key of 𝑇𝐹 . For this purpose, the | ||
+ | pattern creates a view 𝑉𝐹 identical to 𝑇𝐹 , except that KFE is the primary key. The foreign | ||
+ | key relations are also preserved. Such a view might enable further applications of patterns. | ||
+ | Example: An ISA relation between entities Student and Person. Students are identified by | ||
+ | their matriculation number, whereas persons are identified by their SSN. | ||
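The Student/Person example can be sketched with SQLite as follows. SQL views cannot declare keys, so treating the SSN as the identifier of the view is a modeling convention here rather than an enforced constraint; the table names, column names, sample rows, and the template are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (                        -- T_E
    ssn INTEGER PRIMARY KEY,                 -- K_E
    name TEXT NOT NULL
);
CREATE TABLE student (                       -- T_F
    matr INTEGER PRIMARY KEY,                -- K_F
    ssn INTEGER NOT NULL UNIQUE
        REFERENCES person (ssn),             -- K_FE, a non-primary key
    degree TEXT NOT NULL                     -- A_F
);
-- V_F: same content as T_F; the mappings identify its rows through K_FE.
CREATE VIEW v_student AS SELECT matr, ssn, degree FROM student;

INSERT INTO person VALUES (1234, 'Ann'), (5678, 'Bob');
INSERT INTO student VALUES (42, 1234, 'CS');
""")

# Template of the parent class, reused for the child class.
PERSON_TEMPLATE = ":person/{ssn}"

persons = {
    PERSON_TEMPLATE.format(ssn=ssn)
    for (ssn,) in conn.execute("SELECT ssn FROM person")
}
students = {
    PERSON_TEMPLATE.format(ssn=ssn)
    for (ssn,) in conn.execute("SELECT ssn FROM v_student")
}

print(students)
print(students.issubset(persons))
```

Because student IRIs are built from the SSN rather than from the matriculation number, every instance of Student is also an instance of Person, which is what the axiom 𝐶𝐹 ⊑ 𝐶𝐸 requires.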
+ | |||
+ | |||
+ | 4. Conclusions and Future Work | ||
+ | In this work, we have identified and formally specified a number of mapping patterns emerging | ||
+ | when linking DBs to ontologies in a typical VKG setting. Our patterns are grounded in well- | ||
+ | established practices of DB design, and render explicit the connection between the conceptual | ||
+ | model, the DB schema, and the ontology. We envision that the organization in patterns can | ||
+ | enable a number of relevant tasks, notably mapping bootstrapping for incomplete VKGs. | ||
+ | This work is only a first step, with respect to both categorization of patterns, and their actual | ||
+ | use. Regarding the former, we are currently extending this initial catalog with more advanced | ||
+ | “data-driven” patterns, which are patterns where the data component needs to be taken into | ||
+ | account. Regarding the latter, we are investigating solutions to specific problems that need to | ||
+ | be addressed when setting up a VKG scenario, such as the problem of mapping bootstrapping. | ||
+ | |||
+ | |||
+ | Acknowledgments | ||
+ | This research has been partially supported by the Wallenberg AI, Autonomous Systems and | ||
+ | Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, by the Italian | ||
+ | Basic Research (PRIN) project HOPE, by the EU H2020 project INODE (grant agreement 863410), | ||
+ | and by the project MENS, funded through the 4th Call for Research of the Autonomous Province | ||
+ | of Bolzano (IN2219). | ||
+ | |||
+ | |||
+ | References | ||
+ | [1] G. Xiao, L. Ding, B. Cogrel, D. Calvanese, Virtual Knowledge Graphs: An overview of | ||
+ | systems and use cases, Data Intelligence 1 (2019) 201–223. | ||
+ | [2] J. Sequeda, O. Lassila, Designing and Building Enterprise Knowledge Graphs, Morgan & | ||
+ | Claypool Publishers, 2021. | ||
+ | [3] L. F. de Medeiros, F. Priyatna, Ó. Corcho, MIRROR: Automatic R2RML mapping generation | ||
+ | from relational databases, in: Proc. ICWE, volume 9114 of LNCS, Springer, 2015, pp. 326– | ||
+ | 343. | ||
+ | [4] E. Jiménez-Ruiz, E. Kharlamov, D. Zheleznyakov, I. Horrocks, C. Pinkel, M. G. Skjæveland, | ||
+ | E. Thorstensen, J. Mora, BootOX: Practical mapping of RDBs to OWL 2, in: Proc. ISWC, | ||
+ | volume 9367 of LNCS, Springer, 2015, pp. 113–132. | ||
+ | [5] C. Pinkel, C. Binnig, E. Kharlamov, P. Haase, IncMap: Pay as you go matching of relational | ||
+ | schemata to OWL ontologies, in: Proc. 8th Int. Workshop on Ontology Matching (OM), | ||
+ | volume 1111 of CEUR, CEUR-WS.org, 2013, pp. 37–48. | ||
+ | [6] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez- | ||
+ | Muro, G. Xiao, Ontop: Answering SPARQL queries over relational databases, Semantic | ||
+ | Web J. 8 (2017) 471–487. | ||
+ | [7] J. F. Sequeda, D. P. Miranker, Ultrawrap Mapper: A semi-automatic relational database to | ||
+ | RDF (RDB2RDF) mapping tool, in: Proc. ISWC Posters & Demonstrations Track, volume | ||
+ | 1486 of CEUR, CEUR-WS.org, 2015. | ||
+ | [8] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison Wesley, 1995. | ||
+ | [9] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and | ||
+ | efficient query answering in description logics: The DL-Lite family, JAR 39 (2007) 385–429. | ||
+ | [10] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The De- | ||
+ | scription Logic Handbook: Theory, Implementation and Applications, 2nd ed., Cambridge | ||
+ | University Press, 2007. | ||
+ | [11] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, R. Rosati, Linking data to | ||
+ | ontologies, J. on Data Semantics 10 (2008) 133–173. | ||
+ | [12] R. Hull, Relative information capacity of simple relational database schemas, SIAM J. on | ||
+ | Computing 15 (1986) 856–886. | ||
+ | [13] R. J. Miller, Y. E. Ioannidis, R. Ramakrishnan, Schema equivalence in heterogeneous | ||
+ | systems: Bridging theory and practice, Information Systems 19 (1994) 3–31. | ||
+ | [14] P. P. Chen, The Entity-Relationship model: Toward a unified view of data, ACM TODS 1 | ||
+ | (1976) 9–36. | ||
+ | </pre> |
Latest revision as of 17:56, 30 March 2023
Paper
Paper | |
---|---|
id | Vol-3194/paper10 |
wikidataid | Q117344920 |
title | Conceptually-grounded Mapping Patterns for Virtual Knowledge Graphs |
pdfUrl | https://ceur-ws.org/Vol-3194/paper10.pdf |
dblpUrl | https://dblp.org/rec/conf/sebd/CalvaneseGLM0S22 |
volume | Vol-3194 |
Conceptually-grounded Mapping Patterns for Virtual Knowledge Graphs
Conceptually-grounded Mapping Patterns for Virtual Knowledge Graphs Diego Calvanese1,2 , Avigdor Gal3 , Davide Lanti1 , Marco Montali1 , Alessandro Mosca1 and Roee Shraga4 1 Free-University of Bozen-Bolzano, Bolzano, Italy 2 Umeå University, Umeå, Sweden 3 Technion – Israel Institute of Technology, Haifa, Israel 4 Khoury College of Computer Science, Northeastern University, Boston, Massachusetts Abstract Knowledge Graphs (KGs) have been gaining momentum recently in both academia and industry, due to the flexibility of their data model, allowing one to access and integrate collections of data of different forms. Virtual Knowledge Graphs (VKGs), a variant of KGs originating from the field of Ontology-based Data Access (OBDA), are a promising paradigm for integrating and accessing legacy data sources. The main idea of VKGs is that the KG remains virtual: the end-user interacts with a KG, but queries are reformulated on-the-fly as queries over the data source(s). To enable the paradigm, one needs to define declarative mappings specifying the link between the data sources and the elements in the VKG. In this work, we try to investigate common patterns that arise when specifying such mappings, building on well-established methodologies from the area of conceptual modeling and database design. Keywords Virtual Knowledge Graphs, Ontology-based Data Access, Mapping patterns, Data Integration 1. Introduction Data integration and access to legacy data sources are key challenges for contemporary organi- zations. In the whole spectrum of data integration and access solutions, the approach based on Virtual Knowledge Graphs (VKGs) is gaining momentum, especially when the underlying data sources to be integrated come in the form of relational databases (DBs) [1]. VKGs replace the rigid structure of tables with the flexibility of a graph that incorporates domain knowledge and is kept virtual, eliminating redundancies. 
A VKG specification consists of three main components: (i) data sources (in the context of this paper, constituted by relational DBs), where the actual data are stored; (ii) a domain ontology, capturing the relevant concepts, relations, and constraints of the domain of interest; and (iii) a set of mappings, linking the data sources to the ontology. A critical bottleneck in this setting lies in the definition and management of mappings. In this work, we focus on this issue by proposing a comprehensive catalog of mapping patterns that emerge when linking data to ontologies.

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
Email: calvanese@inf.unibz.it (D. Calvanese); avigal@technion.ac.il (A. Gal); lanti@unibz.it (D. Lanti); montali@unibz.it (M. Montali); mosca@unibz.it (A. Mosca); r.shraga@northeastern.edu (R. Shraga)
ORCID: 0000-0001-5174-9693 (D. Calvanese); 0000-0002-7028-661X (A. Gal); 0000-0003-1097-2965 (D. Lanti); 0000-0002-8021-3430 (M. Montali); 0000-0003-2323-3344 (A. Mosca); 0000-0001-8803-8481 (R. Shraga)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

[Figure 1: The database and the ontology both stem from common domain knowledge. Diagram labels: Domain Knowledge; Conceptual Model (E-R); DB Schema; DB Design; OWL Encoding; Alignment; DB Ontology (OWL 2 QL); Target Ontology (OWL 2 QL); Mappings; VKG Mappings.]

Our catalog is based on the (arguably reasonable) assumption that both the ontology and the DB schema are derived from a conceptual analysis of the domain of interest. The resulting knowledge may stay implicit, or may lead to an explicit representation in the form of a structural conceptual model, which can be represented using well-established notations such as UML, ORM, or E-R.
On the one hand, this conceptual model provides the basis for creating a corresponding domain ontology through a series of semantics-preserving transformation steps. On the other hand, it can trigger the design process that finally leads to the deployment of an actual DB. The whole view is depicted in Figure 1.

Our catalog is built on well-established methodologies and patterns studied in data management (e.g., W3C direct mappings (W3C-DM)¹ and extensions), data analysis (e.g., algorithms for discovering dependencies), and conceptual modeling (e.g., relational mapping techniques). The idea of mapping patterns is not new. For instance, the work in [2] is closely related to ours, as it also introduces a catalog of mapping patterns. However, there are some key differences with our approach. One difference is that we consider KGs (with ontologies), whereas that work focuses on property graphs without an ontology. More importantly, in [2] and in the related literature, patterns are not formalized or grounded to a specific conceptual representation, but are rather informally specified and discussed in a “by-example” fashion. In contrast, each of our patterns explicitly and unambiguously specifies the link between the conceptualization and the DB instance, which is the one arising from applying well-known semantics-preserving transformations studied in the area of DB design.

We argue that this foundational grounding paves the way for a variety of VKG design scenarios, depending on which information artifacts are available, and which ones must be produced. For example, our patterns could be used to validate existing mappings, or to automatically generate (i.e., bootstrap) ontology and mappings when only the DB is available. In fact, specific patterns have also been proposed in relation to ontology and mapping bootstrapping, for which a variety of tools and approaches have been developed in the last two decades [3, 4, 5, 6, 7].
The approaches in the literature differ in terms of the overall purposes of bootstrapping (e.g., OBDA, data integration, ontology learning, checking of DB schema constraints using ontology reasoning), the adopted ontology and mapping languages (e.g., OWL 2 profiles or RDFS as ontology languages, and R2RML or custom languages for the specification of mappings), the different focus on direct and/or complex mappings, and the assumed level of automation. The majority of the most recent approaches closely follow W3C-DM, deriving ontologies that mirror the structure of the input DB.

¹ http://www.w3.org/TR/rdb-direct-mapping/

Table 1: Semantics of the DL-Lite_ℛ constructs that involve datatypes.

  Construct               | Syntax element  | Example     | Semantics
  Top domain              | $\top_V$        |             | $\Delta_V^{\mathcal{I}}$
  Literal                 | $\ell \in N_L$  | “george”    | $\ell^{\mathcal{I}} \in \Delta_V^{\mathcal{I}}$
  Datatype                | $T_i$           | xsd:int     | $T_i^{\mathcal{I}} \subseteq \Delta_V^{\mathcal{I}}$
  Data property name      | $d \in N_D$     | hasName     | $d^{\mathcal{I}} \subseteq \Delta_O^{\mathcal{I}} \times \Delta_V^{\mathcal{I}}$
  Data property domain    | $\delta(d)$     | δ(hasName)  | $\{x \in \Delta_O^{\mathcal{I}} \mid \exists v \in \Delta_V^{\mathcal{I}} : (x, v) \in d^{\mathcal{I}}\}$
  Data property range     | $\rho(d)$       | ρ(hasName)  | $\{v \in \Delta_V^{\mathcal{I}} \mid \exists o \in \Delta_O^{\mathcal{I}} : (o, v) \in d^{\mathcal{I}}\}$
  Data property negation  | $\neg d$        | ¬hasName    | $(\Delta_O^{\mathcal{I}} \times \Delta_V^{\mathcal{I}}) \setminus d^{\mathcal{I}}$

The remainder of the paper is structured as follows: Section 2 introduces the notation and basic notions on VKGs, Section 3 presents (an extract of) our catalog of mapping patterns, and Section 4 concludes the paper.

2. Preliminaries

We use the bold font to denote tuples, e.g., $\mathbf{x}$ and $\mathbf{y}$ are tuples. When convenient and unambiguous, we treat tuples as sets and use set operators on them. We assume familiarity with standard notions and languages from DBs [8], such as SQL or E-R diagrams.

A VKG specification is a triple ⟨𝒯, ℳ, 𝒮⟩ where 𝒯 is an ontology (or TBox), ℳ a set of mappings, and 𝒮 the schema of a DB (with constraints, e.g., primary and foreign keys). In VKGs, the ontology is formulated in OWL 2 QL², but for conciseness we use its Description Logic (DL) counterpart, DL-Lite_ℛ [9], here slightly enriched to handle datatypes.
We fix the following enumerable, pairwise-disjoint sets: $N_I$ of individuals, $N_L$ of literal values, $N_C$ of class names, $N_P$ of object property names, and $N_D$ of data property names. An OWL 2 QL TBox 𝒯 is a finite set of inclusion axioms of the form 𝐵 ⊑ 𝐶, 𝑞 ⊑ 𝑟, 𝜌(𝑑) ⊑ 𝑓, or 𝑑 ⊑ 𝑣, where 𝐵, 𝐶 are classes, 𝑞, 𝑟 are object properties, 𝑑 is a data property, 𝑓 is a datatype expression, 𝜌(𝑑) is a data property range expression, and 𝑣 is a data property expression. These are defined according to the following grammar, where $A \in N_C$, $d \in N_D$, $p \in N_P$, 𝛿(𝑑) is a data property domain expression, and $T_1, \ldots, T_n$ are the RDF datatypes:

  $B \to A \mid \exists r \mid \delta(d)$      $q \to p \mid p^-$      $f \to \top_D \mid T_1 \mid \cdots \mid T_n$
  $C \to \top_C \mid B \mid \neg B$            $r \to q \mid \neg q$   $v \to d \mid \neg d$

In the rules above, $\top_C$ denotes the “top” element for concepts and $\top_D$ the one for data values (called literals in the RDF terminology). An OWL 2 QL ABox 𝒜 is a finite set of assertions of the form 𝐴(𝑎), 𝑝(𝑎, 𝑏), or 𝑑(𝑎, ℓ), where $A \in N_C$, $p \in N_P$, $d \in N_D$, 𝑎 and 𝑏 are individuals in $N_I$, and $\ell \in N_L$. We call the pair 𝒦 = ⟨𝒯, 𝒜⟩ an OWL 2 QL Knowledge Graph (KG).

² http://www.w3.org/TR/owl2-overview/

Similarly to first-order logic, the semantics of DL-Lite_ℛ KGs is given through Tarski-style interpretations $\mathcal{I} = \langle \Delta_O^{\mathcal{I}}, \Delta_V^{\mathcal{I}}, \cdot^{\mathcal{I}} \rangle$, where $\Delta_O^{\mathcal{I}}$ is a non-empty domain of objects, $\Delta_V^{\mathcal{I}}$ is a non-empty domain of values, and $\cdot^{\mathcal{I}}$ is an interpretation function. Table 1 reports the semantics for the constructs involving datatypes. The other constructs are defined as in standard DL-Lite_ℛ [9]. As usual [10], we say that an interpretation ℐ satisfies a KG 𝒦, denoted by ℐ ⊨ 𝒦, if ℐ satisfies the ABox assertions and the inclusion axioms in 𝒦.

Mappings. Mappings specify how to populate classes and properties of the ontology with individuals and values constructed from the data in the underlying DB. In other words, mappings provide the ABox that, together with a given TBox, realizes a KG.
In VKGs, the adopted language for mappings in real-world systems is R2RML³, but for conciseness we use here a more convenient abstract notation inspired by the literature [11]: a mapping 𝑚 is a pair of the form $\langle s\colon Q(\mathbf{x}),\ t\colon \mathbf{L}(\mathbf{t}(\mathbf{x})) \rangle$, where $Q(\mathbf{x})$ is a SQL query with answer variables $\mathbf{x}$ over the DB schema 𝒮, called source query, and $\mathbf{L}(\mathbf{t}(\mathbf{x}))$ is a list of target atoms of the form $C(\mathbf{t}_1(\mathbf{x}_1))$, $p(\mathbf{t}_1(\mathbf{x}_1), \mathbf{t}_2(\mathbf{x}_2))$, or $d(\mathbf{t}_1(\mathbf{x}_1), \mathbf{t}_2(\mathbf{x}_2))$, where $C \in N_C$, $p \in N_P$, $d \in N_D$, and $\mathbf{t}_1(\mathbf{x}_1)$ and $\mathbf{t}_2(\mathbf{x}_2)$ are terms that we call templates. We express source queries in relational algebra, omitting answer variables under the assumption that they coincide with the variables used in the target atoms. Intuitively, a template $\mathbf{t}(\mathbf{x})$ in the target atom of a mapping corresponds to an R2RML string template⁴, and is used to generate an IRI (hence, an object identifier) or an RDF literal, starting from DB values retrieved by the source query in that mapping.

For the examples, we use the concrete syntax from the Ontop VKG system [6], in which the source query is expressed in SQL and each target atom is expressed as an RDF triple pattern with templates. The answer variables of the source query occurring in the target atoms are distinguished by enclosing them in curly brackets { · · · }. The following is an example mapping expressed in such syntax:

    source  SELECT ssn FROM person
    target  ex:pers/{ssn} a ex:Person .

In the mapping above, the string ex: denotes a URI prefix, e.g., ex:Person is an abbreviation for the URI http://www.example.com/Person. Such a mapping, when applied to a DB instance 𝒟 of 𝒮, populates the class ex:Person with IRIs constructed by replacing the answer variable ssn occurring in the target atom with the corresponding values assigned to that variable by the answers to the SQL source query evaluated over 𝒟.
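The template instantiation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `apply_class_mapping` is hypothetical (it is not part of Ontop or R2RML), the DB is an in-memory SQLite database, values are inserted into the template without the IRI-safe encoding a real system would apply, and the triples are materialized here only to make the effect visible, whereas a VKG system keeps them virtual.

```python
import sqlite3


def apply_class_mapping(conn, source_sql, subject_template, class_iri):
    """Hypothetical helper: evaluate the source query and instantiate the
    subject template once per answer, yielding (subject, 'a', class) triples."""
    cursor = conn.execute(source_sql)
    # Names of the answer variables, taken from the query result header.
    variables = [column[0] for column in cursor.description]
    triples = []
    for row in cursor:
        bindings = dict(zip(variables, row))
        # Replace each {variable} placeholder with the value from this answer.
        subject = subject_template.format(**bindings)
        triples.append((subject, "a", class_iri))
    return triples


# A toy DB instance with a single table person(ssn); table content is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (ssn INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO person VALUES (?)", [(1234,), (5678,)])

triples = apply_class_mapping(
    conn,
    "SELECT ssn FROM person ORDER BY ssn",  # source query
    "ex:pers/{ssn}",                        # template in the target atom
    "ex:Person",                            # class being populated
)
for subject, predicate, obj in triples:
    print(f"{subject} {predicate} {obj} .")
```

Using `str.format` keeps the sketch close to the curly-bracket convention of the concrete syntax; it is a convenience of this example, not a statement about how any VKG system implements templates.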
For instance, if the source query returns two answers that assign to the answer variable ssn respectively the values 1234 and 5678, then the mapping above produces the following RDF graph (expressed in the Turtle syntax⁵), stating that individuals ex:pers/1234 and ex:pers/5678 are both instances of class ex:Person:

    ex:pers/1234 a ex:Person .
    ex:pers/5678 a ex:Person .

We denote by $\mathcal{A}_{\mathcal{M}(\mathcal{D})}$ the virtual ABox constructed through mappings ℳ from a DB 𝒟. Given a VKG specification ⟨𝒯, ℳ, 𝒮⟩ and a database instance 𝒟 of 𝒮, the KG $\mathcal{K} = \langle \mathcal{T}, \mathcal{A}_{\mathcal{M}(\mathcal{D})} \rangle$ is called the Virtual Knowledge Graph of ⟨𝒯, ℳ, 𝒮⟩ through 𝒟. The qualifier “virtual” in the name derives from the fact that the virtual ABox $\mathcal{A}_{\mathcal{M}(\mathcal{D})}$ in a VKG setting is not materialized and stored somewhere. Query answering in VKGs, in fact, is carried out through query rewriting and query unfolding techniques [11, 6]: user queries, expressed in SPARQL⁶, get translated on-the-fly into equivalent SQL queries, which then are directly evaluated against the DB.

³ http://www.w3.org/TR/r2rml/
⁴ https://www.w3.org/TR/r2rml/#dfn-string-template
⁵ http://www.w3.org/TR/turtle/

3. Mapping Patterns

In its basic form, a mapping pattern is a quadruple ⟨𝒞, 𝒮, ℳ, 𝒯⟩, where 𝒞 is a conceptual model, 𝒮 a database schema, ℳ a set of mappings, and 𝒯 an (OWL 2 QL) ontology. In such a pattern, the pair ⟨𝒞, 𝒮⟩ puts into correspondence a conceptual representation with one of its (many) admissible (i.e., formally sound [12, 13]) database schemata, like those prescribed by well-established database modeling methodologies. The pair ⟨ℳ, 𝒯⟩, instead, is formed by the DB ontology 𝒯, which is the OWL 2 QL encoding⁷ of the conceptual model 𝒞, and the set ℳ of mappings, providing the link between 𝒮 and 𝒯. The term “DB ontology” refers to an ontology whose concepts and properties reflect the constructs of the conceptual model, mirroring the structure of the relational database, as displayed in Figure 1.
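To make the ⟨ℳ, 𝒯⟩ half of such a quadruple more tangible, the following Python sketch generates a mapping and a set of axioms from a table description, anticipating the simplest entry of Table 2, Schema Entity (SE). It is an illustrative sketch, not the authors' tooling: the `Table` class, the function `schema_entity`, the naming scheme for classes and properties, and the `client(ssn, name)` table are all assumptions, and the range axioms ρ(d_A) ⊑ μ(τ(A)) are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A table T_E(K, A): its name, primary-key columns K, other columns A."""
    name: str
    key: list
    attrs: list = field(default_factory=list)


def schema_entity(table, prefix="ex:"):
    """Sketch of pattern SE: one class for the table and one data property
    per column, with domain and mandatory-participation axioms."""
    cls = prefix + table.name.capitalize()  # class C_E (naming is assumed)
    # IRI template t_E built from the primary-key columns.
    template = prefix + table.name + "/" + "/".join("{" + k + "}" for k in table.key)
    columns = table.key + table.attrs
    target = [f"{template} a {cls} ."]
    axioms = []
    for col in columns:
        prop = prefix + col  # data property d_A (naming is assumed)
        target.append(f"{template} {prop} {{{col}}} .")
        axioms.append(f"δ({prop}) ⊑ {cls}")   # domain axiom
        axioms.append(f"{cls} ⊑ δ({prop})")   # mandatory attribute
    source = "SELECT " + ", ".join(columns) + " FROM " + table.name
    return {"source": source, "target": target}, axioms


# Hypothetical client registry table with an SSN key and a name attribute.
mapping, axioms = schema_entity(Table("client", ["ssn"], ["name"]))
print("source", mapping["source"])
for atom in mapping["target"]:
    print("target", atom)
for axiom in axioms:
    print(axiom)
```

For optional attributes, the second axiom of each pair would simply not be emitted, in line with the note attached to SE in Table 2.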
Some of the more advanced patterns have a more complex structure, where pairs of conceptual models and/or pairs of database schemata are used in place of 𝒞 and 𝒮, respectively (e.g., the pattern “SHa” falls in this category). These patterns prescribe specific transformations to be applied to an input conceptual (resp., DB) schema, in order to obtain an output conceptual (resp., DB) schema. These output artifacts make explicit the presence of specific structures that are revealed through the application of the pattern itself. These structures can in turn enable further applications of patterns.

Presentation Conventions. We show the fragment of the conceptual model that is affected by the pattern in E-R notation (adopting the original notation by Chen [14]). To compactly represent sets of attributes, we use a small diamond in place of the small circle used for single attributes in Chen’s notation. For cardinality constraints we follow the “look-here” convention, that is, the cardinality constraint for a role is placed next to the entity participating in that role. In the DB schema, we use $T(\mathbf{K}, \mathbf{A})$ to denote a table with name 𝑇, primary key consisting of the attributes $\mathbf{K}$, and additional attributes $\mathbf{A}$. Given a set $\mathbf{U}$ of attributes in 𝑇, we denote by $\mathit{key}_T(\mathbf{U})$ the fact that $\mathbf{U}$ form a key for 𝑇. Referential integrity constraints (like, e.g., foreign keys) are denoted with arcs, pointing from the referencing attribute(s) to the referenced one(s). For conciseness, we denote sets of the form {𝑜 | condition} as $\{o\}_{\mathit{condition}}$. In order to express datatypes for data properties, we introduce two auxiliary functions: a function 𝜏 that, given a DB attribute 𝐴, returns the DB datatype of 𝐴, and a function 𝜇 that associates, to each DB datatype, a corresponding RDF datatype. For the definition of 𝜇, we re-use the Natural Mapping⁸ correspondence provided by the R2RML recommendation.
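The composition 𝜇(𝜏(𝐴)) can be illustrated with a small Python sketch. The dictionary below is an excerpt of the SQL-to-XSD correspondence listed in the R2RML natural mapping; the function names `tau`, `mu` and `range_axiom`, the dictionary-based schema, and the `client` example are assumptions made for this illustration. Note one simplification: R2RML turns character strings into plain literals, which the sketch reports as xsd:string.

```python
# Excerpt of the SQL-to-XSD correspondence from the R2RML natural mapping.
NATURAL_MAPPING = {
    "SMALLINT": "xsd:integer",
    "INTEGER": "xsd:integer",
    "BIGINT": "xsd:integer",
    "DECIMAL": "xsd:decimal",
    "NUMERIC": "xsd:decimal",
    "FLOAT": "xsd:double",
    "REAL": "xsd:double",
    "DOUBLE PRECISION": "xsd:double",
    "BOOLEAN": "xsd:boolean",
    "DATE": "xsd:date",
    "TIME": "xsd:time",
    "TIMESTAMP": "xsd:dateTime",
    # Character strings yield plain literals in R2RML; reporting them as
    # xsd:string is a simplifying assumption of this sketch.
    "CHAR": "xsd:string",
    "VARCHAR": "xsd:string",
}


def tau(schema, attribute):
    """τ: the DB datatype of an attribute, read from an {attribute: type} dict."""
    return schema[attribute]


def mu(db_type):
    """μ: the RDF datatype for a DB datatype (length arguments are ignored)."""
    base = db_type.split("(")[0].strip().upper()
    return NATURAL_MAPPING[base]


def range_axiom(schema, attribute, prefix="ex:"):
    """Render the axiom ρ(d_A) ⊑ μ(τ(A)) for the data property of attribute A."""
    return f"ρ({prefix}{attribute}) ⊑ {mu(tau(schema, attribute))}"


# Hypothetical column types for a client table.
client = {"ssn": "INTEGER", "name": "VARCHAR(50)"}
range_axioms = [range_axiom(client, attribute) for attribute in client]
print(range_axioms)
```

A DB datatype missing from the excerpt raises a `KeyError`, which seems preferable in a sketch to silently guessing an RDF datatype.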
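As a final note, following the convention of E-R diagrams, we assume a default (1, 1) cardinality on attributes. For this reason, in the DB schema we assume all attributes to be not nullable by default (using the SQL convention, declared as “NOT NULL”). An optional attribute 𝐴 is instead denoted by adding $\mathit{opt}(A)$ to the DB schema. Such notation extends in the natural way to a set $\mathbf{A}$ of attributes.

⁶ http://www.w3.org/TR/sparql11-query
⁷ Modulo the expressivity of the OWL 2 QL language.
⁸ https://www.w3.org/TR/r2rml/#natural-mapping

Table 2: An extract of our catalog of mapping patterns. For each pattern, the table gives the conceptual model (an E-R diagram in the original, summarized here in words), the DB schema, the mappings, and the ontology.

Schema Entity (SE)
  Conceptual model: entity E with identifier attributes $\mathbf{K}$ and attributes $\mathbf{A}$.
  DB schema:        $T_E(\mathbf{K}, \mathbf{A})$
  Mappings:         $s\colon T_E$
                    $t\colon C_E(\mathbf{t}_E(\mathbf{K})),\ \{d_A(\mathbf{t}_E(\mathbf{K}), A)\}_{A \in \mathbf{K} \cup \mathbf{A}}$
  Ontology:         $\{\delta(d_A) \sqsubseteq C_E,\ \rho(d_A) \sqsubseteq \mu(\tau(A)),\ C_E \sqsubseteq \delta(d_A)\}_{A \in \mathbf{K} \cup \mathbf{A}}$
  Notes: In case of optional attributes, for each optional attribute $A'$ of 𝐸, add an $\mathit{opt}(A')$ constraint to the DB schema and drop the corresponding inclusion axiom $C_E \sqsubseteq \delta(d_{A'})$ from the ontology.

Schema Relationship (SR)
  Conceptual model: entities E (with $\mathbf{K}_E$, $\mathbf{A}_E$) and F (with $\mathbf{K}_F$, $\mathbf{A}_F$), connected by a relationship R.
  DB schema:        $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $T_F(\mathbf{K}_F, \mathbf{A}_F)$, $T_R(\mathbf{K}_{RE}, \mathbf{K}_{RF})$, where $\mathbf{K}_{RE}$ and $\mathbf{K}_{RF}$ are foreign keys to $T_E$ and $T_F$, respectively
  Mappings:         $s\colon T_R$
                    $t\colon p_R(\mathbf{t}_{C_E}(\mathbf{K}_{RE}), \mathbf{t}_{C_F}(\mathbf{K}_{RF}))$
  Ontology:         $\exists p_R \sqsubseteq C_E$, $\exists p_R^- \sqsubseteq C_F$
  Notes:
  • In case of cardinality (_, 1) on role $R_E$ (resp., $R_F$), the primary key of $T_R$ is restricted to the attributes $\mathbf{K}_{RE}$ (resp., $\mathbf{K}_{RF}$). In case both roles have cardinality (_, 1), either choice for the primary key is made, and the remaining attributes form a non-primary key in the logical schema.
  • In case of cardinality (1, _) on role $R_E$ (resp., $R_F$), the inclusion dependency $\mathbf{K}_E \subseteq \mathbf{K}_{RE}$ (resp., $\mathbf{K}_F \subseteq \mathbf{K}_{RF}$) holds in the schema, and the first (resp., second) inclusion axiom in the ontology holds in both directions. Note that when the maximum cardinality on role $R_E$ (resp., $R_F$) is 1, the corresponding inclusion dependency is actually a foreign key.

Schema Relationship with Identifier Alignment (SRa)
  Conceptual model: entities E (with $\mathbf{K}_E$, $\mathbf{A}_E$) and F (with $\mathbf{K}_F$, $\mathbf{U}_F$, $\mathbf{A}_F$), connected by a relationship R.
  DB schema:        $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $T_F(\mathbf{K}_F, \mathbf{U}_F, \mathbf{A}_F)$, $T_R(\mathbf{K}_{RE}, \mathbf{U}_{RF})$, $\mathit{key}_{T_F}(\mathbf{U}_F)$
  Mappings:         $s\colon T_R \bowtie_{\mathbf{U}_{RF} = \mathbf{U}_F} T_F$
                    $t\colon p_R(\mathbf{t}_{C_E}(\mathbf{K}_{RE}), \mathbf{t}_{C_F}(\mathbf{K}_F))$
  Ontology:         $\exists p_R \sqsubseteq C_E$, $\exists p_R^- \sqsubseteq C_F$

Schema Hierarchy with Identifier Alignment (SHa)
  Conceptual model: entity E (with $\mathbf{K}_E$, $\mathbf{A}_E$) and child entity F (with $\mathbf{K}_F$, $\mathbf{A}_F$), shown both as input and as output of the pattern.
  DB schema:        input: $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $T_F(\mathbf{K}_F, \mathbf{K}_{FE}, \mathbf{A}_F)$, $\mathit{key}_{T_F}(\mathbf{K}_{FE})$
                    output: $T_E(\mathbf{K}_E, \mathbf{A}_E)$, $V_F(\mathbf{K}_F, \mathbf{K}_{FE}, \mathbf{A}_F) = T_F$ with primary key $\mathbf{K}_{FE}$, $\mathit{key}_{V_F}(\mathbf{K}_F)$
  Mappings:         $s\colon V_F$
                    $t\colon C_F(\mathbf{t}_{C_E}(\mathbf{K}_{FE})),\ \{d_A(\mathbf{t}_{C_E}(\mathbf{K}_{FE}), A)\}_{A \in \mathbf{K}_F \cup \mathbf{A}_F}$
  Ontology:         $C_F \sqsubseteq C_E$, $\{\delta(d_A) \sqsubseteq C_F,\ \rho(d_A) \sqsubseteq \mu(\tau(A)),\ C_F \sqsubseteq \delta(d_A)\}_{A \in \mathbf{K}_F \cup \mathbf{A}_F}$
  Notes: In this pattern, the “alignment” is meant to align the primary identifier used in the child entity to the primary identifier used in the parent entity. The other two possibilities for applying the pattern are:
  • the foreign key in the child entity is the primary key of that entity, and references a non-primary key of the parent entity;
  • the foreign key in the child entity is a non-primary key of that entity, and references a non-primary key of the parent entity.
  We depict here the most common scenario, where the foreign key points to the primary key of the parent entity. Observe that this pattern requires a change in the conceptual model (essentially keeping track of the attributes used for identifying the objects of the subclass).

Pattern Catalog. Table 2 shows an excerpt of our patterns, which we discuss in detail here.

Schema Entity (SE). This fundamental pattern describes the correspondence between an entity with a primary identifier and attributes in the DB schema, and a class and data properties in the ontology. The entity is expressed in the DB schema through a single table $T_E$ with primary key $\mathbf{K}$ and other attributes $\mathbf{A}$, as is the norm in sound DB design practices. The mappings column explains how $T_E$ is mapped into a corresponding class $C_E$.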
The primary key of $T_E$ is employed to construct the IRIs of the objects that are instances of $C_E$, using a template $\mathbf{t}_E$ specific for that entity. Each relevant attribute of $T_E$ is mapped to a data property of $C_E$, with suitable domain and range axioms. A mandatory participation constraint is added to each data property corresponding to a mandatory attribute.
Example: A client registry table containing SSNs of clients, together with their name as an additional attribute, is mapped to a Client class using the SSN to construct its objects. In addition, the SSN and name are mapped to two corresponding data properties.

Schema Relationship (SR). This pattern describes the correspondence between a binary relationship without attributes and an OWL 2 QL object property, for the case where such relationship is represented in the DB as a separate (usually, “many-to-many”) table. This pattern considers three tables $T_R$, $T_E$, and $T_F$, for which the set of columns in $T_R$ is partitioned into two parts $\mathbf{K}_{RE}$ and $\mathbf{K}_{RF}$ that are foreign keys to $T_E$ and $T_F$, respectively. The identifier of $T_R$ depends on the role cardinalities in the E-R model. The pattern captures how $T_R$ is mapped to an object property $p_R$, using the two parts $\mathbf{K}_{RE}$ and $\mathbf{K}_{RF}$ of the partition to construct respectively the subject and the object of the triples in $p_R$. The templates $\mathbf{t}_{C_E}$ and $\mathbf{t}_{C_F}$ must be those respectively used for building instances of classes $C_E$ corresponding to $T_E$ and $C_F$ corresponding to $T_F$.
Example: An additional table in the client registry stores the addresses of each client, and has a foreign key to a table with locations. The former table is mapped to an address object property, for which the ontology asserts that the domain is the class Person and the range an additional class Location, which corresponds to the latter table.

Schema Relationship with Identifier Alignment (SRa).
This pattern is similar to pattern SR, but it comes with a modifier a, indicating that the pattern can be applied after the identifiers involved in the relationship have been aligned. The alignment is necessary because the foreign key in $T_R$ does not refer to the primary key $\mathbf{K}_F$ of $T_F$, but to an alternative key $\mathbf{U}_F$. Since the instances of the class $C_F$ corresponding to $T_F$ are constructed using the primary key $\mathbf{K}_F$ of $T_F$ (cf. pattern SE), the pairs that populate $p_R$ should also refer in their object position to that primary key, which can only be retrieved via a join between $T_R$ and $T_F$ on the key $\mathbf{U}_F$.
Example: The primary key of the table with locations is not given by the city and street, which are used in the table that relates clients to their addresses, but is given by the latitude and longitude of locations.

Schema Hierarchy with Identifier Alignment (SHa). This pattern handles the case where a hierarchy is specified and the child entity uses a primary identifier different from the one in the parent entity. In this situation, the foreign-key constraint can come in three different variants. In the depicted one, the foreign key in $T_F$ is over a non-primary key $\mathbf{K}_{FE}$. The objects for $C_F$ have to be built out of $\mathbf{K}_{FE}$, rather than out of the primary key of $T_F$. For this purpose, the pattern creates a view $V_F$ identical to $T_F$, except that $\mathbf{K}_{FE}$ is the primary key. The foreign key relations are also preserved. Such a view might enable further applications of patterns.
Example: An ISA relation between entities Student and Person. Students are identified by their matriculation number, whereas persons are identified by their SSN.

4. Conclusions and Future Work

In this work, we have identified and formally specified a number of mapping patterns emerging when linking DBs to ontologies in a typical VKG setting. Our patterns are grounded in well-established practices of DB design, and render explicit the connection between the conceptual model, the DB schema, and the ontology.
We envision that the organization in patterns can enable a number of relevant tasks, notably mapping bootstrapping for incomplete VKGs.

This work is only a first step, with respect to both categorization of patterns, and their actual use. Regarding the former, we are currently extending this initial catalog with more advanced “data-driven” patterns, which are patterns where the data component needs to be taken into account. Regarding the latter, we are investigating solutions to specific problems that need to be addressed when setting up a VKG scenario, like the problem of mapping bootstrapping.

Acknowledgments

This research has been partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, by the Italian Basic Research (PRIN) project HOPE, by the EU H2020 project INODE (grant agreement 863410), and by the project MENS, funded through the 4th Call for Research of the Autonomous Province of Bolzano (IN2219).

References

[1] G. Xiao, L. Ding, B. Cogrel, D. Calvanese, Virtual Knowledge Graphs: An overview of systems and use cases, Data Intelligence 1 (2019) 201–223.
[2] J. Sequeda, O. Lassila, Designing and Building Enterprise Knowledge Graphs, Morgan & Claypool Publishers, 2021.
[3] L. F. de Medeiros, F. Priyatna, Ó. Corcho, MIRROR: Automatic R2RML mapping generation from relational databases, in: Proc. ICWE, volume 9114 of LNCS, Springer, 2015, pp. 326–343.
[4] E. Jiménez-Ruiz, E. Kharlamov, D. Zheleznyakov, I. Horrocks, C. Pinkel, M. G. Skjæveland, E. Thorstensen, J. Mora, BootOX: Practical mapping of RDBs to OWL 2, in: Proc. ISWC, volume 9367 of LNCS, Springer, 2015, pp. 113–132.
[5] C. Pinkel, C. Binnig, E. Kharlamov, P. Haase, IncMap: Pay as you go matching of relational schemata to OWL ontologies, in: Proc. 8th Int. Workshop on Ontology Matching (OM), volume 1111 of CEUR, CEUR-WS.org, 2013, pp. 37–48.
[6] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-Muro, G. Xiao, Ontop: Answering SPARQL queries over relational databases, Semantic Web J. 8 (2017) 471–487.
[7] J. F. Sequeda, D. P. Miranker, Ultrawrap Mapper: A semi-automatic relational database to RDF (RDB2RDF) mapping tool, in: Proc. ISWC Posters & Demonstrations Track, volume 1486 of CEUR, CEUR-WS.org, 2015.
[8] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison Wesley, 1995.
[9] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and efficient query answering in description logics: The DL-Lite family, JAR 39 (2007) 385–429.
[10] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The Description Logic Handbook: Theory, Implementation and Applications, 2nd ed., Cambridge University Press, 2007.
[11] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, R. Rosati, Linking data to ontologies, J. on Data Semantics 10 (2008) 133–173.
[12] R. Hull, Relative information capacity of simple relational database schemas, SIAM J. on Computing 15 (1986) 856–886.
[13] R. J. Miller, Y. E. Ioannidis, R. Ramakrishnan, Schema equivalence in heterogeneous systems: Bridging theory and practice, Information Systems 19 (1994) 3–31.
[14] P. P. Chen, The Entity-Relationship model: Toward a unified view of data, ACM TODS 1 (1976) 9–36.