Difference between revisions of "Vol-3194/paper35"

From BITPlan ceur-ws Wiki
Jump to navigation Jump to search
(edited by wikiedit)
 
(edited by wikiedit)
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
 
+
=Paper=
 
{{Paper
 
{{Paper
 +
|id=Vol-3194/paper35
 +
|storemode=property
 +
|title=Reasoning in Warded Datalog+/- with Harmful Joins
 +
|pdfUrl=https://ceur-ws.org/Vol-3194/paper35.pdf
 +
|volume=Vol-3194
 +
|authors=Teodoro Baldazzi,Luigi Bellomarini,Emanuel Sallinger,Paolo Atzeni
 +
|dblpUrl=https://dblp.org/rec/conf/sebd/BaldazziBSA22
 
|wikidataid=Q117344904
 
|wikidataid=Q117344904
 
}}
 
}}
 +
==Reasoning in Warded Datalog+/- with Harmful Joins==
 +
<pdf width="1500px">https://ceur-ws.org/Vol-3194/paper35.pdf</pdf>
 +
<pre>
 +
Reasoning in Warded Datalog+/- with Harmful Joins
 +
Teodoro Baldazzi1 , Luigi Bellomarini2 , Emanuel Sallinger3,4 and Paolo Atzeni1
 +
1
 +
  Università Roma Tre, Department of Computer Science and Engineering, Rome, Italy
 +
2
 +
  Banca d’Italia, Italy
 +
3
 +
  TU Wien, Faculty of Informatics, Vienna, Austria
 +
4
 +
  University of Oxford, Department of Computer Science, Oxford, UK
 +
 +
 +
                                        Abstract
 +
                                        Warded Datalog+/- is a powerful member of the Datalog+/- family of logic languages, featuring full
 +
                                        support for recursion and existential quantification. As a result of the promising trade-off between
 +
                                        expressive power and data complexity offered, it recently rose as a relevant candidate for ontological
 +
                                        reasoning on large knowledge graphs and is employed as logic core of the Vadalog system, a well-known
 +
                                        state-of-the-art reasoner. To achieve decidability and data tractability in practice over recursive settings,
 +
                                        reasoners such as Vadalog adopt specialized strategies that control the effects of recursion and ensure
 +
                                        reasoning termination with small memory footprint. However, exploiting the theoretical underpinnings
 +
                                        of the Warded fragment, to enable these strategies, requires the settings not to contain “harmful” joins,
 +
                                        i.e., on variables affected by existential quantification. To support reasoning decidability and the full
 +
                                        expressive power of the language, we developed the Harmful Join Elimination, an algorithm that removes
 +
                                        such joins while preserving the correctness of the task, and we integrated it into the Vadalog system.
 +
 +
                                        Keywords
 +
                                        Datalog, Vadalog, ontological reasoning, existential quantification, harmful joins
 +
 +
 +
 +
 +
1. Introduction
 +
Recent years’ widespread interest towards querying and exploiting large amounts of data
 +
in the form of knowledge graphs (KGs) has led to the rising development and adoption of
 +
intelligent systems that manage such extensional knowledge and infer new intensional one
 +
via efficient ontological reasoning mechanisms [1]. To achieve this, modern reasoning and
 +
KG navigation applications require to employ powerful formalisms and logic languages for
 +
knowledge representation [2], capable of providing full support for recursion and existential
 +
quantification as well as sustaining decidability and polynomial data complexity of the task [3].
 +
  Datalog± [4, 5, 6, 7] is one of the most commonly adopted families of logic languages
 +
(technically, fragments) for reasoning on KGs [6]. Among them, Warded Datalog± [8] effectively
 +
covers the above requirements. Indeed, it encompasses both a high expressiveness, capturing
 +
plain Datalog as well as SPARQL queries under OWL 2 QL entailment regime and set semantics,
 +
and a simple syntax that enable powerful knowledge-modeling. It also offers a very good trade-
 +
off with computational complexity and captures PTIME for the reasoning [9]. Its semantics is
 +
defined via the chase [10], an algorithmic tool that takes as input a database 𝐷 and a set Σ of
 +
SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
 +
$ teodoro.baldazzi@uniroma3.it (T. Baldazzi); luigi.bellomarini@bancaditalia.it (L. Bellomarini);
 +
sallinger@dbai.tuwien.ac.at (E. Sallinger); paolo.atzeni@uniroma3.it (P. Atzeni)
 +
                                      © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
 +
    CEUR
 +
    Workshop
 +
    Proceedings
 +
                  http://ceur-ws.org
 +
                  ISSN 1613-0073
 +
                                      CEUR Workshop Proceedings (CEUR-WS.org)
 +
�rules, and adds new tuples to 𝐷 until Σ is satisfied: such tuples may contain freshly generated
 +
symbols 𝜈𝑖 (technically, labelled nulls) that act as placeholders for the existentially quantified
 +
variables [6]. The Warded fragment is employed in the Vadalog system [11], a state-of-the-art
 +
reasoner that allows to perform ontological reasoning tasks in complex scenarios.
 +
  While such favourable characteristics of Warded Datalog± bode well for efficient imple-
 +
mentations, the interplay between recursion and existential quantification may lead to the
 +
generation of infinite labelled nulls in the chase, causing the procedure not to terminate and
 +
thus inhibiting decidability of the reasoning task in practice [5]. To properly control such
 +
interactions, reasoners resort to specific techniques known as termination strategies. Specif-
 +
ically, the Vadalog system employs the isomorphism termination strategy, which consists in
 +
suppressing isomorphic copies of previously generated facts (i.e., same name, same constants
 +
in same positions and bijection between labelled nulls), hence the chase steps starting from
 +
them are not performed and the descending facts are not generated. The correctness of such
 +
strategy is corroborated by the reasoning boundedness property of the fragment, which states
 +
that facts derived from isomorphic origins are isomorphic, thus uninformative for the reasoning
 +
task [3]. However, as a necessary condition to exploit this property, the set of Warded rules in
 +
the scenario is required to be in a harmless form, i.e., without joins between variables affected
 +
by existential quantification. Indeed, during the chase rules with such harmful joins could
 +
activate on labelled nulls. Therefore, suppressing isomorphic facts that carry these nulls could
 +
inhibit rule activation and affect reasoning correctness. On the other hand, avoiding the use of
 +
these joins affects the expressive power of the fragment. Consider the following example.
 +
Example 1. Company merger scenario modeled with a set Σ of Warded Datalog± rules.
 +
                                      Company(x) → ∃c CEO(x, c)                              (𝛼)
 +
                                Merges(x, y), CEO(x, c) → CEO(y, c)                            (𝛽)
 +
                              Corp(x, y) → ∃c CEO(x, c), CEO(y, c)                            (𝛾)
 +
                                  CEO(x, c), CEO(y, c) → Corp(x, y)                            (𝜌)
 +
 +
For each company 𝑥 there exists a CEO 𝑐 (rule 𝛼). If 𝑥 merges with a company 𝑦, 𝑐 also becomes
 +
CEO of 𝑦 (rule 𝛽). If 𝑥 and 𝑦 have a common CEO, they are in the same corporation (rule 𝜌) and vice
 +
versa (rule 𝛾). Consider the database instance 𝐷 = {Company(Hsb), Company(Iba), Merges(Hsb,
 +
Iba)} and the query 𝑄: “what are all the entailed Corporations?” as ontological reasoning task.
 +
It can be observed that the set of corporations is finite, whereas the chase does not terminate.
 +
By following the procedure without termination strategy, we first generate CEO(Hsb,𝜈0 ) and
 +
CEO(Iba,𝜈1 ) by activating 𝛼 from the facts Company(Hsb) and Company(Iba), respectively.
 +
Next, we obtain CEO(Iba,𝜈0 ) from 𝛽, Corp(Hsb,Iba) via the join on 𝜈0 in 𝜌, then CEO(Hsb,𝜈2 )
 +
and CEO(Iba,𝜈2 ) by activating 𝛾, 𝛽 and so on. Due to the existential quantification      in rule 𝛾
 +
and its interplay with the recursions in rules 𝛽 and 𝜌, an infinite set 𝑖=3,... {CEO(Hsb, 𝜈𝑖 ),
 +
                                                                          ⋃︀
 +
CEO(Iba, 𝜈𝑖 )} is generated. On the other hand, employing the isomorphism termination strategy
 +
is not feasible either, due to the harmful join in rule 𝜌. Indeed, CEO(Iba,𝜈0 ) is isomorphic with
 +
CEO(Iba,𝜈1 ), yet its pruning would prevent the join in 𝜌 with CEO(Hsb,𝜈0 ) from generating
 +
Corp(Hsb,Iba) and the query in Example 1 from being answered correctly.
 +
  In this discussion paper, we illustrate how we enabled reasoning termination and decidability
 +
in practice over Warded Datalog± programs with harmful joins while preserving the expressive
 +
�power of the fragment. First, we introduce the disarmament problem, which consists in
 +
rewriting such programs into a version without harmful joins (namely, in the newly defined
 +
fragment Harmless Warded Datalog± ) that is equivalent with respect to the chase. Then, we
 +
present Harmful Join Elimination (HJE) [12, 13], the first, to the best of our knowledge,
 +
rewriting technique to solve the disarmament problem. Finally, we discuss HJE integration into
 +
the Vadalog system and its reasoning application over the scenario in Example 1.
 +
Related Work. The importance of achieving reasoning decidability in Datalog± fragments
 +
and handling the interplay between recursion and existential quantification acted as catalysts
 +
for the development of novel approaches for reasoning termination [14, 7, 6]. More in general,
 +
HJE belongs to the class of methodologies for Datalog rewriting. Among them, we mention its
 +
conversion into fragments such as Guarded [15], Linear [16], and Disjunctive [17]. Furthermore,
 +
by interpreting the rules as queries, distinct techniques were proposed to rewrite Datalog±
 +
programs with existentials [18] from Regular Path Queries [19] and Description Logics [20].
 +
Overview. The remainder of this paper is organized as follows. In Section 2, we provide an
 +
overview of Warded Datalog± . In Section 3, we illustrate the HJE algorithm as well as its
 +
application over Example 1 in the Vadalog system. We draw our conclusions in Section 4.
 +
 +
 +
2. Syntax and Semantics of Warded Datalog±
 +
A Warded Datalog± program consists of a set of facts and rules. An existential rule is a first-order
 +
sentence ∀𝑥  ¯ ∀𝑦¯(𝜙(𝑥              ¯ , 𝑧¯)), where 𝜙 (the body) and 𝜓 (the head) are conjunctions of
 +
                      ¯ , 𝑦¯)→∃𝑧¯ 𝜓(𝑥
 +
atoms, over the respective predicates, with constants and variables. For brevity, we may omit
 +
quantifiers and denote conjunction by comma. Let Σ be a set of rules and 𝑝[𝑖] a position (i.e.,
 +
the 𝑖-th term of a predicate 𝑝 with arity 𝑘, where 𝑖 = 1, . . . , 𝑘). We define 𝑝[𝑖] as affected if (i) 𝑝
 +
appears in a rule in Σ with an existentially quantified variable (∃-variable) in the 𝑖-th term or,
 +
(ii) there is a rule in Σ such that a universally quantified variable (∀-variable) is only in affected
 +
body positions and in 𝑝[𝑖] in the head. A ∀-variable 𝑥 is harmful, wrt a rule 𝜌 in Σ, if 𝑥 appears
 +
only in affected positions in 𝜌, otherwise it is harmless. In Example 1, 𝛼 and 𝛾 are existential
 +
rules, while 𝜌 is a harmful join rule, i.e., a rule such that a harmful variable is involved in a join
 +
(namely, harmful join). If the harmful variable is in head(𝜌), it is dangerous [1].
 +
    Chase-based procedures enforce the satisfaction of Σ over a database 𝐷, incrementally
 +
expanding 𝐷 into new instances 𝐼 with facts derived from the application of the rules, until Σ
 +
is satisfied (𝐼 = 𝑐ℎ𝑎𝑠𝑒(𝐷, Σ)). Consider an instance 𝐼 ′ ⊇ 𝐼 and a rule 𝜎 : 𝜙(𝑥        ¯ , 𝑦¯)→∃𝑧¯⟨︀𝜓(𝑥 ¯⟩︀, 𝑧¯)
 +
∈ Σ. In the version of the chase we refer to (known as oblivious [5]), a chase step 𝜎,ℎ is
 +
applicable to 𝐼 ′ if there exists a homomorphism ℎ that maps the atoms of 𝜙(𝑥            ¯ , 𝑦¯) to facts of
 +
𝐼 (i.e., ℎ(𝜙(𝑥¯ , 𝑦¯)) ⊆ 𝐼). When the chase step is applicable, the atom ℎ′ (𝜓(𝑥  ¯ , 𝑧¯)) is added to 𝐼 ′ ,
 +
where ℎ′ is obtained by extending ℎ so that ℎ′ (𝑧𝑖 ) is a fresh labelled null, ∀ 𝑧𝑖 ∈ 𝑧¯. The chase
 +
graph 𝒢(𝐷, Σ) is the directed graph with the facts from chase(𝐷, Σ) as nodes and an edge from
 +
a node 𝑛 to a node 𝑚 if 𝑚 is obtained from 𝑛 (and possibly other facts) via a chase step [7].
 +
    Given a pair 𝑄 = (Σ, Ans), where Ans is an n-ary predicate, the evaluation of 𝑄 over 𝐷
 +
is the set of tuples 𝑄(𝐷, Σ) = {𝑡¯ ∈ dom(𝐷)𝑛 | Ans(𝑡¯) ∈ chase(𝐷, Σ)}, where ¯𝑡 is a tuple of
 +
constants. A reasoning task consists in finding an instance 𝐽 st: (i) ¯𝑡 ∈ 𝐽 iff Ans(𝑡¯) ∈ 𝑄(𝐷, Σ);
 +
and (ii) for every other 𝐽 ′ st ¯𝑡 ∈ 𝐽 ′ iff ¯𝑡 ∈ 𝑄(𝐷, Σ), there is a homomorphism from 𝐽 to 𝐽 ′ [11].
 +
�3. Enabling Reasoning in Warded Datalog±
 +
To achieve reasoning termination and decidability over recursive Warded Datalog± settings,
 +
especially in the presence of existential quantification, the oblivious chase is enriched with the
 +
isomorphism termination strategy, which acts as a firing condition that limits the activation
 +
of applicable chase steps [11]. With reference
 +
                                          ⟨︀ ⟩︀ to the definitions in Section 2, such isomorphic
 +
chase variant causes an applicable step 𝜎,ℎ to activate, wrt 𝐼 ′ ⊇ 𝐼, if there is no isomorphism
 +
of h(head(𝜎)) with 𝐼 ′ . Therefore, given two isomorphic facts 𝑓 and 𝑓 ′ , only 𝑓 is explored in the
 +
isomorphic chase, whereas the descending portions of the chase graph rooted in 𝑓 ′ are pruned
 +
because isomorphic to the ones rooted in 𝑓 , thus uninformative for the reasoning task.
 +
Harmless Warded Datalog± . However, as we have introduced, in order to exploit such rea-
 +
soning boundedness property and sustain decidability of the task while preserving correctness,
 +
Warded programs with recursion and existentials must not contain harmful join rules [3, Theo-
 +
rem 2], that is, they belong to Harmless Warded Datalog± [12]. Without loss of generality (as
 +
more complex joins can be broken into multiple steps [11]), a harmful join rule is a warded rule
 +
𝜌 : 𝐴(𝑥1 , 𝑦1 , ℎ), 𝐵(𝑥2 , 𝑦2 , ℎ) → ∃𝑧 𝐶(𝑥, 𝑧), where 𝐴, 𝐵 and 𝐶 are atoms, 𝐴[3] and 𝐵[3] are
 +
affected positions, 𝑥1 , 𝑥2 ⊆ 𝑥, 𝑦1 , 𝑦2 ⊆ 𝑦 are disjoint tuples of harmless variables or constants
 +
and ℎ is a harmful variable. A set of rules ∈ Harmless Warded Datalog± if: (1) it is warded, i.e.,
 +
all the dangerous variables in its rules appear in a single body atom (the ward), which only shares
 +
harmless variables with the rest of the body; and (2) it does not contain harmful join rules.
 +
Disarming the Warded Fragment. While in presence of programs without harmful joins the
 +
isomorphic chase can be employed without affecting the correctness nor the computational
 +
properties of the reasoning, restricting the Warded fragment to its Harmless Warded counterpart
 +
affects the expressive power available to reasoners such as the Vadalog system implementing
 +
it. With the goal of preventing such syntactic limitations, we developed a technique to solve
 +
the disarmament problem, that is, to rewrite a set Σ of Warded Datalog± rules in input to the
 +
reasoners into an equivalent set Σ′ of Harmless Warded rules. In this context, two sets of rules
 +
are considered equivalent if they have the same meaning with respect to the chase [16], i.e.,
 +
chase(𝐷, Σ) = chase(𝐷, Σ′ ) modulo fact isomorphism for each database 𝐷. Such technique,
 +
which we named Harmful Join Elimination (HJE), also allowed us to operationally prove that
 +
for each Warded set there exists an equivalent Harmless Warded one [13].
 +
    Intuitively, the HJE algorithm replaces each harmful join rule 𝜌 with a set of harmless rules that
 +
cover the generation of all the facts derived from activating 𝜌, thus preserving correctness. To
 +
achieve this, it operates by considering the rules involved in the propagation of the affectedness
 +
(i.e., the labelled nulls in the chase) from the existentials to the harmful join variables in 𝜌.
 +
 +
Definition 1 (Causes of Affectedness). Let 𝜌 ∈ Σ be a harmful join rule and let 𝐻 ∈ {A,B} be an
 +
atom in the body of 𝜌. We define causes of affectedness as the sequences of rules Γ𝐻𝑖 = [𝜎𝑠 , . . . , 𝜎1 ]
 +
(𝑠 < |Σ|, 𝑖 ≥ 1) ∈ Σ, where: (i) 𝜎1 : 𝐻1 (𝑥, 𝑦1 ), 𝑅1 →∃ℎ 𝐻2 (𝑥, 𝑦2 , ℎ) is a direct cause, i.e., an exis-
 +
tential rule that causes a position to be affected; and (ii) 𝜎𝑘 : 𝐻𝑘 (𝑥, 𝑦𝑘 , ℎ), 𝑅𝑘 →𝐻𝑘+1 (𝑥, 𝑦𝑘+1 , ℎ),
 +
1 < 𝑘 ≤ 𝑠, are indirect causes, i.e., rules that propagate the affectedness from 𝜎1 to 𝜌, such that
 +
𝐻𝑠+1 = 𝐻. 𝐻1 , . . . , 𝐻𝑠 are atoms, 𝑅1 , . . . , 𝑅𝑠 are atoms or conjunctions of atoms not containing
 +
ℎ (as the rules are Warded). Let 𝑋𝑖𝑗 = Γ𝐴𝑖 ⌢ Γ𝐵𝑗 be the concatenation of the sequences Γ𝐴𝑖 and
 +
Γ𝐵𝑗 with the same direct cause: the causes in 𝑋𝑖𝑗 are labelled after the sequence they belong to.
 +
�With reference to Example 1, the rules 𝛼 and 𝛾 are direct causes of affectedness, whereas 𝛽 is
 +
an indirect one. The sequences of causes for the atom CEO𝑘 (k ∈ {1,2}, in order of appearance
 +
in 𝜌) are: ΓCEO𝑘 1 = [𝛼], ΓCEO𝑘 2 = [𝛽, 𝛼], ΓCEO𝑘 3 = [𝛾], ΓCEO𝑘 4 = [𝛽, 𝛾]. For instance, 𝑋21 =
 +
[𝛽ΓCEO1 2 , 𝛼ΓCEO1 2 , 𝛼ΓCEO2 1 ] is a concatenation, as ΓCEO1 2 ,ΓCEO2 1 contain the same direct cause 𝛼.
 +
  To determine how the propagated nulls activating the harmful join in 𝜌 affect the meaning
 +
in the chase, and to consequently build proper harmless rules, we compose 𝜌 along all its
 +
concatenations 𝑋𝑖𝑗 of causes of affectedness via the unfolding and folding operations [16].
 +
 +
Definition 2 (Unfolding and Folding). Let 𝜌 be a rule 𝐴, 𝐵→𝐶, where 𝐴 and 𝐶 are atoms and
 +
𝐵 is an atom or a conjunction of atoms, and let 𝜎 be a rule 𝑅→𝐴′ , where 𝐴′ is an atom and 𝑅
 +
an atom or a conjunction of atoms. Let 𝐴′ be unifiable with 𝐴 by substitution 𝜃. The result of
 +
unfolding 𝜌 at 𝐴 with 𝜎 is the rule 𝜏 : (𝐵, 𝑅→𝐶)𝜃. If the head of 𝜎 contains an ∃-variable ℎ, it
 +
replaces ℎ with a Skolem atom 𝑓ℎ𝜎 in 𝜏 , where 𝑓 is an injective, deterministic and range disjoint
 +
function that calculates the values for ∃-variables, to control the identity of labelled nulls. Now, let
 +
𝜎 ′ be a rule 𝐵 ′ →𝑅, where 𝑅 is an atom and 𝐵 ′ is an atom or a conjunction of atoms. Let 𝐵 ′ be
 +
unifiable with 𝐵 by substitution 𝜃′ . The result of folding 𝜌 into 𝜎 ′ is the rule 𝜐: (𝐴, 𝑅→𝐶)𝜃′ .
 +
 +
The results of the compositions for each harmful join rule are represented via the      ⟨︀ harmful
 +
unfolding tree (hu-tree). Apart from the more technical side [12], the hu-tree 𝑇 for Σ, 𝜌 can
 +
                                                                                                ⟩︀
 +
 +
be defined as a rule-labelled tree-like structure where: (i) the root is labelled by 𝜌; (ii) for each
 +
𝑋𝑖𝑗 , there exists a root-to-leaf path 𝑇𝑖𝑗 whose nodes are labelled by the result of unfolding their
 +
parent nodes with the causes in 𝑋𝑖𝑗 (in order of appearance); (iii) for each 𝜎𝑘 ∈ 𝑋𝑖𝑗 involved in
 +
a recursion, there exists a leaf labelled by result of unfolding its parent node 𝜇 with 𝜎𝑘 and then
 +
folding it into 𝜇. By definition of unfolding and folding, the leaves in 𝑇 are harmless rules that
 +
cover the generation of all the facts derived in the chase from activating 𝜌 on labelled nulls [13].
 +
                                                    CEO(x,c),CEO(y,c)⟶Corp(x,y)
 +
                                                α                        β                        γ
 +
 +
                                            α            β            α          γ            β            γ
 +
 +
                                      τ1                                                                        τ8
 +
                                                    α        α            β  β            γ        γ
 +
 +
                                                τ2      τ3                                    τ6      τ7
 +
                                                                          α  γ
 +
            βГCEO 2                                              τ4                  τ5                      Corp(v1,v2),Merges(v3,y),
 +
                  1
 +
                                                                                                              CEO(v3,c),fcγ(v1)⟶Corp(v1,y)
 +
          Merges(v1,x),CEO(v1,c),
 +
            CEO(y,c)⟶Corp(x,y)                                                                                                βГCEO 4
 +
                                                                                                                                  2
 +
            αГCEO 2
 +
                  1
 +
                                                                                                        Corp(v1,v2),Merges(v3,y),Merges(v4,v3),
 +
    Merges(v1,v2),Company(v1),CEO(y,c),                                                                      CEO(v4,c),fcγ(v1)⟶Corp(v1,y)
 +
            fcα(v1)⟶Corp(v2,y)
 +
            αГCEO 1
 +
                  2
 +
                                                                                                        Corp(v1,v2),Merges(v2,v3)⟶Corp(v1,v3)
 +
  Merges(v1,v2),Company(v1),Company(v3),
 +
        fcα(v1),fcα(v3)⟶Corp(v2,v3)
 +
 +
 +
                      ⟨︀ ⟩︀
 +
Figure 1: Hu-tree 𝑇 for Σ,𝜌 of Example 1, with unfolding (blue) and folding (green) compositions.
 +
�Figure 1 shows the hu-tree 𝑇 of Σ,𝜌 in Example 1. Specifically, the figure in the middle depicts
 +
                                  ⟨︀ ⟩︀
 +
 +
the sequences of unfoldings (blue arrows) and foldings (green arrows) with the causes (here
 +
not labelled), from 𝜌 to the leaves with the resulting harmless rules; the left figure illustrates
 +
more in detail the root-to-leaf path 𝑇21 obtained by unfolding the causes in 𝑋21 ; the right figure
 +
provides an application of folding over the recursive cause 𝛽ΓCEO2 4 . Note that 𝑇 does not contain
 +
cycles, that is, only the plain arrows (not the dotted ones) in the figure are edges of the hu-tree.
 +
HJE Algorithm. Given a Warded Datalog± set Σ with harmful join rules 𝜌, the HJE algorithm
 +
produces a new equivalent set Σ′ = HJE(Σ). It can be divided into two main phases. We refer
 +
to the extended versions of the work [12, 13] for an in-depth discussion of the algorithm.
 +
  In the back-composition phase (Algorithm 1), HJE first derives for each 𝜌 all the causes of
 +
affectedness and the concatenations 𝑋𝑖𝑗 (line 5). Then, it iteratively builds the hu-tree 𝑇 via
 +
unfolding (line 13) and, in case of recursive causes, folding (line 17) along the causes in each
 +
𝑋𝑖𝑗 , in order of appearance, until all the root-to-leaf paths have been created: it employs the
 +
maximum distance from harmlessness (mdh), a value that corresponds to the highest cardinality
 +
among the 𝑋𝑖𝑗 , i.e., the height of 𝑇 . The rules labelling the resulting leaves are then subject to
 +
an overall cleanup and deduplication (line 19). Rules that never activate are dropped. Also, if the
 +
functions of the Skolem atoms in a rule (derived from unfolding a direct cause in 𝑇 ) respect
 +
injectivity and range disjointness, they are simplified, otherwise the rule is dropped.
 +
  The grounding phase occurs if Σ contains rules 𝜋 that are not causes of affectedness and that
 +
propagate ground values from the database to the harmful join in 𝜌. In this case, to cover the
 +
activation of 𝜌 on such values, the Dom(ℎ) [3] rules are employed (below for A, corresponding
 +
ones for B), which force the harmful join variables to bind only to constants in the domain.
 +
 +
                                              Dom(h), A(x, y, h) → A′ (x, y, h)                                            (𝛿1 )
 +
                                                          A (x, y, h) → A(x, y, h)
 +
                                                            ′
 +
                                                                                                                            (𝛿2 )
 +
                                        A (x1 , y1 , h), B(x2 , y2 , h) → ∃z C(𝑥, z)
 +
                                          ′
 +
                                                                                                                            (𝛿3 )
 +
 +
Algorithm 1 Back-Composition in Harmful Join Elimination.
 +
1: function back-composition(Σ)
 +
2:    P ← all harmful join rules in Σ; Σ′ = Σ
 +
3:    for 𝜌 in P do
 +
4:        𝑇 ← empty hu-tree with 𝜌 as root
 +
5:        𝒳𝜌 ← all 𝑋𝑖𝑗 from Γ𝐴𝑖 , Γ𝐵𝑗 for 𝜌 in Σ                                      ◁ all sets of causes of affectedness for 𝜌
 +
6:        𝑄 ← queue with 𝜌 enqueued
 +
7:        mdh = max(|𝑋𝑖𝑗 |) with 𝑋𝑖𝑗 in 𝒳𝜌
 +
8:        for dh ← 1 to mdh do                                                                          ◁ build hu-tree 𝑇 for 𝜌
 +
9:            𝜇 = 𝑄.dequeue()                                                                    ◁ next node to compose back
 +
10:            for 𝑋𝑖𝑗 in 𝒳𝜌 do
 +
11:                if dh ≤ |𝑋𝑖𝑗 | then
 +
12:                    𝜎 = 𝑋𝑖𝑗 .next()                    ◁ next cause, from indirect to direct, in order of appearance in 𝑋𝑖𝑗
 +
13:                    𝜈 = unfold(𝜇, 𝜎)                                              ◁ unfold current node with current cause
 +
14:                    𝑄.enqueue(𝜈)
 +
15:                    𝑇𝑖𝑗 ← 𝑇𝑖𝑗 ∪ {𝜈}                                  ◁ update path 𝑇𝑖𝑗 at depth dh with unfolding node 𝜈
 +
16:                    if isRecursive(𝜎, 𝑋𝑖𝑗 ) then
 +
17:                        𝑇 ← 𝑇 ∪ {fold(unfold(𝜈, 𝜎), 𝜇)}                  ◁ update 𝑇 with folding node for recursive cause
 +
18:        Σ′ .add(𝑇.leaves())
 +
19:    skolem-cleanup(Σ′ )                                              ◁ simplify Skolem atoms and deduplicate rules in Σ′
 +
        return Σ′
 +
�            (a) Reasoning times in Spec and All.        (b) Corporations discovered in All.
 +
Figure 2: Results of the experiments for Spec and All tasks, to vary the number of companies.
 +
HJE in Vadalog System. HJE is integrated into the logic optimizer of the Vadalog system, the
 +
component responsible for applying the required rewriting steps to the program before the
 +
reasoner processes it [11]. This enables reasoning termination and decidability over recursive
 +
Warded settings with harmful joins without affecting the expressive power of the Warded
 +
fragment. Thus, we are now able to reason over the scenario in Example 1. We run the
 +
experiments on a local installation of the Vadalog system, with a MacBook Pro i7 with 2.5 GHz
 +
and 16 GB of RAM. The rules listed below are added via HJE to Σ′ (some are merged for space
 +
reasons): 𝜏3 labels the leaf of 𝑇21 and 𝜐3 labels the folded node over 𝛽ΓCEO2 4 from Figure 1, after
 +
skolem cleanup. We adjusted input data from the DBpedia [3] company KG, adopting datasets
 +
of increasing complexity, with 1K, 5K, 10K, 15K, 20K, 25K and 30K companies. We built two
 +
reasoning tasks: (i) Spec, to obtain all the corporations of the company CBS, and (ii) All, to obtain
 +
all the corporations, to vary the number of companies. Figure 2(a) illustrates the reasoning
 +
times, all under 16 seconds, which proves the very good performance of the Vadalog system.
 +
Spec requires more time than All for smaller datasets; when the number of companies increases,
 +
All tends to a steeper curve. Figure 2(b) shows the number of corporations discovered in All.
 +
                                                    Company(x) → Corp(x, x)                      (𝜏1 )
 +
                          Company(x), Merges(x, y) → Corp(x, y), Corp(y, x)                    (𝜏2,3 )
 +
                        Company(x), Merges(x, y), Merges(x, z) → Corp(y, z)                      (𝜏4 )
 +
                          Corp(x, y), Merges(x, w), Merges(x, z) → Corp(w, z)                    (𝜏5 )
 +
                              Corp(x, y), Merges(x, z) → Corp(x, z), Corp(z, x)                (𝜏6,7 )
 +
                                                      Corp(x, y) → Corp(x, x)                    (𝜏8 )
 +
                              Corp(x, y), Merges(x, z) → Corp(y, z), Corp(z, y)                (𝜐1,2 )
 +
                              Corp(x, y), Merges(y, z) → Corp(x, z), Corp(z, x)                (𝜐3,4 )
 +
 +
 +
4. Conclusion
 +
To achieve reasoning termination and decidability over Warded Datalog± in practice, the settings
 +
must not contain harmful joins. In this work, we discussed the disarmament problem of rewriting
 +
such joins into an equivalent Harmless Warded form that upholds the expressiveness of the
 +
fragment and the correctness of the task. We then presented the Harmful Join Elimination, the
 +
disarmament algorithm integrated into the Warded Datalog± -based reasoner Vadalog system.
 +
�References
 +
[1] G. Gottlob, A. Pieris, Beyond SPARQL under OWL 2 QL entailment regime: Rules to the
 +
    rescue, in: IJCAI, 2015.
 +
[2] M. Krötzsch, V. Thost, Ontologies for knowledge graphs: Breaking the rules, in: Interna-
 +
    tional Semantic Web Conference (1), volume 9981 of LNCS, 2016, pp. 376–392.
 +
[3] L. Bellomarini, E. Sallinger, G. Gottlob, The Vadalog System: Datalog-based reasoning for
 +
    knowledge graphs, VLDB 11 (2018).
 +
[4] P. Barceló, R. Pichler, Datalog in Academia and Industry: Second International Workshop,
 +
    Datalog 2.0, Vienna, Austria, September 11-13, Proceedings, volume 7494, Springer, 2012.
 +
[5] A. Calì, G. Gottlob, M. Kifer, Taming the infinite chase: Query answering under expressive
 +
    relational constraints, Journal of Artificial Intelligence Research 48 (2013) 115–174.
 +
[6] A. Calì, G. Gottlob, T. Lukasiewicz, B. Marnette, A. Pieris, Datalog+/-: A family of logical
 +
    knowledge representation and query languages for new applications, in: 2010 25th Annual
 +
    IEEE LICS, IEEE, 2010, pp. 228–242.
 +
[7] A. Calì, G. Gottlob, T. Lukasiewicz, A general datalog-based framework for tractable query
 +
    answering over ontologies, Journal of Web Semantics 14 (2012) 57–83.
 +
[8] G. Gottlob, A. Pieris, E. Sallinger, Vadalog: recent advances and applications, in: JELIA,
 +
    Springer, 2019, pp. 21–37.
 +
[9] L. Bellomarini, G. Gottlob, A. Pieris, E. Sallinger, Swift logic for big data and knowledge
 +
    graphs, in: IJCAI, Springer, 2017, pp. 2–10.
 +
[10] D. Maier, A. O. Mendelzon, Y. Sagiv, Testing implications of data dependencies, ACM
 +
    Transactions on Database Systems 4 (1979) 455–468.
 +
[11] L. Bellomarini, D. Benedetto, G. Gottlob, E. Sallinger, Vadalog: A modern architecture for
 +
    automated reasoning with large knowledge graphs, Information Systems (2020) 101528.
 +
[12] T. Baldazzi, L. Bellomarini, E. Sallinger, P. Atzeni, Eliminating harmful joins in warded
 +
    datalog+/-, in: International Joint Conference on Rules and Reasoning, Springer, 2021, pp.
 +
    267–275.
 +
[13] T. Baldazzi, L. Bellomarini, E. Sallinger, P. Atzeni, iwarded: A system for benchmarking
 +
    datalog+/-reasoning (technical report), arXiv preprint arXiv:2103.08588 (2021).
 +
[14] G. Berger, G. Gottlob, A. Pieris, E. Sallinger, The space-efficient core of Vadalog, in: PODS,
 +
    2019, pp. 270–284.
 +
[15] G. Gottlob, S. Rudolph, M. Simkus, Expressiveness of guarded existential rule languages,
 +
    in: PODS, 2014, pp. 27–38.
 +
[16] F. Afrati, M. Gergatsoulis, F. Toni, Linearisability on datalog programs, Theoretical
 +
    Computer Science 308 (2003) 199–226.
 +
[17] M. Kaminski, Y. Nenov, B. C. Grau, Datalog rewritability of disjunctive datalog programs
 +
    and non-Horn ontologies, Artificial Intelligence 236 (2016) 90–118.
 +
[18] Z. Wang, P. Xiao, K. Wang, Z. Zhuang, H. Wan, Query answering for existential rules via
 +
    efficient datalog rewriting., in: IJCAI, 2020, pp. 1933–1939.
 +
[19] N. Francis, L. Segoufin, C. Sirangelo, Datalog rewritings of regular path queries using
 +
    views, arXiv preprint arXiv:1511.00938 (2015).
 +
[20] S. Ahmetaj, M. Ortiz, M. Simkus, Polynomial datalog rewritings for expressive description
 +
    logics with closed predicates., in: IJCAI, 2016, pp. 878–885.
 +
 +
</pre>

Latest revision as of 17:54, 30 March 2023

Paper

Paper
edit
description  
id  Vol-3194/paper35
wikidataid  Q117344904→Q117344904
title  Reasoning in Warded Datalog+/- with Harmful Joins
pdfUrl  https://ceur-ws.org/Vol-3194/paper35.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/BaldazziBSA22
volume  Vol-3194→Vol-3194
session  →

Reasoning in Warded Datalog+/- with Harmful Joins

load PDF

Reasoning in Warded Datalog+/- with Harmful Joins
Teodoro Baldazzi1 , Luigi Bellomarini2 , Emanuel Sallinger3,4 and Paolo Atzeni1
1
  Università Roma Tre, Department of Computer Science and Engineering, Rome, Italy
2
  Banca d’Italia, Italy
3
  TU Wien, Faculty of Informatics, Vienna, Austria
4
  University of Oxford, Department of Computer Science, Oxford, UK


                                         Abstract
                                         Warded Datalog+/- is a powerful member of the Datalog+/- family of logic languages, featuring full
                                         support for recursion and existential quantification. As a result of the promising trade-off between
                                         expressive power and data complexity offered, it recently rose as a relevant candidate for ontological
                                         reasoning on large knowledge graphs and is employed as logic core of the Vadalog system, a well-known
                                         state-of-the-art reasoner. To achieve decidability and data tractability in practice over recursive settings,
                                         reasoners such as Vadalog adopt specialized strategies that control the effects of recursion and ensure
                                         reasoning termination with small memory footprint. However, exploiting the theoretical underpinnings
                                         of the Warded fragment, to enable these strategies, requires the settings not to contain “harmful” joins,
                                         i.e., on variables affected by existential quantification. To support reasoning decidability and the full
                                         expressive power of the language, we developed the Harmful Join Elimination, an algorithm that removes
                                         such joins while preserving the correctness of the task, and we integrated it into the Vadalog system.

                                         Keywords
                                         Datalog, Vadalog, ontological reasoning, existential quantification, harmful joins




1. Introduction
Recent years’ widespread interest towards querying and exploiting large amounts of data
in the form of knowledge graphs (KGs) has led to the rising development and adoption of
intelligent systems that manage such extensional knowledge and infer new intensional one
via efficient ontological reasoning mechanisms [1]. To achieve this, modern reasoning and
KG navigation applications require to employ powerful formalisms and logic languages for
knowledge representation [2], capable of providing full support for recursion and existential
quantification as well as sustaining decidability and polynomial data complexity of the task [3].
   Datalog± [4, 5, 6, 7] is one of the most commonly adopted families of logic languages
(technically, fragments) for reasoning on KGs [6]. Among them, Warded Datalog± [8] effectively
covers the above requirements. Indeed, it encompasses both a high expressiveness, capturing
plain Datalog as well as SPARQL queries under OWL 2 QL entailment regime and set semantics,
and a simple syntax that enable powerful knowledge-modeling. It also offers a very good trade-
off with computational complexity and captures PTIME for the reasoning [9]. Its semantics is
defined via the chase [10], an algorithmic tool that takes as input a database 𝐷 and a set Σ of
SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
$ teodoro.baldazzi@uniroma3.it (T. Baldazzi); luigi.bellomarini@bancaditalia.it (L. Bellomarini);
sallinger@dbai.tuwien.ac.at (E. Sallinger); paolo.atzeni@uniroma3.it (P. Atzeni)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
�rules, and adds new tuples to 𝐷 until Σ is satisfied: such tuples may contain freshly generated
symbols 𝜈𝑖 (technically, labelled nulls) that act as placeholders for the existentially quantified
variables [6]. The Warded fragment is employed in the Vadalog system [11], a state-of-the-art
reasoner that allows to perform ontological reasoning tasks in complex scenarios.
   While such favourable characteristics of Warded Datalog± bode well for efficient imple-
mentations, the interplay between recursion and existential quantification may lead to the
generation of infinite labelled nulls in the chase, causing the procedure not to terminate and
thus inhibiting decidability of the reasoning task in practice [5]. To properly control such
interactions, reasoners resort to specific techniques known as termination strategies. Specif-
ically, the Vadalog system employs the isomorphism termination strategy, which consists in
suppressing isomorphic copies of previously generated facts (i.e., same name, same constants
in same positions and bijection between labelled nulls), hence the chase steps starting from
them are not performed and the descending facts are not generated. The correctness of such
strategy is corroborated by the reasoning boundedness property of the fragment, which states
that facts derived from isomorphic origins are isomorphic, thus uninformative for the reasoning
task [3]. However, as a necessary condition to exploit this property, the set of Warded rules in
the scenario is required to be in a harmless form, i.e., without joins between variables affected
by existential quantification. Indeed, during the chase rules with such harmful joins could
activate on labelled nulls. Therefore, suppressing isomorphic facts that carry these nulls could
inhibit rule activation and affect reasoning correctness. On the other hand, avoiding the use of
these joins affects the expressive power of the fragment. Consider the following example.
Example 1. Company merger scenario modeled with a set Σ of Warded Datalog± rules.
                                       Company(x) → ∃c CEO(x, c)                               (𝛼)
                                Merges(x, y), CEO(x, c) → CEO(y, c)                            (𝛽)
                               Corp(x, y) → ∃c CEO(x, c), CEO(y, c)                            (𝛾)
                                  CEO(x, c), CEO(y, c) → Corp(x, y)                             (𝜌)

For each company 𝑥 there exists a CEO 𝑐 (rule 𝛼). If 𝑥 merges with a company 𝑦, 𝑐 also becomes
CEO of 𝑦 (rule 𝛽). If 𝑥 and 𝑦 have a common CEO, they are in the same corporation (rule 𝜌) and vice
versa (rule 𝛾). Consider the database instance 𝐷 = {Company(Hsb), Company(Iba), Merges(Hsb,
Iba)} and the query 𝑄: “what are all the entailed Corporations?” as ontological reasoning task.
It can be observed that the set of corporations is finite, whereas the chase does not terminate.
By following the procedure without termination strategy, we first generate CEO(Hsb,𝜈0 ) and
CEO(Iba,𝜈1 ) by activating 𝛼 from the facts Company(Hsb) and Company(Iba), respectively.
Next, we obtain CEO(Iba,𝜈0 ) from 𝛽, Corp(Hsb,Iba) via the join on 𝜈0 in 𝜌, then CEO(Hsb,𝜈2 )
and CEO(Iba,𝜈2 ) by activating 𝛾, 𝛽 and so on. Due to the existential quantification      in rule 𝛾
and its interplay with the recursions in rules 𝛽 and 𝜌, an infinite set 𝑖=3,... {CEO(Hsb, 𝜈𝑖 ),
                                                                           ⋃︀
CEO(Iba, 𝜈𝑖 )} is generated. On the other hand, employing the isomorphism termination strategy
is not feasible either, due to the harmful join in rule 𝜌. Indeed, CEO(Iba,𝜈0 ) is isomorphic with
CEO(Iba,𝜈1 ), yet its pruning would prevent the join in 𝜌 with CEO(Hsb,𝜈0 ) from generating
Corp(Hsb,Iba) and the query in Example 1 from being answered correctly.
   In this discussion paper, we illustrate how we enabled reasoning termination and decidability
in practice over Warded Datalog± programs with harmful joins while preserving the expressive
�power of the fragment. First, we introduce the disarmament problem, which consists in
rewriting such programs into a version without harmful joins (namely, in the newly defined
fragment Harmless Warded Datalog± ) that is equivalent with respect to the chase. Then, we
present Harmful Join Elimination (HJE) [12, 13], the first, to the best of our knowledge,
rewriting technique to solve the disarmament problem. Finally, we discuss HJE integration into
the Vadalog system and its reasoning application over the scenario in Example 1.
Related Work. The importance of achieving reasoning decidability in Datalog± fragments
and handling the interplay between recursion and existential quantification acted as catalysts
for the development of novel approaches for reasoning termination [14, 7, 6]. More in general,
HJE belongs to the class of methodologies for Datalog rewriting. Among them, we mention its
conversion into fragments such as Guarded [15], Linear [16], and Disjunctive [17]. Furthermore,
by interpreting the rules as queries, distinct techniques were proposed to rewrite Datalog±
programs with existentials [18] from Regular Path Queries [19] and Description Logics [20].
Overview. The remainder of this paper is organized as follows. In Section 2, we provide an
overview of Warded Datalog± . In Section 3, we illustrate the HJE algorithm as well as its
application over Example 1 in the Vadalog system. We draw our conclusions in Section 4.


2. Syntax and Semantics of Warded Datalog±
A Warded Datalog± program consists of a set of facts and rules. An existential rule is a first-order
sentence ∀𝑥  ¯ ∀𝑦¯(𝜙(𝑥               ¯ , 𝑧¯)), where 𝜙 (the body) and 𝜓 (the head) are conjunctions of
                       ¯ , 𝑦¯)→∃𝑧¯ 𝜓(𝑥
atoms, over the respective predicates, with constants and variables. For brevity, we may omit
quantifiers and denote conjunction by comma. Let Σ be a set of rules and 𝑝[𝑖] a position (i.e.,
the 𝑖-th term of a predicate 𝑝 with arity 𝑘, where 𝑖 = 1, . . . , 𝑘). We define 𝑝[𝑖] as affected if (i) 𝑝
appears in a rule in Σ with an existentially quantified variable (∃-variable) in the 𝑖-th term or,
(ii) there is a rule in Σ such that a universally quantified variable (∀-variable) is only in affected
body positions and in 𝑝[𝑖] in the head. A ∀-variable 𝑥 is harmful, wrt a rule 𝜌 in Σ, if 𝑥 appears
only in affected positions in 𝜌, otherwise it is harmless. In Example 1, 𝛼 and 𝛾 are existential
rules, while 𝜌 is a harmful join rule, i.e., a rule such that a harmful variable is involved in a join
(namely, harmful join). If the harmful variable is in head(𝜌), it is dangerous [1].
    Chase-based procedures enforce the satisfaction of Σ over a database 𝐷, incrementally
expanding 𝐷 into new instances 𝐼 with facts derived from the application of the rules, until Σ
is satisfied (𝐼 = 𝑐ℎ𝑎𝑠𝑒(𝐷, Σ)). Consider an instance 𝐼 ′ ⊇ 𝐼 and a rule 𝜎 : 𝜙(𝑥        ¯ , 𝑦¯)→∃𝑧¯⟨︀𝜓(𝑥 ¯⟩︀, 𝑧¯)
∈ Σ. In the version of the chase we refer to (known as oblivious [5]), a chase step 𝜎,ℎ is
applicable to 𝐼 ′ if there exists a homomorphism ℎ that maps the atoms of 𝜙(𝑥             ¯ , 𝑦¯) to facts of
𝐼 (i.e., ℎ(𝜙(𝑥¯ , 𝑦¯)) ⊆ 𝐼). When the chase step is applicable, the atom ℎ′ (𝜓(𝑥  ¯ , 𝑧¯)) is added to 𝐼 ′ ,
where ℎ′ is obtained by extending ℎ so that ℎ′ (𝑧𝑖 ) is a fresh labelled null, ∀ 𝑧𝑖 ∈ 𝑧¯. The chase
graph 𝒢(𝐷, Σ) is the directed graph with the facts from chase(𝐷, Σ) as nodes and an edge from
a node 𝑛 to a node 𝑚 if 𝑚 is obtained from 𝑛 (and possibly other facts) via a chase step [7].
    Given a pair 𝑄 = (Σ, Ans), where Ans is an n-ary predicate, the evaluation of 𝑄 over 𝐷
is the set of tuples 𝑄(𝐷, Σ) = {𝑡¯ ∈ dom(𝐷)𝑛 | Ans(𝑡¯) ∈ chase(𝐷, Σ)}, where ¯𝑡 is a tuple of
constants. A reasoning task consists in finding an instance 𝐽 st: (i) ¯𝑡 ∈ 𝐽 iff Ans(𝑡¯) ∈ 𝑄(𝐷, Σ);
and (ii) for every other 𝐽 ′ st ¯𝑡 ∈ 𝐽 ′ iff ¯𝑡 ∈ 𝑄(𝐷, Σ), there is a homomorphism from 𝐽 to 𝐽 ′ [11].
�3. Enabling Reasoning in Warded Datalog±
To achieve reasoning termination and decidability over recursive Warded Datalog± settings,
especially in the presence of existential quantification, the oblivious chase is enriched with the
isomorphism termination strategy, which acts as a firing condition that limits the activation
of applicable chase steps [11]. With reference
                                           ⟨︀ ⟩︀ to the definitions in Section 2, such isomorphic
chase variant causes an applicable step 𝜎,ℎ to activate, wrt 𝐼 ′ ⊇ 𝐼, if there is no isomorphism
of h(head(𝜎)) with 𝐼 ′ . Therefore, given two isomorphic facts 𝑓 and 𝑓 ′ , only 𝑓 is explored in the
isomorphic chase, whereas the descending portions of the chase graph rooted in 𝑓 ′ are pruned
because isomorphic to the ones rooted in 𝑓 , thus uninformative for the reasoning task.
Harmless Warded Datalog± . However, as we have introduced, in order to exploit such rea-
soning boundedness property and sustain decidability of the task while preserving correctness,
Warded programs with recursion and existentials must not contain harmful join rules [3, Theo-
rem 2], that is, they belong to Harmless Warded Datalog± [12]. Without loss of generality (as
more complex joins can be broken into multiple steps [11]), a harmful join rule is a warded rule
𝜌 : 𝐴(𝑥1 , 𝑦1 , ℎ), 𝐵(𝑥2 , 𝑦2 , ℎ) → ∃𝑧 𝐶(𝑥, 𝑧), where 𝐴, 𝐵 and 𝐶 are atoms, 𝐴[3] and 𝐵[3] are
affected positions, 𝑥1 , 𝑥2 ⊆ 𝑥, 𝑦1 , 𝑦2 ⊆ 𝑦 are disjoint tuples of harmless variables or constants
and ℎ is a harmful variable. A set of rules ∈ Harmless Warded Datalog± if: (1) it is warded, i.e.,
all the dangerous variables in its rules appear in a single body atom (the ward), which only shares
harmless variables with the rest of the body; and (2) it does not contain harmful join rules.
Disarming the Warded Fragment. While in presence of programs without harmful joins the
isomorphic chase can be employed without affecting the correctness nor the computational
properties of the reasoning, restricting the Warded fragment to its Harmless Warded counterpart
affects the expressive power available to reasoners such as the Vadalog system implementing
it. With the goal of preventing such syntactic limitations, we developed a technique to solve
the disarmament problem, that is, to rewrite a set Σ of Warded Datalog± rules in input to the
reasoners into an equivalent set Σ′ of Harmless Warded rules. In this context, two sets of rules
are considered equivalent if they have the same meaning with respect to the chase [16], i.e.,
chase(𝐷, Σ) = chase(𝐷, Σ′ ) modulo fact isomorphism for each database 𝐷. Such technique,
which we named Harmful Join Elimination (HJE), also allowed us to operationally prove that
for each Warded set there exists an equivalent Harmless Warded one [13].
    Intuitively, the HJE algorithm replaces each harmful join rule 𝜌 with a set of harmless rules that
cover the generation of all the facts derived from activating 𝜌, thus preserving correctness. To
achieve this, it operates by considering the rules involved in the propagation of the affectedness
(i.e., the labelled nulls in the chase) from the existentials to the harmful join variables in 𝜌.

Definition 1 (Causes of Affectedness). Let 𝜌 ∈ Σ be a harmful join rule and let 𝐻 ∈ {A,B} be an
atom in the body of 𝜌. We define causes of affectedness as the sequences of rules Γ𝐻𝑖 = [𝜎𝑠 , . . . , 𝜎1 ]
(𝑠 < |Σ|, 𝑖 ≥ 1) ∈ Σ, where: (i) 𝜎1 : 𝐻1 (𝑥, 𝑦1 ), 𝑅1 →∃ℎ 𝐻2 (𝑥, 𝑦2 , ℎ) is a direct cause, i.e., an exis-
tential rule that causes a position to be affected; and (ii) 𝜎𝑘 : 𝐻𝑘 (𝑥, 𝑦𝑘 , ℎ), 𝑅𝑘 →𝐻𝑘+1 (𝑥, 𝑦𝑘+1 , ℎ),
1 < 𝑘 ≤ 𝑠, are indirect causes, i.e., rules that propagate the affectedness from 𝜎1 to 𝜌, such that
𝐻𝑠+1 = 𝐻. 𝐻1 , . . . , 𝐻𝑠 are atoms, 𝑅1 , . . . , 𝑅𝑠 are atoms or conjunctions of atoms not containing
ℎ (as the rules are Warded). Let 𝑋𝑖𝑗 = Γ𝐴𝑖 ⌢ Γ𝐵𝑗 be the concatenation of the sequences Γ𝐴𝑖 and
Γ𝐵𝑗 with the same direct cause: the causes in 𝑋𝑖𝑗 are labelled after the sequence they belong to.
�With reference to Example 1, the rules 𝛼 and 𝛾 are direct causes of affectedness, whereas 𝛽 is
an indirect one. The sequences of causes for the atom CEO𝑘 (k ∈ {1,2}, in order of appearance
in 𝜌) are: ΓCEO𝑘 1 = [𝛼], ΓCEO𝑘 2 = [𝛽, 𝛼], ΓCEO𝑘 3 = [𝛾], ΓCEO𝑘 4 = [𝛽, 𝛾]. For instance, 𝑋21 =
[𝛽ΓCEO1 2 , 𝛼ΓCEO1 2 , 𝛼ΓCEO2 1 ] is a concatenation, as ΓCEO1 2 ,ΓCEO2 1 contain the same direct cause 𝛼.
   To determine how the propagated nulls activating the harmful join in 𝜌 affect the meaning
in the chase, and to consequently build proper harmless rules, we compose 𝜌 along all its
concatenations 𝑋𝑖𝑗 of causes of affectedness via the unfolding and folding operations [16].

Definition 2 (Unfolding and Folding). Let 𝜌 be a rule 𝐴, 𝐵→𝐶, where 𝐴 and 𝐶 are atoms and
𝐵 is an atom or a conjunction of atoms, and let 𝜎 be a rule 𝑅→𝐴′ , where 𝐴′ is an atom and 𝑅
an atom or a conjunction of atoms. Let 𝐴′ be unifiable with 𝐴 by substitution 𝜃. The result of
unfolding 𝜌 at 𝐴 with 𝜎 is the rule 𝜏 : (𝐵, 𝑅→𝐶)𝜃. If the head of 𝜎 contains an ∃-variable ℎ, it
replaces ℎ with a Skolem atom 𝑓ℎ𝜎 in 𝜏 , where 𝑓 is an injective, deterministic and range disjoint
function that calculates the values for ∃-variables, to control the identity of labelled nulls. Now, let
𝜎 ′ be a rule 𝐵 ′ →𝑅, where 𝑅 is an atom and 𝐵 ′ is an atom or a conjunction of atoms. Let 𝐵 ′ be
unifiable with 𝐵 by substitution 𝜃′ . The result of folding 𝜌 into 𝜎 ′ is the rule 𝜐: (𝐴, 𝑅→𝐶)𝜃′ .

The results of the compositions for each harmful join rule are represented via the       ⟨︀ harmful
unfolding tree (hu-tree). Apart from the more technical side [12], the hu-tree 𝑇 for Σ, 𝜌 can
                                                                                                ⟩︀

be defined as a rule-labelled tree-like structure where: (i) the root is labelled by 𝜌; (ii) for each
𝑋𝑖𝑗 , there exists a root-to-leaf path 𝑇𝑖𝑗 whose nodes are labelled by the result of unfolding their
parent nodes with the causes in 𝑋𝑖𝑗 (in order of appearance); (iii) for each 𝜎𝑘 ∈ 𝑋𝑖𝑗 involved in
a recursion, there exists a leaf labelled by result of unfolding its parent node 𝜇 with 𝜎𝑘 and then
folding it into 𝜇. By definition of unfolding and folding, the leaves in 𝑇 are harmless rules that
cover the generation of all the facts derived in the chase from activating 𝜌 on labelled nulls [13].
                                                     CEO(x,c),CEO(y,c)⟶Corp(x,y)
                                                 α                         β                         γ

                                            α            β             α           γ             β            γ

                                       τ1                                                                         τ8
                                                     α        α            β   β            γ        γ

                                                τ2       τ3                                     τ6       τ7
                                                                           α   γ
             βГCEO 2                                              τ4                   τ5                       Corp(v1,v2),Merges(v3,y),
                  1
                                                                                                              CEO(v3,c),fcγ(v1)⟶Corp(v1,y)
          Merges(v1,x),CEO(v1,c),
            CEO(y,c)⟶Corp(x,y)                                                                                                βГCEO 4
                                                                                                                                   2
             αГCEO 2
                  1
                                                                                                         Corp(v1,v2),Merges(v3,y),Merges(v4,v3),
    Merges(v1,v2),Company(v1),CEO(y,c),                                                                       CEO(v4,c),fcγ(v1)⟶Corp(v1,y)
            fcα(v1)⟶Corp(v2,y)
             αГCEO 1
                  2
                                                                                                         Corp(v1,v2),Merges(v2,v3)⟶Corp(v1,v3)
   Merges(v1,v2),Company(v1),Company(v3),
         fcα(v1),fcα(v3)⟶Corp(v2,v3)


                       ⟨︀ ⟩︀
Figure 1: Hu-tree 𝑇 for Σ,𝜌 of Example 1, with unfolding (blue) and folding (green) compositions.
�Figure 1 shows the hu-tree 𝑇 of Σ,𝜌 in Example 1. Specifically, the figure in the middle depicts
                                  ⟨︀ ⟩︀

the sequences of unfoldings (blue arrows) and foldings (green arrows) with the causes (here
not labelled), from 𝜌 to the leaves with the resulting harmless rules; the left figure illustrates
more in detail the root-to-leaf path 𝑇21 obtained by unfolding the causes in 𝑋21 ; the right figure
provides an application of folding over the recursive cause 𝛽ΓCEO2 4 . Note that 𝑇 does not contain
cycles, that is, only the plain arrows (not the dotted ones) in the figure are edges of the hu-tree.
HJE Algorithm. Given a Warded Datalog± set Σ with harmful join rules 𝜌, the HJE algorithm
produces a new equivalent set Σ′ = HJE(Σ). It can be divided into two main phases. We refer
to the extended versions of the work [12, 13] for an in-depth discussion of the algorithm.
   In the back-composition phase (Algorithm 1), HJE first derives for each 𝜌 all the causes of
affectedness and the concatenations 𝑋𝑖𝑗 (line 5). Then, it iteratively builds the hu-tree 𝑇 via
unfolding (line 13) and, in case of recursive causes, folding (line 17) along the causes in each
𝑋𝑖𝑗 , in order of appearance, until all the root-to-leaf paths have been created: it employs the
maximum distance from harmlessness (mdh), a value that corresponds to the highest cardinality
among the 𝑋𝑖𝑗 , i.e., the height of 𝑇 . The rules labelling the resulting leaves are then subject to
an overall cleanup and deduplication (line 19). Rules that never activate are dropped. Also, if the
functions of the Skolem atoms in a rule (derived from unfolding a direct cause in 𝑇 ) respect
injectivity and range disjointness, they are simplified, otherwise the rule is dropped.
   The grounding phase occurs if Σ contains rules 𝜋 that are not causes of affectedness and that
propagate ground values from the database to the harmful join in 𝜌. In this case, to cover the
activation of 𝜌 on such values, the Dom(ℎ) [3] rules are employed (below for A, corresponding
ones for B), which force the harmful join variables to bind only to constants in the domain.

                                               Dom(h), A(x, y, h) → A′ (x, y, h)                                            (𝛿1 )
                                                           A (x, y, h) → A(x, y, h)
                                                             ′
                                                                                                                            (𝛿2 )
                                        A (x1 , y1 , h), B(x2 , y2 , h) → ∃z C(𝑥, z)
                                          ′
                                                                                                                            (𝛿3 )

Algorithm 1 Back-Composition in Harmful Join Elimination.
 1: function back-composition(Σ)
 2:    P ← all harmful join rules in Σ; Σ′ = Σ
 3:    for 𝜌 in P do
 4:        𝑇 ← empty hu-tree with 𝜌 as root
 5:        𝒳𝜌 ← all 𝑋𝑖𝑗 from Γ𝐴𝑖 , Γ𝐵𝑗 for 𝜌 in Σ                                      ◁ all sets of causes of affectedness for 𝜌
 6:        𝑄 ← queue with 𝜌 enqueued
 7:        mdh = max(|𝑋𝑖𝑗 |) with 𝑋𝑖𝑗 in 𝒳𝜌
 8:        for dh ← 1 to mdh do                                                                          ◁ build hu-tree 𝑇 for 𝜌
 9:            𝜇 = 𝑄.dequeue()                                                                     ◁ next node to compose back
10:            for 𝑋𝑖𝑗 in 𝒳𝜌 do
11:                if dh ≤ |𝑋𝑖𝑗 | then
12:                    𝜎 = 𝑋𝑖𝑗 .next()                    ◁ next cause, from indirect to direct, in order of appearance in 𝑋𝑖𝑗
13:                    𝜈 = unfold(𝜇, 𝜎)                                              ◁ unfold current node with current cause
14:                    𝑄.enqueue(𝜈)
15:                    𝑇𝑖𝑗 ← 𝑇𝑖𝑗 ∪ {𝜈}                                  ◁ update path 𝑇𝑖𝑗 at depth dh with unfolding node 𝜈
16:                    if isRecursive(𝜎, 𝑋𝑖𝑗 ) then
17:                         𝑇 ← 𝑇 ∪ {fold(unfold(𝜈, 𝜎), 𝜇)}                  ◁ update 𝑇 with folding node for recursive cause
18:        Σ′ .add(𝑇.leaves())
19:    skolem-cleanup(Σ′ )                                               ◁ simplify Skolem atoms and deduplicate rules in Σ′
         return Σ′
�            (a) Reasoning times in Spec and All.         (b) Corporations discovered in All.
Figure 2: Results of the experiments for Spec and All tasks, to vary the number of companies.
HJE in Vadalog System. HJE is integrated into the logic optimizer of the Vadalog system, the
component responsible for applying the required rewriting steps to the program before the
reasoner processes it [11]. This enables reasoning termination and decidability over recursive
Warded settings with harmful joins without affecting the expressive power of the Warded
fragment. Thus, we are now able to reason over the scenario in Example 1. We run the
experiments on a local installation of the Vadalog system, with a MacBook Pro i7 with 2.5 GHz
and 16 GB of RAM. The rules listed below are added via HJE to Σ′ (some are merged for space
reasons): 𝜏3 labels the leaf of 𝑇21 and 𝜐3 labels the folded node over 𝛽ΓCEO2 4 from Figure 1, after
skolem cleanup. We adjusted input data from the DBpedia [3] company KG, adopting datasets
of increasing complexity, with 1K, 5K, 10K, 15K, 20K, 25K and 30K companies. We built two
reasoning tasks: (i) Spec, to obtain all the corporations of the company CBS, and (ii) All, to obtain
all the corporations, to vary the number of companies. Figure 2(a) illustrates the reasoning
times, all under 16 seconds, which proves the very good performance of the Vadalog system.
Spec requires more time than All for smaller datasets; when the number of companies increases,
All tends to a steeper curve. Figure 2(b) shows the number of corporations discovered in All.
                                                    Company(x) → Corp(x, x)                       (𝜏1 )
                           Company(x), Merges(x, y) → Corp(x, y), Corp(y, x)                    (𝜏2,3 )
                         Company(x), Merges(x, y), Merges(x, z) → Corp(y, z)                      (𝜏4 )
                          Corp(x, y), Merges(x, w), Merges(x, z) → Corp(w, z)                     (𝜏5 )
                              Corp(x, y), Merges(x, z) → Corp(x, z), Corp(z, x)                 (𝜏6,7 )
                                                      Corp(x, y) → Corp(x, x)                     (𝜏8 )
                              Corp(x, y), Merges(x, z) → Corp(y, z), Corp(z, y)                 (𝜐1,2 )
                              Corp(x, y), Merges(y, z) → Corp(x, z), Corp(z, x)                 (𝜐3,4 )


4. Conclusion
To achieve reasoning termination and decidability over Warded Datalog± in practice, the settings
must not contain harmful joins. In this work, we discussed the disarmament problem of rewriting
such joins into an equivalent Harmless Warded form that upholds the expressiveness of the
fragment and the correctness of the task. We then presented the Harmful Join Elimination, the
disarmament algorithm integrated into the Warded Datalog± -based reasoner Vadalog system.
�References
 [1] G. Gottlob, A. Pieris, Beyond SPARQL under OWL 2 QL entailment regime: Rules to the
     rescue, in: IJCAI, 2015.
 [2] M. Krötzsch, V. Thost, Ontologies for knowledge graphs: Breaking the rules, in: Interna-
     tional Semantic Web Conference (1), volume 9981 of LNCS, 2016, pp. 376–392.
 [3] L. Bellomarini, E. Sallinger, G. Gottlob, The Vadalog System: Datalog-based reasoning for
     knowledge graphs, VLDB 11 (2018).
 [4] P. Barceló, R. Pichler, Datalog in Academia and Industry: Second International Workshop,
     Datalog 2.0, Vienna, Austria, September 11-13, Proceedings, volume 7494, Springer, 2012.
 [5] A. Calì, G. Gottlob, M. Kifer, Taming the infinite chase: Query answering under expressive
     relational constraints, Journal of Artificial Intelligence Research 48 (2013) 115–174.
 [6] A. Calì, G. Gottlob, T. Lukasiewicz, B. Marnette, A. Pieris, Datalog+/-: A family of logical
     knowledge representation and query languages for new applications, in: 2010 25th Annual
     IEEE LICS, IEEE, 2010, pp. 228–242.
 [7] A. Calì, G. Gottlob, T. Lukasiewicz, A general datalog-based framework for tractable query
     answering over ontologies, Journal of Web Semantics 14 (2012) 57–83.
 [8] G. Gottlob, A. Pieris, E. Sallinger, Vadalog: recent advances and applications, in: JELIA,
     Springer, 2019, pp. 21–37.
 [9] L. Bellomarini, G. Gottlob, A. Pieris, E. Sallinger, Swift logic for big data and knowledge
     graphs, in: IJCAI, Springer, 2017, pp. 2–10.
[10] D. Maier, A. O. Mendelzon, Y. Sagiv, Testing implications of data dependencies, ACM
     Transactions on Database Systems 4 (1979) 455–468.
[11] L. Bellomarini, D. Benedetto, G. Gottlob, E. Sallinger, Vadalog: A modern architecture for
     automated reasoning with large knowledge graphs, Information Systems (2020) 101528.
[12] T. Baldazzi, L. Bellomarini, E. Sallinger, P. Atzeni, Eliminating harmful joins in warded
     datalog+/-, in: International Joint Conference on Rules and Reasoning, Springer, 2021, pp.
     267–275.
[13] T. Baldazzi, L. Bellomarini, E. Sallinger, P. Atzeni, iwarded: A system for benchmarking
     datalog+/-reasoning (technical report), arXiv preprint arXiv:2103.08588 (2021).
[14] G. Berger, G. Gottlob, A. Pieris, E. Sallinger, The space-efficient core of Vadalog, in: PODS,
     2019, pp. 270–284.
[15] G. Gottlob, S. Rudolph, M. Simkus, Expressiveness of guarded existential rule languages,
     in: PODS, 2014, pp. 27–38.
[16] F. Afrati, M. Gergatsoulis, F. Toni, Linearisability on datalog programs, Theoretical
     Computer Science 308 (2003) 199–226.
[17] M. Kaminski, Y. Nenov, B. C. Grau, Datalog rewritability of disjunctive datalog programs
     and non-Horn ontologies, Artificial Intelligence 236 (2016) 90–118.
[18] Z. Wang, P. Xiao, K. Wang, Z. Zhuang, H. Wan, Query answering for existential rules via
     efficient datalog rewriting., in: IJCAI, 2020, pp. 1933–1939.
[19] N. Francis, L. Segoufin, C. Sirangelo, Datalog rewritings of regular path queries using
     views, arXiv preprint arXiv:1511.00938 (2015).
[20] S. Ahmetaj, M. Ortiz, M. Simkus, Polynomial datalog rewritings for expressive description
     logics with closed predicates., in: IJCAI, 2016, pp. 878–885.
�