Vol-3194/paper55

From BITPlan ceur-ws Wiki
Jump to navigation Jump to search

Paper

Paper
edit
description  
id  Vol-3194/paper55
wikidataid  Q117344930→Q117344930
title  Helping Wine Lovers With Taxonomies
pdfUrl  https://ceur-ws.org/Vol-3194/paper55.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/CiacciaMT22
volume  Vol-3194→Vol-3194
session  →

Helping Wine Lovers With Taxonomies

load PDF

Helping Wine Lovers With Taxonomies
Paolo Ciaccia1 , Davide Martinenghi2 and Riccardo Torlone3
1
  Dipartimento di Informatica - Scienza e Ingegneria, Università di Bologna, Italy
2
  Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy
3
  Dipartimento di Ingegneria, Università Roma Tre, Italy


                                         Abstract
                                         We formally investigate the problem of retrieving the best results complying with multiple preferences
                                         expressed in a logic-based language when data are stored in relational tables with taxonomic domains. We
                                         introduce two operators that rewrite preferences for enforcing transitivity, which guarantees soundness
                                         of the result, and specificity, which solves conflicts among preferences. We show that these two properties
                                         cannot be achieved together and identify the only two possibilities to ensure transitivity and minimize
                                         conflicts. Our approach proves effective when tested over both synthetic and real-world datasets.




1. Introduction
Preferences are typically expressed in generic terms (e.g., I prefer pasta to beef), whereas
available data is more specific (the menu might contain lasagne and hamburger). This mismatch
causes difficulties when trying to automatically suggest the best solutions. The problem becomes
even more involved when several preferences at different levels of granularity and possibly
conflicting with each other are specified.
Example 1. We would like to select some bottles of wine from the list in Figure 1. While we
prefer white wines to red ones, we prefer Amarone (a red wine) to white wine; we prefer Tuscan
wineries in the province of Siena to those in the Piedmont province of Asti. Moreover, if the
winery lies in the Langhe (which spans both the Asti and Cuneo provinces) we prefer an aged
wine (i.e., produced before 2017) to a more recent one.
   In order to support the mentioned preferences, we need further information, as, e.g., provided
by the taxonomies in Figure 1. The example also shows that conflicts can occur when preferences
are defined at different levels of detail (e.g., the preference for Amarone, which is a red wine,
is in contrast with the more generic preference for white wines). Also observe that further
preferences can be naturally (transitively) derived from those that are stated explicitly (e.g.,
from the preference for wines from Siena to those from Asti and the preference for aged wines
when they are from the Langhe region, we can also derive a preference for wines from Siena to
young wines from Langhe).
   In this paper, unlike previous approaches that have only tackled the problem of mapping
preferences to data (see, e.g., [1]), we formally investigate the problem of modifying input
SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
$ paolo.ciaccia@unibo.it (P. Ciaccia); davide.martinenghi@polimi.it (D. Martinenghi);
riccardo.torlone@uniroma3.it (R. Torlone)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
�                                                           Wines
                                              Wine           Winery      Year
                                              Arneis        Correggia    2019    𝑎
                                             Amarone          Masi       2014    𝑏
                                             Amarone         Bertani     2013    𝑐
                                             Canaiolo      Montenidoli   2015    𝑑
                                              Barolo         Laficaia    2014    𝑒
                                              Arneis         Ceretto     2019    𝑓

          Veneto       Tuscany                  Piedmont



          Verona        Siena       Asti     Roero   Cuneo    Langhe
                                                                            white      rosé          red
        Valpolicella


        Masi Bertani Montenidoli   Casorzo   Correggia   Laficaia   Ceretto Arneis   Canaiolo   Amarone Barolo



Figure 1: Taxonomies for the running example.


preferences so as to guarantee that they are transitive and specific. The latter property means
that, in case of conflicts, the more specific preference overrides the more generic one. For
instance the specific preference for Amarone over white wines counts more than the that for
white wines over red ones.
   We therefore study, from both a theoretical and a practical point of view, how transitivity
and specificity can be obtained by suitable rewritings of the initial preferences. We tackle
the problem by introducing two operators that rewrite preferences expressed as FO formulas:
T to enforce transitivity and S to remove conflicts between preferences. We prove that it is
unfortunately impossible to guarantee at the same time transitivity and a complete absence of
conflicts, no matter the order in which T and S are considered and how many times they are
applied. Intuitively, the removal of conflicts may compromise transitivity, whereas enforcement
of transitivity may (re-)introduce conflicts. In spite of this, we formally show that: (i) the set of
all possible sequences of operators can be reduced to a finite (and small) set, (ii) there are only
two sequences that guarantee transitivity and minimize residual conflicts between preferences,
and (iii) the best results obtainable via these two sequences may be very different.
   A number of experiments shows that the computation of the best results largely benefits
from the minimization of conflicts between preferences, while incurring a low overhead due to
the rewriting process.
   The full version of this paper has appeared in [2].


2. Operations on Preferences
We consider a simple extension of the relational model in which the values of an attribute can
be arranged in a hierarchical taxonomy, i.e., a poset 𝑇 = (𝑉, ≤𝑉 ), where 𝑉 is a set of values
and ≤𝑉 is a partial order on 𝑉 .
  Given a set of attribute-taxonomy pairs 𝐴1 : 𝑇1 , . . . , 𝐴𝑑 : 𝑇𝑑 , let 𝒯 denote the set of all
possible tuples over any schema that can be defined using such pairs. A preference relation is
�a relation ⪰ on 𝒯 × 𝒯 . Given 𝑡1 and 𝑡2 in 𝒯 , if 𝑡1 ⪰ 𝑡2 then 𝑡1 is (weakly) preferable to 𝑡2 . If
𝑡1 ⪰ 𝑡2 but 𝑡2 ̸⪰ 𝑡1 , then 𝑡1 is strictly preferable to 𝑡2 , denoted by 𝑡1 ≻ 𝑡2 .
   We consider that preferences are expressed via an intrinsic preference formula [3] 𝐹 such that
𝑡1 ⪰ 𝑡2 ⇔ 𝐹 (𝑡1 , 𝑡2 ). A formula is a disjunction of DNFs, each of which is called a preference
statement, formed by one or more clauses.

Example 2. The preferences in Example 1 can be expressed as 𝑃1 (𝑥, 𝑦) ∨ 𝑃2 (𝑥, 𝑦) ∨ 𝑃3 (𝑥, 𝑦) ∨
𝑃4 (𝑥, 𝑦), which can be compactly written as: 𝑃1 = white ⪰ red (short form of 𝑃1 (𝑥, 𝑦) =
(𝑥[Wine] ≤ white) ∧ (𝑦[Wine] ≤ red)), 𝑃2 = Amarone ⪰ white, 𝑃3 = Siena ⪰ Asti, 𝑃4 =
Langhe ∧ aged ⪰ Langhe ∧ young.

   Now we introduce two operators that can be applied to a preference relation: Transitive
closure (T) and Specificity-based refinement (S). Let ⪰ denote the initial preference relation; the
resulting relation is indicated ⪰T for T and ⪰S for S. Multiple application of operators, e.g., first
T and then S, leads to the relation (⪰T )S , which we compactly denote as ⪰TS . In general, for any
sequence 𝑋 ∈ {T, S}* , ⪰X is the preference relation obtained from the initial preference relation
⪰ by applying the operators in the order in which they appear in X. Notice that ⪰𝜀 = ⪰, where
𝜀 denotes the empty sequence.
   We describe the behavior of the two operators by means of suitable rewritings of a preference
formula. Given a sequence X of operators, and an initial (input) formula 𝐹 (𝑥, 𝑦) inducing the
preference relation ⪰, 𝐹 X (𝑥, 𝑦) denotes the rewriting of 𝐹 that accounts for the application of
the X sequence, yielding ⪰X .
Transitive Closure. Transitivity of ⪰, and consequently of ≻, is a basic requirement of any
sound preference-based system. If ⪰ is not transitive then ≻ might contain cycles, a fact that
could easily lead either to empty or non-stable results [2].
   The transitive closure operator, denoted T, given an input preference relation ⪰X yields the
preference relation ⪰XT . The transitive closure 𝐹 XT of an ipf 𝐹 X with 𝑛 statements 𝑃1 , . . . , 𝑃𝑛
is still a finite ipf that can be computed as described in [3]. In particular, two predicates of
the form (𝑥[𝐴𝑖 ] ≤𝑉𝑖 𝑣1 ) and (𝑥[𝐴𝑖 ] ≤𝑉𝑖 𝑣2 ), over the same attribute 𝐴𝑖 and using the same
variable 𝑥, are contradictory if values 𝑣1 and 𝑣2 are different and have no common descendant
in the taxonomy 𝑉𝑖 . If the predicates are of the form (𝑥[𝐴𝑖 ] ≤𝑉𝑖 𝑣1 ) and (𝑥[𝐴𝑖 ] ̸≤𝑉𝑖 𝑣2 ), then
they are contradictory in case there is a path from 𝑣1 to 𝑣2 in 𝑉𝑖 (or 𝑣1 = 𝑣2 ).
   The fact that the transitive closure is computed with respect to the (possibly infinite) domain
𝒯 of the tuples, and not with respect to a (finite) relation 𝑟 of tuples, is quite standard for
preference relations (see e.g., [3]), and has the advantage of yielding a relation ⪰XT that does
not change with 𝑟.

Example 3. Continuing with Example 2, the transitive closure of 𝐹 is the formula 𝐹 T that,
among others, adds the following statements to 𝐹 :

                   𝑃5 = Amarone ⪰T red          𝑃6 = Siena ⪰T Langhe ∧ young

Statement 𝑃5 clearly follows from 𝑃2 and 𝑃1 , whereas 𝑃6 is obtained from 𝑃3 and 𝑃4 , since
there exists at least one winery that is both in the Asti province and in the Langhe region (Casorzo
is one of them).
�   After applying the T operator, we simplify the formula as needed, and, in particular, we
remove statements that are subsumed by other statements. Similarly, we also simplify statements
by removing contradictory clauses and clauses subsumed within the same statement.
Specificity-based Refinement. The most intriguing of our operators is S. As apparent from
Example 2, conflicting preferences, such as 𝑎 ⪰ 𝑏 and 𝑏 ⪰ 𝑎 (induced by 𝑃1 and 𝑃2 , resp.),
may hold. To solve this problem we resort to a specificity principle, borrowed from non-
monotonic reasoning, stating that more specific information should prevail over more generic
one. Therefore, giving the same importance to, e.g., 𝑃1 and 𝑃2 , the former being more generic,
contradicts the intuition.
   The specificity principle we adopt for analyzing conflicting preferences is based on the
extension of preferences statements, i.e., on the set of pairs of tuples in 𝒯 for which a statement
is true.
Definition 1 (Specificity principle). Let ⪰X be a preference relation, and let 𝐹 X be the correspond-
ing formula. Let 𝑃𝑖 and 𝑃𝑗 be two preference statements in 𝐹 X . We say that 𝑃𝑖 is more specific
than 𝑃𝑗 if, for any pair of tuples 𝑡1 , 𝑡2 ∈ 𝒯 such that 𝑃𝑖 (𝑡1 , 𝑡2 ) is true, then 𝑃𝑗 (𝑡2 , 𝑡1 ) is also true,
and the opposite does not hold.
  From Definition 1 we can immediately determine how a less specific statement has to be
rewritten so as to solve conflicts.
Lemma 1. A preference statement 𝑃𝑖 (𝑥, 𝑦) is more specific than 𝑃𝑗 (𝑦, 𝑥) iff 𝑃𝑖 (𝑥, 𝑦) implies
𝑃𝑗 (𝑦, 𝑥) (written 𝑃𝑖 (𝑥, 𝑦) → 𝑃𝑗 (𝑦, 𝑥)) and the opposite does not hold.
   According to Lemma 1, when 𝑃𝑖 (𝑥, 𝑦) implies 𝑃𝑗 (𝑦, 𝑥) the S operator replaces 𝑃𝑗 (𝑦, 𝑥) with
𝑃𝑗 (𝑦, 𝑥) = 𝑃𝑗 (𝑦, 𝑥) ∧ ¬𝑃𝑖 (𝑥, 𝑦), so that 𝑃𝑖 and 𝑃𝑗′ do not induce any conflicting preferences.
 ′


Example 4. Continuing with Example 3, the application of the S operator amounts to rewriting
formula 𝐹 T by replacing the clause 𝑃1 (𝑥, 𝑦) with 𝑃1 (𝑥, 𝑦) ∧ ¬𝑃2 (𝑦, 𝑥), since 𝑃2 (𝑦, 𝑥) →
𝑃1 (𝑥, 𝑦). This, after distributing ¬ over the two predicates in 𝑃2 and simplifying, leads to the
new clause: 𝑃7 = white ⪰TS red ∧ ¬Amarone.


3. Sequences of Operators
In this section we analyze the effect of performing the operations described in the previous
section, and prove some fundamental properties.
   In order to clarify the relationships between the results of the different operations, we
introduce the notions of equivalence and containment between sequences of operators.
Definition 2 (Equivalence and containment). Let X, Y ∈ {T, S}* ; X is contained in Y, denoted
X ⊑ Y, if for every initial preference relation ⪰, ⪰X ⊆⪰Y ; X and Y are equivalent, denoted X ≡ Y,
if both X ⊑ Y and Y ⊑ X.
   We observe that T and S are idempotent, T is monotone and cannot remove preferences,
while S cannot add preferences. In addition, the preference relation obtained after applying T
on the initial preference relation ⪰ is maximal, in that it includes all other relations obtained
from ⪰ by applying T and S in any way.
�Theorem 1. Let X, Y ∈ {T, S}* , with X ⊑ Y. Then:

              XTT ≡ XT                    XSS ≡ XS             idempotence                     (1)
               XT ⊑ YT                                         monotonicity                    (2)
                 X ⊑ XT                    XS ⊑ X              inflation / deflation           (3)
                 X⊑T                                           maximality                      (4)

  We call complete those sequences that include both T and S. A sequence X is transitive if, for
every initial preference relation ⪰, ≻X is transitive. Eventually, our goal is to drop conflicting
and less specific preferences while preserving transitivity. To this end, we add minimality with
respect to ⊑ as a desideratum.
  The following, non-trivial result shows that we can focus on just eight sequences, shown in
Figure 2, because any sequence is equivalent to one of them.

Theorem 2. Let X ∈ {T, S}* . Then ∃Y ∈ {𝜀, T, S, TS, ST, TST, STS, STST} such that X ≡ Y.




                                                    T




                                      ε         ST       TST




                                      S        STST       TS




                                                STS




Figure 2: A transitively reduced graph showing containment between sequences. Dashed border for
incomplete sequences; grey background for non-transitive sequences; blue background for minimal-
transitive sequences.


   To give an intuition, we observe that i) consecutive repetitions of the same operator are idle
(via idempotence) and ii) the repeated application of a TS suffix does not change the semantics
of a sequence (i.e., XTS ≡ XTSTS).
Minimality and transitivity. Generally, any complete sequence not ending with S is non-
minimal, in that it may contain conflicting preferences (possibly introduced by T) that turn out
to be in contrast with other, more specific preferences. We exemplify this on ST using a single
taxonomy about time.
�Example 5. Let 𝐹 consist of 𝑃1 = autumn ⪰ sep and the more specific 𝑃2 = sep10 ⪰ oct10.
By specificity, in 𝐹 S , 𝑃1 is replaced by the statement 𝑃3 consisting of two clauses: autumn ⪰
sep ∧ ¬sep10 and autumn ∧ ¬oct10 ⪰ sep. In 𝐹 ST , the clauses in 𝑃3 transitively combine into
𝑃1 again, since, e.g., the value sep30 is below sep but not sep10 and below autumn but not
oct10; therefore oct10 ⪰ST sep10 holds. However, in 𝐹 STS , 𝑃1 is again replaced by 𝑃3 , so that
oct10 ̸⪰STS sep10, which shows that ST is not minimal.

   All the containments indicated in Figure 2 are strict, as can be shown through constructions
similar to that of Example 5, so no sequence ending with T is minimal in {T, S}* .
   Transitivity is achieved for any sequence ending with T, while no sequence ending with S is
transitive. Therefore, we summarize our finding in the following major result.

Theorem 3. Let X ∈ {T, S}* . Then XT is not minimal and XS is not transitive, thus no sequence
is both transitive and minimal.

   Since transitivity and minimality cannot be both achieved at the same time, we enforce
transitivity (which cannot be waived) and look for the minimal sequences among those that are
transitive, which we call minimal-transitive sequences.
   We first observe that all complete sequences starting with S are incomparable with those
starting with T (also refer to Figure 2). Therefore, only three sequences are both complete
and transitive: ST, TST and STST, the first of which contains the last one and is therefore not
minimal. The remaining two sequences are transitive, incomparable, and, therefore, minimal in
the set of complete and transitive sequences.

Theorem 4. The only minimal-transitive sequences are TST and STST.

   The result of Theorem 3 is inherent and no finer granularity in the interleaving of T and S (e.g.,
by making S resolve one conflict at a time instead of all together) would remove this limitation.
Furthermore, this limitation is unavoidable and we can prove that no method whatsoever (not
just those based on the T and S operators) could solve it.


4. Obtaining the Best Results
Given a relation 𝑟 ⊆ 𝒯 , the “best” tuples in 𝑟 according to the preference relation ⪰ can be
selected by means of the Best operator 𝛽 [4], which returns the tuples 𝑡1 of 𝑟 such that there is
no other tuple 𝑡2 in 𝑟 that is strictly preferable to 𝑡1 , i.e., 𝛽≻ (𝑟) = {𝑡1 ∈ 𝑟 | ∄𝑡2 ∈ 𝑟, 𝑡2 ≻ 𝑡1 }.
   As shown in Section 3, TST and STST are incomparable, thus there will be relations 𝑟 and
input preference relations ⪰ for which the best results delivered by the two semantics will differ.
In order to quantify the difference between these results, we define, for any two sequences X
and Y, Diff𝛽 (X, Y, 𝑛) as the worst-case size of the difference in the results delivered by X with
respect to those due to Y, for any given cardinality of the input relation 𝑟. We have:

Theorem 5. Diff𝛽 (TST, STST, 𝑛) = Θ(𝑛); Diff𝛽 (STST, TST, 𝑛) = Θ(𝑛).

  From a practical point of view, Theorem 5 shows that there is no all-seasons minimal-transitive
semantics. Furthermore, there can be cases (used in the proof of the theorem) in which the
�                                                                        25
             2000




                                                    time for Best (s)
                                                                        20
             1500                                                                                        TST
                                             TST
                                                                        15
    |Best|


                                             STST                                                        STST
             1000
                                                                        10                               T
                                             T
              500                            ε                           5                               ε

                0                                                        0
                    average        median                                    average        median

                     (a) Cardinality of 𝛽.                                   (b) Time for computing 𝛽.
Figure 3: Computing 𝛽 with default parameter values.


number of best results from any of the two semantics is comparable to 𝑛, whereas the other
semantics returns 𝒪(1) tuples.
   For space reasons, we only give hints at our experimental results and refer the reader to [2]
for details. As for the rewriting of the input formula, in all cases we incur a low overhead,
negligible with respect to the time required for computation of 𝛽.
   In our experiments, as recognized in the germane literature [5], we only consider relevant
tuples, i.e., those that satisfy either side of at least one clause in the rewritten formula, since the
other tuples correspond to those objects that the formula does not talk about.
   The algorithm we adopt for computing 𝛽 is an improved version of BNL, exploiting a heuristics
tailored to our scenario, which pre-sorts the input relation so that the tuples matching the left
side of a clause and corresponding to more specific values in the taxonomies come first. The
rationale is that these values are likely associated with a smaller amount of tuples, so that a
smaller 𝛽 partial result can be found before scanning large amounts of data. Furthermore, such
tuples are likely to be preferred to many others, in particular when specificity is a concern.
   Besides testing real taxonomies, we experimented on a variety of synthetic datasets, generated
according to various taxonomy topologies (regular, random, and scale-free), dataset sizes (up to
1M tuples), attributes (up to 5), and input preferences (up to 10). Figure 3a shows our results
using default parameter values (regular taxonomy on 1 attribute, two conflicting preference
statements, 10K tuples). The amount of relevant tuples is roughly 40% of the size of the dataset.
Both T (i.e., only enforcing transitivity) and 𝜀 (i.e., no rewriting of the preference formula)
retain about half of the relevant tuples (which is both the average and the median value we
obtained), while TST and STST retain less than 2% in the median case (the average value goes
up to 20% due to runs with unfocused input formulas referring to values not in the dataset).
This is reflected in the computation times, shown in Figure 3b, which are consistently around
24𝑠 for T and 10𝑠 for 𝜀, but nearly two orders of magnitude smaller in the median case for TST
and STST.
   This confirms that our approach is effective both in reducing the cardinality of 𝛽 and in
achieving substantial speedup with respect to baseline strategies. Similar results are obtained
in all other scenarios we tested.
   We observe that the application of T alone corresponds to the work performed by preference
evaluation methods that only aim at guaranteeing transitivity, e.g., [3], which are therefore
outperformed by our approach. The inability of T to deal with conflicting preferences, thus
�generating many indifferent tuples, which in turn induce (very) large result sets, indeed applies
to all our scenarios. Similar observations apply to 𝜀 (i.e., the empty sequence, corresponding
to the input formula), which represents the action of works on preference evaluation using
no rewriting whatsoever, such as [1]. Additionally, the results obtained via 𝜀 would be totally
unreliable, due to lack of transitivity.


5. Conclusions
In this paper we have tackled the problem of finding the best elements from a repository on
the basis of preferences referring to values that are more generic than the underlying data
and may involve conflicts. To this aim, we have introduced and formally investigated two
operators for enforcing, in a given collection of preferences, the properties of specificity, which
can solve conflicts, and transitivity, which guarantees the soundness of the final result. We
have then characterized the limitations that can arise from their combination and identified
the best ways in which they can be used together. We have finally shown, with experiments
over both synthetic and real-world datasets, the effectiveness and practical feasibility of the
overall approach. We remark that the need to address conflicts arising from preferences was also
observed in [6, 7]. The framework proposed there allows for a restricted form of taxonomies [8]
(with all values organized into distinct, named levels) and hints at an ad hoc procedure with
very limited support for conflict resolution; the focus of [6, 7] is, however, on the downward
propagation of preferences.


References
[1] T. Lukasiewicz, M. V. Martinez, G. I. Simari, Preference-based query answering in datalog+/-
    ontologies, in: IJCAI 2013, pp. 1017–1023. URL: http://www.aaai.org/ocs/index.php/IJCAI/
    IJCAI13/paper/view/6505.
[2] P. Ciaccia, D. Martinenghi, R. Torlone,            Preference queries over taxonomic do-
    mains, Proc. VLDB Endow. 14 (2021) 1859–1871. URL: http://www.vldb.org/pvldb/vol14/
    p1859-martinenghi.pdf.
[3] J. Chomicki, Preference formulas in relational queries, ACM Trans. Database Syst. 28 (2003)
    427–466. URL: https://doi.org/10.1145/958942.958946. doi:10.1145/958942.958946.
[4] P. Ciaccia, D. Martinenghi, R. Torlone, Foundations of context-aware preference propagation,
    J. ACM 67 (2020) 4:1–4:43. URL: https://doi.org/10.1145/3375713. doi:10.1145/3375713.
[5] P. Georgiadis et al., Efficient rewriting algorithms for preference queries, in: ICDE,2008, pp.
    1101–1110. URL: https://doi.org/10.1109/ICDE.2008.4497519.
[6] P. Ciaccia, D. Martinenghi, R. Torlone, Finding preferred objects with taxonomies, in: ER,
    2019, pp. 397–411. URL: https://doi.org/10.1007/978-3-030-33223-5_33.
[7] P. Ciaccia, D. Martinenghi, R. Torlone, The POOR-MAD approach: Preferred objects over
    rich, multi-attribute data, in: SEBD, 2021, pp. 283–290. URL: http://ceur-ws.org/Vol-2994/
    paper30.pdf.
[8] D. Martinenghi, R. Torlone, Taxonomy-based relaxation of query answering in relational
    databases, VLDB J. 23 (2014) 747–769. URL: https://doi.org/10.1007/s00778-013-0350-x.
�