Difference between revisions of "Vol-3194/paper4"

From BITPlan ceur-ws Wiki
 
+
=Paper=
 
{{Paper
|id=Vol-3194/paper4
|storemode=property
|title=Describing Multidimensional Data Through Highlights
|pdfUrl=https://ceur-ws.org/Vol-3194/paper4.pdf
|volume=Vol-3194
|authors=Matteo Francia,Enrico Gallinucci,Matteo Golfarelli,Patrick Marcel,Verónika Peralta,Stefano Rizzi
|dblpUrl=https://dblp.org/rec/conf/sebd/FranciaGGMPR22
|wikidataid=Q117344947
}}
 +
==Describing Multidimensional Data Through Highlights==
 +
<pdf width="1500px">https://ceur-ws.org/Vol-3194/paper4.pdf</pdf>
 +
<pre>
 +
Describing Multidimensional Data
 +
Through Highlights
 +
(Discussion Paper)
 +
 +
Matteo Francia1 , Enrico Gallinucci1 , Matteo Golfarelli1 , Patrick Marcel2 ,
 +
Verónika Peralta2 and Stefano Rizzi1
 +
1
 +
    DISI, University of Bologna, Italy
 +
2
 +
    LIFAT, University of Tours, France
 +
 +
 +
                                        Abstract
 +
                                        The Intentional Analytics Model (IAM) is a new paradigm to couple OLAP and analytics. It relies on
 +
                                        two ideas: (i) letting the user explore data by expressing his/her analysis intentions rather than the data
 +
                                        (s)he needs, and (ii) returning enhanced cubes, i.e., multidimensional data annotated with knowledge
 +
                                        insights in the form of model components (e.g., clusters). In this paper we propose a proof-of-concept
 +
                                        for the IAM vision by delivering an end-to-end implementation of describe, one of the five intention
 +
                                        operators introduced by IAM.
 +
 +
                                        Keywords
 +
                                        OLAP, OLAM, Analytics, Multidimensional data, Data exploration
 +
 +
 +
 +
 +
1. Introduction
 +
Data warehousing and OLAP (On-Line Analytical Processing) have been progressively gaining
 +
a leading role in enabling business analyses over enterprise data since the early 90’s. Recently,
 +
it has become more and more evident that the OLAP paradigm, alone, is no longer sufficient
 +
since the enormous success of machine learning techniques has consistently shifted the interest
 +
of corporate users towards sophisticated analytical applications.
 +
    The Intentional Analytics Model (IAM) has been envisioned as a way to tightly couple OLAP
 +
and analytics [1]. IAM relies on two major cornerstones: (i) the users explore the data space by
 +
expressing their analysis intentions rather than by explicitly stating what data they need, and
 +
(ii) in return they receive both multidimensional data and knowledge insights in the form of
 +
annotations of interesting subsets of data. As to (i), five intention operators have been proposed,
 +
namely, describe [2], assess [3], explain, predict, and suggest. As to (ii), first-class citizens of
 +
the IAM are enhanced cubes, defined as multidimensional cubes coupled with highlights, i.e.,
 +
sets of cube cells associated with interesting components of models automatically extracted
 +
from cubes [1]. An overview of the process is given in Figure 1.
 +
    The goal of this paper is to provide a proof-of-concept for the IAM vision by delivering an
 +
end-to-end implementation of the describe operator, which aims at describing one or more cube
 +
measures, possibly focused on one or more level members.
 +
 +
SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
 +
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

[Figure 1 content: the five intention operators (describe, assess, explain, predict, suggest)
are applied to the data and return an enhanced cube, i.e., the cube cells together with a
highlight and the model components. The example cube, also drawn as a chart by type, is:

    type              quantity
    Bagels                  48
    Beer                   116
    Bologna                192
    Canned Fruit           138
    Deli Meats             211
    Fresh Chicken           64
    Fresh Fruit            798
    Frozen Chicken         237
    Hamburger              141
    Hot Dogs               154
    Muffins                205
    Slices Bread           266
    Wine                   448
]

 +
Figure 1: The IAM approach: the user expresses an intention and receives in return an enhanced cube
 +
 +
 +
Example 1. Let a SALES cube be given, and let the user’s intention be: with SALES describe
 +
quantity for month = ’1997-04’ by type using outliers. Firstly, the subset of cells for April 1997
 +
is selected from the SALES cube, aggregated by product type, and projected on measure quantity
 +
(in OLAP terms, a slice-and-dice and a roll-up operator are applied). Then, the outliers are found
 +
in these cells based on the values of quantity. Finally, a measure of interestingness is computed for
 +
the two components obtained (the outlier cells, and the non-outlier ones), and the cells belonging to
 +
the component with maximum interestingness (outlier cells) are highlighted in the results shown
 +
to the user (see Figure 1).                                                                        □
 +
 +
  After introducing a formalism to manipulate cubes and queries in Section 2, in Section 3
 +
we introduce models, components, and enhanced cubes. Then, in Section 4 we show how an
 +
intention is transformed into an execution plan, and in Section 5 we explain how enhanced cubes
 +
are visualized. Finally, in Section 6 we discuss the related literature and draw the conclusion.
 +
 +
 +
2. Formalities
 +
In this section we introduce the formal notations we will use in the paper to manipulate cubes.
 +
We start by defining cube schemata.
 +
 +
Definition 1 (Hierarchy and Cube Schema). A hierarchy is a couple ℎ = (𝐿ℎ , ⪰ℎ ) where:
 +
(i) (𝐿ℎ , ⪰ℎ ) is a roll-up total order of categorical levels; (ii) each level 𝑙 ∈ 𝐿ℎ is coupled with a do-
 +
main 𝐷𝑜𝑚(𝑙) including a set of members. The top level of ⪰ℎ is called dimension. A cube schema
 +
is a couple 𝒞 = (𝐻, 𝑀 ) where 𝐻 is a set of hierarchies and 𝑀 is a set of numerical measures,
 +
with each measure 𝑚 ∈ 𝑀 coupled with one aggregation operator 𝑜𝑝(𝑚) ∈ {sum, avg, . . .}.
 +
 +
Example 2. For our working example it is SALES = (𝐻, 𝑀 ), where 𝐻 = {ℎDate , ℎCustomer ,
 +
ℎProduct , ℎStore }, 𝑀 = {quantity, storeSales, storeCost}, date ⪰ month ⪰ year, customer ⪰
 +
gender, product ⪰ type ⪰ category, store ⪰ city ⪰ country, 𝑜𝑝(quantity) = 𝑜𝑝(storeSales) =
 +
𝑜𝑝(storeCost) = sum.                                                                      □
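
A minimal sketch of how Definition 1 and Example 2 could be encoded, assuming each hierarchy is stored as a tuple of levels ordered from the finest to the coarsest. The class and attribute names (Hierarchy, CubeSchema, rolls_up_to) are illustrative choices and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hierarchy:
    """A roll-up total order of levels, listed from finest to coarsest."""
    name: str
    levels: tuple  # e.g. ("date", "month", "year")

    @property
    def dimension(self):
        # Definition 1 calls the top level of the order the dimension;
        # with date >= month >= year that is the finest level, stored first.
        return self.levels[0]

    def rolls_up_to(self, fine, coarse):
        """True if `fine` precedes or equals `coarse` in the roll-up order."""
        return self.levels.index(fine) <= self.levels.index(coarse)


@dataclass(frozen=True)
class CubeSchema:
    hierarchies: tuple  # tuple of Hierarchy
    measures: dict      # measure name -> aggregation operator name


# The SALES schema of Example 2.
SALES = CubeSchema(
    hierarchies=(
        Hierarchy("Date", ("date", "month", "year")),
        Hierarchy("Customer", ("customer", "gender")),
        Hierarchy("Product", ("product", "type", "category")),
        Hierarchy("Store", ("store", "city", "country")),
    ),
    measures={"quantity": "sum", "storeSales": "sum", "storeCost": "sum"},
)
```

Member domains (𝐷𝑜𝑚(𝑙)) are left out of this sketch; they would typically be loaded from the dimension tables.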
 +
 +
  Aggregation is the basic mechanism to query cubes, and it is captured by the following
 +
definition of group-by set.

Definition 2 (Group-by Set and Coordinate). Given cube schema 𝒞 = (𝐻, 𝑀 ), a group-by
 +
set 𝐺 of 𝒞 is a set of levels, at most one from each hierarchy of 𝐻. A coordinate of a group-by set
 +
𝐺 is a tuple of members, one for each level of 𝐺.
 +
 +
Example 3. Two group-by sets of SALES are 𝐺1 = {date, type, country} and 𝐺2 = {month,
 +
category}. Examples of coordinates of these group-by sets are, respectively, 𝛾1 = ⟨1997-04-15,
 +
Fresh Fruit, Italy⟩ and 𝛾2 = ⟨1997-04, Fruit⟩.                                            □
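
The constraint of Definition 2 (at most one level from each hierarchy) can be checked mechanically. The sketch below assumes the SALES hierarchies are given as a plain dictionary; the names SALES_HIERARCHIES and is_group_by_set are hypothetical.

```python
# Hypothetical encoding of the SALES hierarchies from Example 2:
# hierarchy name -> levels from finest to coarsest.
SALES_HIERARCHIES = {
    "Date": ("date", "month", "year"),
    "Customer": ("customer", "gender"),
    "Product": ("product", "type", "category"),
    "Store": ("store", "city", "country"),
}


def is_group_by_set(levels, hierarchies):
    """Check Definition 2: known levels only, at most one per hierarchy."""
    used = set()
    for level in levels:
        owners = [h for h, ls in hierarchies.items() if level in ls]
        if len(owners) != 1:
            return False  # level not found in the schema
        if owners[0] in used:
            return False  # two levels taken from the same hierarchy
        used.add(owners[0])
    return True
```

Both group-by sets of Example 3 pass this check, whereas {month, year} is rejected because both levels belong to the Date hierarchy. The empty set is accepted, which corresponds to complete aggregation.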
 +
 +
  The instances of a cube schema are called cubes and are defined as follows:
 +
 +
Definition 3 (Cube). A cube over 𝒞 is a tuple 𝐶 = (𝐺𝐶 , 𝑀𝐶 , 𝜔𝐶 ) where: (i) 𝐺𝐶 is a group-
 +
by set of 𝒞; (ii) 𝑀𝐶 ⊆ 𝑀 ; (iii) 𝜔𝐶 is a partial function that maps some coordinates of 𝐺𝐶 to a
 +
numerical value for each measure 𝑚 ∈ 𝑀𝐶 .
 +
 +
Each coordinate 𝛾 that participates in 𝜔𝐶 , with its associated tuple 𝑡 of measure values, is called a
 +
cell of 𝐶 and denoted ⟨𝛾, 𝑡⟩. A cube whose group-by set 𝐺𝐶 includes all and only the dimensions
 +
of the hierarchies in 𝐻 and such that 𝑀𝐶 = 𝑀 , is called a base cube, the others are called
 +
derived cubes. In OLAP terms, a derived cube is the result of either a roll-up, a slice-and-dice, or
 +
a projection made over a base cube; this is formalized as follows.
 +
 +
Definition 4 (Cube Query). A query over cube schema 𝒞 is a triple 𝑞 = (𝐺𝑞 , 𝑃𝑞 , 𝑀𝑞 ) where:
 +
(i) 𝐺𝑞 is a group-by set of 𝐻; (ii) 𝑃𝑞 is a (possibly empty) set of selection predicates, each expressed
 +
over one level of 𝐻; (iii) 𝑀𝑞 ⊆ 𝑀 .
 +
 +
Example 4. The cube query over SALES used in Example 1 is 𝑞 = (𝐺𝑞 , 𝑃𝑞 , 𝑀𝑞 ) where 𝐺𝑞 =
 +
{type}, 𝑃𝑞 = {month = ’1997-04’}, and 𝑀𝑞 = {quantity}. A cell of the resulting cube
 +
𝑞(SALES0 ) (where SALES0 is the base cube) is ⟨Canned Fruit⟩ with associated value 138 for quan-
 +
tity.                                                                                        □
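
As an illustration of Definition 4, the following sketch evaluates a query over a base cube held in memory. It assumes denormalized rows (each row carries the members of all the levels it rolls up to), equality predicates only, and sum as the aggregation operator; the function name run_query and the sample rows are invented for the example and are not FoodMart data.

```python
from collections import defaultdict


def run_query(rows, group_by, predicates, measures):
    """Evaluate q = (G_q, P_q, M_q) over denormalized base-cube rows.

    rows       : list of dicts holding level members and measure values
    group_by   : list of level names (G_q)
    predicates : dict level -> required member (P_q, equality only)
    measures   : list of measure names (M_q), all aggregated with sum
    """
    result = defaultdict(lambda: [0] * len(measures))
    for row in rows:
        # Slice-and-dice: keep only rows satisfying every predicate.
        if any(row[level] != member for level, member in predicates.items()):
            continue
        # Roll-up: the coordinate is the tuple of members of the group-by levels.
        coordinate = tuple(row[level] for level in group_by)
        # Projection: only the requested measures are aggregated.
        for i, m in enumerate(measures):
            result[coordinate][i] += row[m]
    return {coord: tuple(vals) for coord, vals in result.items()}


# Toy rows, invented for illustration.
rows = [
    {"month": "1997-04", "type": "Beer", "quantity": 70, "storeCost": 30.0},
    {"month": "1997-04", "type": "Beer", "quantity": 46, "storeCost": 21.5},
    {"month": "1997-04", "type": "Wine", "quantity": 12, "storeCost": 40.0},
    {"month": "1997-05", "type": "Wine", "quantity": 99, "storeCost": 80.0},
]

cube = run_query(rows, ["type"], {"month": "1997-04"}, ["quantity"])
```

The result maps each coordinate to its tuple of measure values, mirroring the cells ⟨𝛾, 𝑡⟩ of Definition 3. In the prototype described later in the paper this step is delegated to a DBMS rather than done in memory.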
 +
 +
 +
3. Enhancing cubes with models
 +
Models are concise, information-rich knowledge artifacts [4] that represent relationships hiding
 +
in the cube cells. The possible models range from simple functions and measure correlations
 +
to more elaborate techniques such as decision trees, clusterings, etc. A model is bound to
 +
(i.e., is computed over the levels/measures of) one cube, and is made of a set of components
 +
(e.g., a clustering model is made of a set of clusters). In the IAM, a relevant role is taken by
 +
data-to-model mappings. Indeed, a model partitions the cube on which it is computed into two
 +
or more subsets of cells, one for each component (e.g., the subsets of cells belonging to each
 +
cluster).
 +
 +
Definition 5 (Model and Component). A model is a tuple ℳ = (𝑡, 𝑎𝑙𝑔, 𝐶, 𝐼𝑛, 𝑂𝑢𝑡, 𝜇) where:
 +
 +
  (i) 𝑡 is the model type;

 (ii) 𝑎𝑙𝑔 is the algorithm used to compute 𝑂𝑢𝑡;

(iii) 𝐶 is the cube to which ℳ is bound;

 (iv) 𝐼𝑛 is the tuple of levels/measures of 𝒞 and parameter values supplied to 𝑎𝑙𝑔 to compute ℳ;

  (v) 𝑂𝑢𝑡 is the set of components that make up ℳ;

 (vi) 𝜇 is a function mapping each coordinate of 𝐶 to one component of 𝑂𝑢𝑡.
 +
 +
Each model component is a tuple of a component identifier plus a variable number of properties
 +
that describe that component.
 +
 +
  In the scope of this work, it is 𝑡 ∈ {top-k, bottom-k, skyline, outliers, clustering}. For in-
 +
stance, for 𝑡 = clustering, each component is a cluster and is described by its centroid.
 +
 +
Example 5. A possible model over the derived cube 𝑞(SALES0 ) in Example 4 is characterized
 +
by 𝑡 = clustering, 𝑎𝑙𝑔 = K-Means, 𝐶 = 𝑞(SALES0 ), 𝐼𝑛 = ⟨quantity, 𝑛 = 3, 𝑟𝑛𝑑𝑆𝑒𝑒𝑑 = 0⟩,
 +
𝑂𝑢𝑡 = {𝑐1, 𝑐2, 𝑐3}, 𝜇(⟨Bagels⟩) = 𝑐1, 𝜇(⟨Beer⟩) = 𝑐1, 𝜇(⟨Bologna⟩) = 𝑐2, . . ., where 𝑛 is
 +
the desired number of clusters and 𝑟𝑛𝑑𝑆𝑒𝑒𝑑 is the seed to be used by the k-means algorithm to
 +
randomly generate the 3 seed clusters. Component 𝑐1 is characterized by property 𝑐𝑒𝑛𝑡𝑟𝑜𝑖𝑑 with
 +
value 76.                                                                                  □
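
The model tuple of Definition 5 can be sketched as a small record, here filled by a one-dimensional k-means. This is an illustration under stated assumptions, not the authors' implementation: the initial centroids are passed explicitly instead of being drawn from a random seed, only one measure is clustered, and the names Model and cluster_model are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """The tuple (t, alg, C, In, Out, mu) of Definition 5."""
    t: str        # model type
    alg: str      # algorithm name
    cube: dict    # C: coordinate -> measure value
    inputs: dict  # In: measure name and parameter values
    out: dict     # Out: component id -> properties
    mu: dict      # mu: coordinate -> component id


def cluster_model(cube, measure, init_centroids, max_iter=100):
    """Build a clustering model with 1-D Lloyd iterations (k-means).

    Initial centroids are given explicitly so the run is deterministic;
    this replaces the random seeding mentioned in Example 5.
    """
    centroids = list(init_centroids)
    assign = {}
    for _ in range(max_iter):
        # Assignment step: nearest centroid for each cell.
        new_assign = {
            coord: min(range(len(centroids)),
                       key=lambda j: abs(value - centroids[j]))
            for coord, value in cube.items()
        }
        if new_assign == assign:
            break  # converged: no cell changed cluster
        assign = new_assign
        # Update step: mean of each non-empty cluster.
        for j in range(len(centroids)):
            members = [cube[c] for c, a in assign.items() if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    out = {f"c{j + 1}": {"centroid": centroids[j]}
           for j in range(len(centroids))}
    mu = {coord: f"c{a + 1}" for coord, a in assign.items()}
    return Model("clustering", "K-Means", cube,
                 {"measure": measure, "n": len(centroids)}, out, mu)
```

Each component carries its centroid as a property, and mu records the data-to-model mapping from coordinates to components. A library implementation such as the one in Scikit-Learn would normally replace the hand-written loop.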
 +
 +
  As the last step in the IAM approach, cube 𝐶 is enhanced by associating it with a set of
 +
models bound to 𝐶 and with a highlight, i.e., with the subset of cells corresponding to the most
 +
interesting component of the model; these cells are determined via function 𝜇.
 +
 +
Definition 6. An enhanced cube 𝐸 is a triple of a cube 𝐶, a set of models {ℳ1 , . . . , ℳ𝑟 } bound
to 𝐶, and a highlight $c_{high} = \arg\max_{c \in \bigcup_{i=1}^{r} Out_i} \mathit{interest}(c)$.
 +
 +
  How to estimate the interestingness of component 𝑐, 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡(𝑐), is explained in detail in [2].
 +
Here we just mention that we consider three facets of interestingness identified in [5], namely,
 +
novelty, peculiarity, and surprise.
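
The selection of the highlight in Definition 6 amounts to an argmax over the components of all models. In the sketch below the interestingness function is a caller-supplied stand-in, since its actual definition is given in [2]; the dictionary layout of the models and the function names are assumptions made for the example.

```python
def find_highlight(models, interest):
    """Return (model index, component id) maximizing `interest`.

    models   : list of dicts, each with an "out" mapping
               component id -> properties
    interest : callable taking (model index, component id, properties)
               and returning a number; a placeholder for interest(c)
    """
    candidates = [
        (i, cid, props)
        for i, model in enumerate(models)
        for cid, props in model["out"].items()
    ]
    if not candidates:
        return None
    i, cid, _ = max(candidates, key=lambda c: interest(*c))
    return i, cid


def highlighted_cells(model, component_id):
    """Cells mapped to the component by the model's mu function."""
    return [coord for coord, cid in model["mu"].items()
            if cid == component_id]
```

Once the winning component is known, the cells to be highlighted are obtained by inverting 𝜇, as done in highlighted_cells.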
 +
 +
 +
4. Execution plans for describe intentions
 +
The describe operator provides an answer to the user asking “show me my business” by describ-
 +
ing one or more cube measures, possibly focused on one or more level members, at some given
 +
granularity [1]. The cube is enhanced by showing either the top/bottom-k cells, the skyline, the
 +
outliers, or clusters of cells. Let 𝐶0 be a base cube over cube schema 𝒞 = (𝐻, 𝑀 ); the syntax
 +
for describe is
 +
 +
                    with 𝐶0 describe 𝑚1 , . . . , 𝑚𝑧 [ for 𝑃 ] [ by 𝑙1 , . . . , 𝑙𝑛 ]
 +
                        [ using 𝑡1 [ size 𝑘1 ], . . . , 𝑡𝑟 [ size 𝑘𝑟 ]]
 +
 +
(optional parts are in brackets) where 𝑚1 , . . . , 𝑚𝑧 ∈ 𝑀 are measures of 𝒞, 𝑃 is a set of selection
predicates each over one level of 𝐻, {𝑙1 , . . . , 𝑙𝑛 } denotes a group-by set of 𝐻, 𝑡1 , . . . , 𝑡𝑟 are
model types, and the 𝑘𝑖 's are the desired sizes to be applied to the models returned, as explained
in step 2 below.
 +
  The plan corresponding to a fully-specified intention, i.e., one where all optional clauses have
 +
been specified, is:
 +
1. Execute query 𝑞 = (𝐺𝑞 , 𝑃𝑞 , 𝑀𝑞 ), where 𝐺𝑞 = {𝑙1 , . . . , 𝑙𝑛 }, 𝑃𝑞 = 𝑃 , and 𝑀𝑞 = {𝑚1 , . . . , 𝑚𝑧 }.
 +
  Let 𝐶 = 𝑞(𝐶0 ).
 +
 +
2. For 1 ≤ 𝑖 ≤ 𝑟, compute model ℳ𝑖 = (𝑡𝑖 , 𝑎𝑙𝑔𝑖 , 𝐶, 𝐼𝑛𝑖 , 𝑂𝑢𝑡𝑖 , 𝜇𝑖 ) and for each 𝑐 ∈ 𝑂𝑢𝑡𝑖 ,
 +
  compute 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡(𝑐). Size 𝑘𝑖 is used for clustering to determine the number of clusters to
 +
  be computed, for top-k and bottom-k to determine the number of cells to be returned, for
 +
  outliers to determine the number of outliers; it is neglected for the skyline.
 +
 +
3. Find the highlight $c_{high} = \arg\max_{c \in \bigcup_{i=1}^{r} Out_i} \mathit{interest}(c)$.


4. Return the enhanced cube 𝐸 consisting of 𝐶, {ℳ1 , . . . , ℳ𝑟 }, and 𝑐ℎ𝑖𝑔ℎ .
 +
 +
  Partially-specified intentions are interpreted as follows:
 +
    • If the for clause has not been specified, we consider 𝑃𝑞 = TRUE.
 +
    • If the by clause has not been specified, we consider 𝐺𝑞 = ∅.
 +
    • If the using 𝑡1 , . . . , 𝑡𝑟 clause has not been specified, all model types listed in Section 3 are
 +
      computed over 𝐶 (the skyline is computed only if 𝑧 > 1, i.e., at least two measures have
 +
      been specified).
 +
    • If the size clause has not been specified for one or more models, the value of 𝑘𝑖 is
 +
      determined automatically through the Elbow method.
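
The completion rules listed above can be sketched as a single function. Representing a missing size as None, to be resolved later, is an assumption of this example: the Elbow method itself is not implemented here, and the name complete_intention is hypothetical.

```python
ALL_MODEL_TYPES = ("top-k", "bottom-k", "skyline", "outliers", "clustering")


def complete_intention(measures, predicates=None, group_by=None, models=None):
    """Fill in the optional clauses of a describe intention.

    models: optional list of (model type, size or None) pairs.
    A size of None stands for "choose k automatically" (the paper uses
    the Elbow method, which is not implemented in this sketch).
    """
    if models is None:
        # No using clause: every model type, skyline only with 2+ measures.
        models = [(t, None) for t in ALL_MODEL_TYPES
                  if t != "skyline" or len(measures) > 1]
    return {
        "measures": list(measures),
        "predicates": list(predicates or []),  # empty list stands for TRUE
        "group_by": list(group_by or []),      # empty list = empty group-by set
        "models": list(models),
    }
```

With a single measure and no using clause, the skyline is left out, in line with the third rule.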
 +
 +
Example 6. Consider the following session on the SALES cube:
 +
 +
              with SALES describe quantity for month = ’1997-04’ by type
 +
              with SALES describe quantity by category using clustering size 3
 +
 +
The models computed for the first intention are top-k, bottom-k, clustering, and outliers (computing
 +
the skyline for a single measure makes no sense). For the second intention, a clustering producing
 +
3 clusters is computed.                                                                          □
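
To make the syntax of describe concrete, here is a rough regular-expression parser for intentions such as those in Example 6. It is only a sketch: it assumes that clause keywords do not occur inside predicates, that predicates contain no commas, and that straight quotes are used; the function name parse_describe and the shape of the returned dictionary are illustrative.

```python
import re

# with C describe m1, ..., mz [for P] [by l1, ..., ln] [using t1 [size k1], ...]
INTENTION = re.compile(
    r"^with\s+(?P<cube>\w+)\s+describe\s+(?P<measures>.+?)"
    r"(?:\s+for\s+(?P<for>.+?))?"
    r"(?:\s+by\s+(?P<by>.+?))?"
    r"(?:\s+using\s+(?P<using>.+?))?\s*$"
)


def parse_describe(text):
    """Split a describe intention into its clauses."""
    match = INTENTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a describe intention: {text!r}")

    def split(clause):
        # Comma-separated items; a missing clause yields an empty list.
        return [part.strip() for part in clause.split(",")] if clause else []

    models = []
    for item in split(match["using"]):
        m = re.fullmatch(r"(?P<type>[\w-]+)(?:\s+size\s+(?P<size>\d+))?", item)
        if m is None:
            raise ValueError(f"bad model clause: {item!r}")
        models.append((m["type"], int(m["size"]) if m["size"] else None))

    return {
        "cube": match["cube"],
        "measures": split(match["measures"]),
        "predicates": split(match["for"]),
        "group_by": split(match["by"]),
        "models": models,
    }
```

An empty models list means that the using clause was omitted, so the default model types apply.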
 +
 +
 +
5. Visualizing enhanced cubes
 +
To provide an effective description of an enhanced cube we couple text-based and graphical
 +
representations with an ad-hoc interaction paradigm. Specifically, the visualization includes
 +
three distinct but inter-related areas: a table area that shows the cube cells using a pivot table; a
 +
chart area that complements the table area by representing the cube cells through one or more
 +
charts; a component area that shows a list of model components sorted by their interestingness.
 +
The guidelines adopted to select the charts are detailed in [2]. The interaction paradigm we
adopt is component-driven. Specifically, clicking on one component 𝑐 in the component area
emphasizes the corresponding cube cells (i.e., those that map to 𝑐 via function 𝜇) both
in the table area and in the chart area. The highlight is the top component in the list and is
selected by default. Following the details-on-demand paradigm [6], interaction is enhanced
using a tooltip that, when the mouse is positioned on a data point, shows its coordinate, its
measure value(s), and the component(s) it belongs to.

Figure 2: The visualization obtained for the intention in Example 7
 +
 +
Example 7. Figure 2 shows the visualization obtained when the following intention is formulated:
 +
with SALES describe storeCost by month, category. On the top-left, the table area; on the right,
 +
the chart area; on the bottom-left, the component area. Here a heat map and a bubble chart have
 +
been selected. The top-interestingness component is a cluster, so a color has been assigned to each
 +
component of clustering (i.e., to each cluster) and is uniformly used in all three areas. The highlight
 +
(in green) is currently selected and is emphasized using a thicker border in all areas. A tooltip with
 +
all the details about a single cell is also shown (in yellow).                                      □
 +
 +
 +
6. Related work and conclusion
 +
The idea of coupling data and analytical models was born in the 90’s with inductive databases,
 +
where data were coupled with patterns meant as generalizations of the data. Later on, data-to-
 +
model unification was addressed in MauveDB [7], which provides a language for specifying
 +
model-based views of data using common statistical models.

  The coupling of the OLAP paradigm and data mining, to create an approach where concise
patterns are extracted from multidimensional data for the user's evaluation, was the goal of some
approaches commonly labeled as OLAM [8]. In this context, k-means clustering is used by [9] to
 +
dynamically create semantically-rich aggregates of facts other than those statically provided by
 +
dimension hierarchies. Similarly, the shrink operator is proposed by [10] to compute small-size
 +
approximations of a cube via agglomerative clustering. Other operators that enrich data with
 +
knowledge extraction results are DIFF [11], which returns a set of tuples that most successfully
 +
describe the difference of values between two cells of a cube, and RELAX [12], which verifies
 +
whether a pattern observed at a certain level of detail is also present at a coarser level of detail.
Finally, [13] reuse the OLAP paradigm to explore prediction cubes, i.e., cubes where each
 +
cell summarizes a predictive model trained on the data corresponding to that cell. The IAM
 +
approach can be regarded as OLAM since, like the approaches mentioned above, it relies on
 +
mining techniques to enhance the cube resulting from an OLAP query. However, while each of
 +
the approaches above uses one single technique (e.g., clustering) to this end, the IAM leans on
 +
multiple mining techniques to give users a wider variety of insights, using the interestingness
 +
measure to select the most relevant ones.
 +
  To the best of our knowledge, though some tools (e.g., Spotfire and Tableau) integrate OLAP
 +
and analytics capabilities in the same environment, none of them allows users to formulate
 +
queries at a higher level of abstraction than OLAP (as done in the IAM using intentions), nor do they
 +
support the automated out-of-the-box enrichment of cubes with insights obtained by analytics
 +
(as done in the IAM through enhanced cubes).
 +
  In this paper we have given a proof-of-concept for the IAM vision by delivering an imple-
 +
mentation of the describe operator, relying on a visual metaphor to display enhanced cubes.
 +
Our implementation uses a simple multidimensional engine [14, 15] that relies on the Oracle
 +
11g DBMS to execute queries on a star schema; the mining models are imported from the
 +
Scikit-Learn Python library. The web-based visualization is implemented in JavaScript and uses
 +
the D3 library. The prototype can be accessed at http://semantic.csr.unibo.it/describe/.
 +
  In [2], we have shown that our approach diminishes the effort for formulating complex
analyses while ensuring that performance is compatible with the near-real-time requirements of
 +
interactive sessions. Specifically, using the ASCII character length as an approximation for the
 +
effort it takes to craft a query, we evaluated the saving in user’s effort when writing a describe
 +
intention over the one necessary to obtain the same result using plain SQL and Python. We
 +
considered a simple session including three intentions, where the by clause is progressively
 +
enlarged and all the models are computed. Remarkably, it turned out that the total formulation
 +
effort using SQL+Python is about two orders of magnitude larger than using describe intentions
 +
(on average, about 5400 vs. 55 chars). For the efficiency test we used the FoodMart data
 +
(github.com/julianhyde/foodmart-data-mysql) and the same session mentioned above. Table 1
 +
shows the total execution time and its breakdown into the times necessary to query the base
 +
cube, to compute the models, to measure the interestingness, and to generate the pivot table
 +
returned to the browser. Remarkably, it turns out that at most 18 seconds are necessary to
 +
retrieve and visualize an enhanced cube of more than 86000 cells, which is perfectly compatible
 +
with the execution time of a standard OLAP query.
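
The figures reported above lend themselves to a quick check. The sketch below copies the values of Table 1 and the average character counts from the text, then computes the share of each phase in the total time and the effort ratio; the variable and function names are illustrative.

```python
# Execution times in seconds, copied from Table 1.
TIMES = {
    "I1": {"cells": 323, "query": 0.10, "model": 0.25,
           "interest": 0.00, "pivot": 0.00, "total": 0.36},
    "I2": {"cells": 20525, "query": 0.22, "model": 5.90,
           "interest": 0.36, "pivot": 0.36, "total": 6.83},
    "I3": {"cells": 86832, "query": 0.22, "model": 8.50,
           "interest": 7.43, "pivot": 1.72, "total": 17.87},
}

PHASES = ("query", "model", "interest", "pivot")


def phase_shares(row):
    """Fraction of the reported total spent in each phase."""
    return {p: row[p] / row["total"] for p in PHASES}


def dominant_phase(row):
    """Phase with the largest execution time."""
    return max(PHASES, key=lambda p: row[p])


# Formulation effort reported in the text (average characters).
effort_ratio = 5400 / 55  # roughly 98, i.e. about two orders of magnitude
```

Model computation is the largest phase for all three intentions, while the time spent querying the base cube stays nearly constant as the cube grows.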
 +
  The main directions for future research we wish to pursue are: (i) evaluate the usability of
the approach by conducting tests with real users, and (ii) extend the approach to operate with
dashboards of enhanced cubes.

Table 1
Execution times in seconds for three intentions with increasing cardinalities of 𝐶 (the tests were run
on an Intel Core(TM) i7-6700 CPU @ 3.40GHz with 8GB RAM)

    Intention      |𝐶|    Query    Model    Interestingness    Pivot    Total
    𝐼1             323     0.10     0.25               0.00     0.00     0.36
    𝐼2           20525     0.22     5.90               0.36     0.36     6.83
    𝐼3           86832     0.22     8.50               7.43     1.72    17.87
 +
 +
 +
References
 +
[1] P. Vassiliadis, P. Marcel, S. Rizzi, Beyond roll-up’s and drill-down’s: An intentional analytics
    model to reinvent OLAP, Inf. Syst. 85 (2019) 68–91.
[2] M. Francia, P. Marcel, V. Peralta, S. Rizzi, Enhancing cubes with models to describe
    multidimensional data, Inf. Syst. Frontiers 24 (2022) 31–48.
 +
[3] M. Francia, M. Golfarelli, P. Marcel, S. Rizzi, P. Vassiliadis, Assess queries for interactive
 +
    analysis of data cubes, in: Proc. of EDBT, 2021, pp. 121–132.
 +
[4] M. Terrovitis, P. Vassiliadis, S. Skiadopoulos, E. Bertino, B. Catania, A. Maddalena, S. Rizzi,
 +
    Modeling and language support for the management of pattern-bases, Data Knowl. Eng.
 +
    62 (2007) 368–397.
 +
[5] P. Marcel, V. Peralta, P. Vassiliadis, A framework for learning cell interestingness from
 +
    cube explorations, in: Proc. of ADBIS, 2019.
 +
[6] B. Shneiderman, The eyes have it: A task by data type taxonomy for information visual-
 +
    izations, in: Proc. of IEEE Symp. on Visual Languages, 1996, pp. 336–343.
 +
[7] A. Deshpande, S. Madden, MauveDB: supporting model-based user views in database
 +
    systems, in: Proc. of SIGMOD, 2006, pp. 73–84.
 +
[8] J. Han, OLAP mining: Integration of OLAP with data mining, in: Proc. of Working Conf.
 +
    on Database Semantics, 1997, pp. 3–20.
 +
[9] F. Bentayeb, C. Favre, RoK: Roll-up with the k-means clustering method for recommending
 +
    OLAP queries, in: Proc. of DEXA, 2009, pp. 501–515.
 +
[10] M. Golfarelli, S. Graziani, S. Rizzi, Shrink: An OLAP operation for balancing precision and
 +
    size of pivot tables, Data Knowl. Eng. 93 (2014) 19–41.
 +
[11] S. Sarawagi, Explaining differences in multidimensional aggregates, in: Proc. of VLDB,
 +
    1999, pp. 42–53.
 +
[12] G. Sathe, S. Sarawagi, Intelligent rollups in multidimensional OLAP data, in: Proc. of
 +
    VLDB, 2001, pp. 531–540.
 +
[13] B. Chen, L. Chen, Y. Lin, R. Ramakrishnan, Prediction cubes, in: Proc. of VLDB, 2005, pp.
 +
    982–993.
 +
[14] M. Francia, E. Gallinucci, M. Golfarelli, Towards conversational OLAP, in: Proc. of DOLAP,
 +
    2020, pp. 6–15.
 +
[15] M. Francia, E. Gallinucci, M. Golfarelli, COOL: A framework for conversational OLAP, Inf.
 +
    Syst. 104 (2022) 101752.
 +
 +
</pre>

Latest revision as of 17:58, 30 March 2023

Paper

Paper
edit
description  
id  Vol-3194/paper4
wikidataid  Q117344947→Q117344947
title  Describing Multidimensional Data Through Highlights
pdfUrl  https://ceur-ws.org/Vol-3194/paper4.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/FranciaGGMPR22
volume  Vol-3194→Vol-3194
session  →

Describing Multidimensional Data Through Highlights

load PDF

Describing Multidimensional Data
Through Highlights
(Discussion Paper)

Matteo Francia1 , Enrico Gallinucci1 , Matteo Golfarelli1 , Patrick Marcel2 ,
Verónika Peralta2 and Stefano Rizzi1
1
    DISI, University of Bologna, Italy
2
    LIFAT, University of Tours, France


                                         Abstract
                                         The Intentional Analytics Model (IAM) is a new paradigm to couple OLAP and analytics. It relies on
                                         two ideas: (i) letting the user explore data by expressing his/her analysis intentions rather than the data
                                         (s)he needs, and (ii) returning enhanced cubes, i.e., multidimensional data annotated with knowledge
                                         insights in the form of model components (e.g., clusters). In this paper we propose a proof-of-concept
                                         for the IAM vision by delivering an end-to-end implementation of describe, one of the five intention
                                         operators introduced by IAM.

                                         Keywords
                                         OLAP, OLAM, Analytics, Multidimensional data, Data exploration




1. Introduction
Data warehousing and OLAP (On-Line Analytical Processing) have been progressively gaining
a leading role in enabling business analyses over enterprise data since the early 90’s. Recently,
it has become more and more evident that the OLAP paradigm, alone, is no more sufficient
since the enormous success of machine learning techniques has consistently shifted the interest
of corporate users towards sophisticated analytical applications.
    The Intentional Analytics Model (IAM) has been envisioned as a way to tightly couple OLAP
and analytics [1]. IAM relies on two major cornerstones: (i) the users explore the data space by
expressing their analysis intentions rather than by explicitly stating what data they need, and
(ii) in return they receive both multidimensional data and knowledge insights in the form of
annotations of interesting subsets of data. As to (i), five intention operators have been proposed,
namely, describe [2], assess [3], explain, predict, and suggest. As to (ii), first-class citizens of
the IAM are enhanced cubes, defined as multidimensional cubes coupled with highlights, i.e.,
sets of cube cells associated with interesting components of models automatically extracted
from cubes [1]. An overview of the process is given in Figure 1.
    The goal of this paper is to provide a proof-of-concept for the IAM vision by delivering an
end-to-end implementation of the describe operator, which aims at describing one or more cube
measures, possibly focused on one or more level members.

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

[Figure 1 shows the five intention operators (describe, assess, explain, predict, suggest) applied
to data, and the resulting enhanced cube, made of data, a highlight, and model components. The
data shown in the figure are listed in the table below.]

                     type              quantity
                     Bagels                  48
                     Beer                   116
                     Bologna                192
                     Canned Fruit           138
                     Deli Meats             211
                     Fresh Chicken           64
                     Fresh Fruit            798
                     Frozen Chicken         237
                     Hamburger              141
                     Hot Dogs               154
                     Muffins                205
                     Slices Bread           266
                     Wine                   448

Figure 1: The IAM approach: the user expresses an intention and receives in return an enhanced cube


Example 1. Let a SALES cube be given, and let the user’s intention be: with SALES describe
quantity for month = ’1997-04’ by type using outliers. First, the subset of cells for April 1997
is selected from the SALES cube, aggregated by product type, and projected on measure quantity
(in OLAP terms, a slice-and-dice and a roll-up operator are applied). Then, the outliers are found
in these cells based on the values of quantity. Finally, a measure of interestingness is computed for
the two components obtained (the outlier cells and the non-outlier ones), and the cells belonging to
the component with maximum interestingness (outlier cells) are highlighted in the results shown
to the user (see Figure 1). □

   After introducing a formalism to manipulate cubes and queries in Section 2, in Section 3
we introduce models, components, and enhanced cubes. Then, in Section 4 we show how an
intention is transformed into an execution plan, and in Section 5 we explain how enhanced cubes
are visualized. Finally, in Section 6 we discuss the related literature and draw the conclusion.


2. Formalities
In this section we introduce the formal notations we will use in the paper to manipulate cubes.
We start by defining cube schemata.

Definition 1 (Hierarchy and Cube Schema). A hierarchy is a couple ℎ = (𝐿ℎ , ⪰ℎ ) where:
(i) (𝐿ℎ , ⪰ℎ ) is a roll-up total order of categorical levels; (ii) each level 𝑙 ∈ 𝐿ℎ is coupled with a
domain 𝐷𝑜𝑚(𝑙) including a set of members. The top level of ⪰ℎ is called dimension. A cube schema
is a couple 𝒞 = (𝐻, 𝑀 ) where 𝐻 is a set of hierarchies and 𝑀 is a set of numerical measures,
with each measure 𝑚 ∈ 𝑀 coupled with one aggregation operator 𝑜𝑝(𝑚) ∈ {sum, avg, . . .}.

Example 2. In our working example, SALES = (𝐻, 𝑀 ), where 𝐻 = {ℎDate , ℎCustomer ,
ℎProduct , ℎStore }, 𝑀 = {quantity, storeSales, storeCost}, date ⪰ month ⪰ year, customer ⪰
gender, product ⪰ type ⪰ category, store ⪰ city ⪰ country, and 𝑜𝑝(quantity) = 𝑜𝑝(storeSales) =
𝑜𝑝(storeCost) = sum. □

  Aggregation is the basic mechanism to query cubes, and it is captured by the following
definition of group-by set.

Definition 2 (Group-by Set and Coordinate). Given cube schema 𝒞 = (𝐻, 𝑀 ), a group-by
set 𝐺 of 𝒞 is a set of levels, at most one from each hierarchy of 𝐻. A coordinate of a group-by set
𝐺 is a tuple of members, one for each level of 𝐺.

Example 3. Two group-by sets of SALES are 𝐺1 = {date, type, country} and 𝐺2 = {month,
category}. Examples of coordinates of these group-by sets are, respectively, 𝛾1 = ⟨1997-04-15,
Fresh Fruit, Italy⟩ and 𝛾2 = ⟨1997-04, Fruit⟩. □
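
Definition 2 can be illustrated with a minimal Python sketch. The level names come from Example 2;
the function names and the dictionary layout are illustrative assumptions, not part of the formalism.
The coordinate check only verifies arity, since the member domains are not listed in the paper.

```python
# Hierarchies of the SALES schema from Example 2, each listed from the
# dimension (finest level) to the coarsest level.
HIERARCHIES = {
    "Date": ["date", "month", "year"],
    "Customer": ["customer", "gender"],
    "Product": ["product", "type", "category"],
    "Store": ["store", "city", "country"],
}


def is_group_by_set(levels, hierarchies=HIERARCHIES):
    """Return True if `levels` picks at most one level from each hierarchy."""
    used = set()
    for level in levels:
        owners = [h for h, ls in hierarchies.items() if level in ls]
        if len(owners) != 1 or owners[0] in used:
            return False  # unknown level, or hierarchy already used
        used.add(owners[0])
    return True


def is_coordinate(members, group_by):
    """A coordinate has exactly one member for each level of the group-by set."""
    return len(members) == len(group_by)


G1 = ["date", "type", "country"]
G2 = ["month", "category"]
```

Under these assumptions, both G1 and G2 are accepted, whereas a set such as {date, month} is
rejected because both levels belong to the Date hierarchy.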

  The instances of a cube schema are called cubes and are defined as follows:

Definition 3 (Cube). A cube over 𝒞 is a tuple 𝐶 = (𝐺𝐶 , 𝑀𝐶 , 𝜔𝐶 ) where: (i) 𝐺𝐶 is a group-
by set of 𝒞; (ii) 𝑀𝐶 ⊆ 𝑀 ; (iii) 𝜔𝐶 is a partial function that maps some coordinates of 𝐺𝐶 to a
numerical value for each measure 𝑚 ∈ 𝑀𝐶 .

Each coordinate 𝛾 that participates in 𝜔𝐶 , with its associated tuple 𝑡 of measure values, is called a
cell of 𝐶 and denoted ⟨𝛾, 𝑡⟩. A cube whose group-by set 𝐺𝐶 includes all and only the dimensions
of the hierarchies in 𝐻 and such that 𝑀𝐶 = 𝑀 is called a base cube; the others are called
derived cubes. In OLAP terms, a derived cube is the result of either a roll-up, a slice-and-dice, or
a projection made over a base cube; this is formalized as follows.

Definition 4 (Cube Query). A query over cube schema 𝒞 is a triple 𝑞 = (𝐺𝑞 , 𝑃𝑞 , 𝑀𝑞 ) where:
(i) 𝐺𝑞 is a group-by set of 𝐻; (ii) 𝑃𝑞 is a (possibly empty) set of selection predicates, each expressed
over one level of 𝐻; (iii) 𝑀𝑞 ⊆ 𝑀 .

Example 4. The cube query over SALES used in Example 1 is 𝑞 = (𝐺𝑞 , 𝑃𝑞 , 𝑀𝑞 ) where 𝐺𝑞 =
{type}, 𝑃𝑞 = {month = ’1997-04’}, and 𝑀𝑞 = {quantity}. A cell of the resulting cube
𝑞(SALES0 ) (where SALES0 is the base cube) is ⟨Canned Fruit⟩ with associated value 138 for
quantity. □
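
A cube query can be sketched in a few lines of Python, assuming that the base cube is available as a
list of denormalized rows and that every measure is aggregated with sum. The rows below are invented
for illustration (they are not FoodMart data), predicates are restricted to equalities, and the name
run_query is hypothetical.

```python
from collections import defaultdict

# Invented base-cube rows; each row carries level members and measure values.
ROWS = [
    {"month": "1997-04", "type": "Canned Fruit", "quantity": 100},
    {"month": "1997-04", "type": "Canned Fruit", "quantity": 38},
    {"month": "1997-04", "type": "Bagels", "quantity": 48},
    {"month": "1997-05", "type": "Canned Fruit", "quantity": 70},
]


def run_query(rows, group_by, predicates, measures):
    """Evaluate q = (G_q, P_q, M_q) and return a dict coordinate -> measure values.

    group_by:   list of level names (G_q)
    predicates: dict level -> required member (P_q, equality predicates only)
    measures:   list of measure names (M_q), all aggregated with sum
    """
    cube = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in rows:
        # Slice-and-dice: keep only the rows satisfying every predicate.
        if all(row[level] == member for level, member in predicates.items()):
            # Roll-up and projection: group on G_q, sum the measures in M_q.
            coordinate = tuple(row[level] for level in group_by)
            for m in measures:
                cube[coordinate][m] += row[m]
    return dict(cube)


result = run_query(ROWS, ["type"], {"month": "1997-04"}, ["quantity"])
```

With these rows, the cell ⟨Canned Fruit⟩ of the result has value 138 for quantity, as in Example 4.
An empty group-by set yields a single cell whose coordinate is the empty tuple.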


3. Enhancing cubes with models
Models are concise, information-rich knowledge artifacts [4] that represent relationships hiding
in the cube cells. The possible models range from simple functions and measure correlations
to more elaborate techniques such as decision trees, clusterings, etc. A model is bound to
(i.e., is computed over the levels/measures of) one cube, and is made of a set of components
(e.g., a clustering model is made of a set of clusters). In the IAM, a relevant role is taken by
data-to-model mappings. Indeed, a model partitions the cube on which it is computed into two
or more subsets of cells, one for each component (e.g., the subsets of cells belonging to each
cluster).

Definition 5 (Model and Component). A model is a tuple ℳ = (𝑡, 𝑎𝑙𝑔, 𝐶, 𝐼𝑛, 𝑂𝑢𝑡, 𝜇) where:

  (i) 𝑡 is the model type;

 (ii) 𝑎𝑙𝑔 is the algorithm used to compute 𝑂𝑢𝑡;

(iii) 𝐶 is the cube to which ℳ is bound;

 (iv) 𝐼𝑛 is the tuple of levels/measures of 𝒞 and parameter values supplied to 𝑎𝑙𝑔 to compute ℳ;

  (v) 𝑂𝑢𝑡 is the set of components that make up ℳ;

 (vi) 𝜇 is a function mapping each coordinate of 𝐶 to one component of 𝑂𝑢𝑡.

Each model component is a tuple of a component identifier plus a variable number of properties
that describe that component.

   In the scope of this work, 𝑡 ∈ {top-k, bottom-k, skyline, outliers, clustering}. For
instance, for 𝑡 = clustering, each component is a cluster and is described by its centroid.

Example 5. A possible model over the derived cube 𝑞(SALES0 ) in Example 4 is characterized
by 𝑡 = clustering, 𝑎𝑙𝑔 = K-Means, 𝐶 = 𝑞(SALES0 ), 𝐼𝑛 = ⟨quantity, 𝑛 = 3, 𝑟𝑛𝑑𝑆𝑒𝑒𝑑 = 0⟩,
𝑂𝑢𝑡 = {𝑐1, 𝑐2, 𝑐3}, 𝜇(⟨Bagels⟩) = 𝑐1, 𝜇(⟨Beer⟩) = 𝑐1, 𝜇(⟨Bologna⟩) = 𝑐2, . . ., where 𝑛 is
the desired number of clusters and 𝑟𝑛𝑑𝑆𝑒𝑒𝑑 is the seed to be used by the k-means algorithm to
randomly generate the 3 seed clusters. Component 𝑐1 is characterized by property 𝑐𝑒𝑛𝑡𝑟𝑜𝑖𝑑 with
value 76.                                                                                   □
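
The structure of a clustering model (the components in 𝑂𝑢𝑡 and the mapping 𝜇) can be sketched with a
one-dimensional k-means written in plain Python. This is only an illustration: the prototype relies on
Scikit-Learn, whereas the sketch below uses Lloyd's iterations with a deterministic initialization
(an assumption replacing the random seeding of Example 5), so the centroid values it produces need
not match the one reported in the example. The quantities are those shown in Figure 1.

```python
QUANTITY = {
    "Bagels": 48, "Beer": 116, "Bologna": 192, "Canned Fruit": 138,
    "Deli Meats": 211, "Fresh Chicken": 64, "Fresh Fruit": 798,
    "Frozen Chicken": 237, "Hamburger": 141, "Hot Dogs": 154,
    "Muffins": 205, "Slices Bread": 266, "Wine": 448,
}


def kmeans_1d(cells, n, max_iter=100):
    """Cluster cells on one measure (n >= 2); return (out, mu).

    out maps a component id to its properties (here only the centroid);
    mu maps each coordinate to the id of the component it belongs to.
    """
    values = sorted(cells.values())
    # Assumed deterministic start: n values evenly spread over the sorted measure.
    centroids = [values[i * (len(values) - 1) // (n - 1)] for i in range(n)]
    mu = {}
    for _ in range(max_iter):
        # Assignment step: each cell goes to its nearest centroid.
        new_mu = {
            coord: min(range(n), key=lambda j: abs(v - centroids[j]))
            for coord, v in cells.items()
        }
        if new_mu == mu:
            break  # assignments are stable, the algorithm has converged
        mu = new_mu
        # Update step: each centroid becomes the mean of its members.
        for j in range(n):
            members = [cells[c] for c, comp in mu.items() if comp == j]
            if members:
                centroids[j] = sum(members) / len(members)
    out = {f"c{j + 1}": {"centroid": centroids[j]} for j in range(n)}
    return out, {coord: f"c{comp + 1}" for coord, comp in mu.items()}


out, mu = kmeans_1d(QUANTITY, n=3)
```

With this initialization, Bagels and Beer are mapped to c1 and Bologna to c2, as in Example 5, while
Fresh Fruit ends up alone in c3.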

   As the last step in the IAM approach, cube 𝐶 is enhanced by associating it with a set of
models bound to 𝐶 and with a highlight, i.e., with the subset of cells corresponding to the most
interesting component of the model; these cells are determined via function 𝜇.

Definition 6. An enhanced cube 𝐸 is a triple of a cube 𝐶, a set of models {ℳ1 , . . . , ℳ𝑟 } bound
to 𝐶, and a highlight 𝑐ℎ𝑖𝑔ℎ = argmax_{𝑐 ∈ ⋃_{𝑖=1}^{𝑟} 𝑂𝑢𝑡𝑖} 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡(𝑐).

  How to estimate the interestingness of component 𝑐, 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡(𝑐), is explained in detail in [2].
Here we just mention that we consider three facets of interestingness identified in [5], namely,
novelty, peculiarity, and surprise.
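
The selection of the highlight in Definition 6 amounts to an argmax over the union of the components
of all models. The sketch below assumes that the interestingness of each component has already been
computed; the scores are invented placeholders, since the actual measure is defined in [2], and the
names find_highlight, MODELS, and SCORES are hypothetical.

```python
def find_highlight(models, interest):
    """Return the (model type, component id) pair with maximum interestingness.

    models:   dict model type -> iterable of component ids (the sets Out_i)
    interest: function (model type, component id) -> float
    """
    candidates = [(t, c) for t, comps in models.items() for c in comps]
    return max(candidates, key=lambda tc: interest(*tc))


# Components of two models bound to the same cube.
MODELS = {
    "outliers": ["outlier", "non-outlier"],
    "clustering": ["c1", "c2", "c3"],
}

# Invented interestingness scores, used only to exercise the argmax.
SCORES = {
    ("outliers", "outlier"): 0.9,
    ("outliers", "non-outlier"): 0.2,
    ("clustering", "c1"): 0.4,
    ("clustering", "c2"): 0.5,
    ("clustering", "c3"): 0.6,
}

highlight = find_highlight(MODELS, lambda t, c: SCORES[(t, c)])
```

With these placeholder scores the highlight is the outlier component, mirroring the situation
described in Example 1.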


4. Execution plans for describe intentions
The describe operator provides an answer to the user asking “show me my business” by
describing one or more cube measures, possibly focused on one or more level members, at some given
granularity [1]. The cube is enhanced by showing either the top/bottom-k cells, the skyline, the
outliers, or clusters of cells. Let 𝐶0 be a base cube over cube schema 𝒞 = (𝐻, 𝑀 ); the syntax
for describe is

                     with 𝐶0 describe 𝑚1 , . . . , 𝑚𝑧 [ for 𝑃 ] [ by 𝑙1 , . . . , 𝑙𝑛 ]
                        [ using 𝑡1 [ size 𝑘1 ], . . . , 𝑡𝑟 [ size 𝑘𝑟 ]]

(optional parts are in brackets) where 𝑚1 , . . . , 𝑚𝑧 ∈ 𝑀 are measures of 𝒞, 𝑃 is a set of selection
predicates each over one level of 𝐻, {𝑙1 , . . . , 𝑙𝑛 } denote a group-by set of 𝐻, 𝑡1 , . . . , 𝑡𝑟 are
model types, and the 𝑘𝑖 ’s are the desired sizes of the models returned, as explained
in step 2 below.
   The plan corresponding to a fully-specified intention, i.e., one where all optional clauses have
been specified, is:
1. Execute query 𝑞 = (𝐺𝑞 , 𝑃𝑞 , 𝑀𝑞 ), where 𝐺𝑞 = {𝑙1 , . . . , 𝑙𝑛 }, 𝑃𝑞 = 𝑃 , and 𝑀𝑞 = {𝑚1 , . . . , 𝑚𝑧 }.
   Let 𝐶 = 𝑞(𝐶0 ).

2. For 1 ≤ 𝑖 ≤ 𝑟, compute model ℳ𝑖 = (𝑡𝑖 , 𝑎𝑙𝑔𝑖 , 𝐶, 𝐼𝑛𝑖 , 𝑂𝑢𝑡𝑖 , 𝜇𝑖 ) and for each 𝑐 ∈ 𝑂𝑢𝑡𝑖 ,
   compute 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡(𝑐). Size 𝑘𝑖 is used for clustering to determine the number of clusters to
   be computed, for top-k and bottom-k to determine the number of cells to be returned, for
   outliers to determine the number of outliers; it is neglected for the skyline.

3. Find the highlight 𝑐ℎ𝑖𝑔ℎ = argmax_{𝑐 ∈ ⋃_{𝑖=1}^{𝑟} 𝑂𝑢𝑡𝑖} 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡(𝑐).

4. Return the enhanced cube 𝐸 consisting of 𝐶, {ℳ1 , . . . , ℳ𝑟 }, and 𝑐ℎ𝑖𝑔ℎ .
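
The role of the size 𝑘𝑖 in step 2 can be illustrated for top-k, bottom-k, and outliers. The sketch below
makes two assumptions that are not stated in the plan: each of these models has two components (the
selected cells and all the others, in analogy with the outlier and non-outlier components of
Example 1), and outliers are ranked by their distance from the median, which is a simple stand-in for
the detection algorithm used by the prototype. The quantities are those shown in Figure 1.

```python
QUANTITY = {
    "Bagels": 48, "Beer": 116, "Bologna": 192, "Canned Fruit": 138,
    "Deli Meats": 211, "Fresh Chicken": 64, "Fresh Fruit": 798,
    "Frozen Chicken": 237, "Hamburger": 141, "Hot Dogs": 154,
    "Muffins": 205, "Slices Bread": 266, "Wine": 448,
}


def split(cells, chosen, label):
    """Build mu: chosen coordinates map to `label`, all the others to 'other'."""
    return {coord: label if coord in chosen else "other" for coord in cells}


def top_k(cells, k):
    """Select the k cells with the largest measure value."""
    ranked = sorted(cells, key=cells.get, reverse=True)
    return split(cells, set(ranked[:k]), "top")


def bottom_k(cells, k):
    """Select the k cells with the smallest measure value."""
    ranked = sorted(cells, key=cells.get)
    return split(cells, set(ranked[:k]), "bottom")


def outliers(cells, k):
    """Select the k cells farthest from the median (assumed stand-in criterion)."""
    values = sorted(cells.values())
    median = values[len(values) // 2]
    ranked = sorted(cells, key=lambda c: abs(cells[c] - median), reverse=True)
    return split(cells, set(ranked[:k]), "outlier")


mu_top = top_k(QUANTITY, 3)
mu_bottom = bottom_k(QUANTITY, 2)
mu_out = outliers(QUANTITY, 1)
```

Each function returns the mapping 𝜇 of its model; the interestingness of the resulting components
would then be computed as described in [2].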

  Partially-specified intentions are interpreted as follows:
    • If the for clause has not been specified, we assume 𝑃𝑞 = 𝑇𝑅𝑈𝐸 (no selection).
    • If the by clause has not been specified, we assume 𝐺𝑞 = ∅ (complete aggregation).
    • If the using 𝑡1 , . . . , 𝑡𝑟 clause has not been specified, all model types listed in Section 3 are
      computed over 𝐶 (the skyline is computed only if 𝑧 > 1, i.e., at least two measures have
      been specified).
    • If the size clause has not been specified for one or more models, the value of 𝑘𝑖 is
      determined automatically through the Elbow method.
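
The reason why the skyline requires at least two measures can be seen from a minimal sketch. It
assumes that all measures are to be maximized and that a cell belongs to the skyline when no other
cell dominates it; the storeSales values are invented for illustration.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every measure and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(cells):
    """Return the coordinates of the cells that no other cell dominates."""
    return {
        coord for coord, values in cells.items()
        if not any(dominates(other, values) for other in cells.values())
    }


# (quantity, storeSales) per product type; storeSales values are invented.
TWO_MEASURES = {
    "Beer": (116, 300.0),
    "Wine": (448, 250.0),
    "Bagels": (48, 90.0),
    "Muffins": (205, 320.0),
}

# The same kind of cube with a single measure.
ONE_MEASURE = {"Beer": (116,), "Wine": (448,), "Bagels": (48,)}

sky_two = skyline(TWO_MEASURES)
sky_one = skyline(ONE_MEASURE)
```

With two measures the skyline contains Wine and Muffins, neither of which dominates the other. With
a single measure it degenerates to the cell with the maximum value, which the top-k model already
covers.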

Example 6. Consider the following session on the SALES cube:

               with SALES describe quantity for month = ’1997-04’ by type
               with SALES describe quantity by category using clustering size 3

The models computed for the first intention are top-k, bottom-k, clustering, and outliers (computing
the skyline for a single measure makes no sense). For the second intention, a clustering producing
3 clusters is computed.                                                                           □
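
The syntax of describe and the defaults for partially-specified intentions can be sketched with a
small regular-expression parser. This is an illustrative assumption about how an intention could be
parsed, not the parser used by the prototype: the selection predicate is kept as raw text, straight
quotes are used, and a missing size is returned as None, to be chosen later (e.g., through the Elbow
method).

```python
import re

ALL_TYPES = ["top-k", "bottom-k", "skyline", "outliers", "clustering"]

PATTERN = re.compile(
    r"with\s+(?P<cube>\w+)\s+describe\s+(?P<measures>.+?)"
    r"(?:\s+for\s+(?P<for>.+?))?"
    r"(?:\s+by\s+(?P<by>.+?))?"
    r"(?:\s+using\s+(?P<using>.+))?$"
)


def parse_describe(text):
    """Parse a describe intention and apply the defaults for missing clauses."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a describe intention: {text!r}")
    measures = [m.strip() for m in match["measures"].split(",")]
    levels = [l.strip() for l in match["by"].split(",")] if match["by"] else []
    if match["using"]:
        models = []
        for item in match["using"].split(","):
            parts = item.split()
            explicit = len(parts) == 3 and parts[1] == "size"
            models.append((parts[0], int(parts[2]) if explicit else None))
    else:
        # No using clause: all model types, skyline only with two or more measures.
        models = [(t, None) for t in ALL_TYPES
                  if t != "skyline" or len(measures) > 1]
    return {
        "cube": match["cube"],
        "measures": measures,
        "predicates": match["for"],  # None stands for TRUE
        "group_by": levels,          # empty list stands for the empty group-by set
        "models": models,
    }


first = parse_describe("with SALES describe quantity for month = '1997-04' by type")
second = parse_describe("with SALES describe quantity by category using clustering size 3")
```

Applied to the session of Example 6, the first intention yields the model types top-k, bottom-k,
outliers, and clustering, and the second one a clustering of size 3.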


5. Visualizing enhanced cubes
To provide an effective description of an enhanced cube we couple text-based and graphical
representations with an ad-hoc interaction paradigm. Specifically, the visualization includes
three distinct but inter-related areas: a table area that shows the cube cells using a pivot table; a
chart area that complements the table area by representing the cube cells through one or more
charts; a component area that shows a list of model components sorted by their interestingness.
The guidelines adopted to select the charts are detailed in [2]. The interaction paradigm we
adopt is component-driven. Specifically, clicking on one component 𝑐 in the component area
emphasizes the corresponding cube cells (i.e., those that map to 𝑐 via function 𝜇) both
in the table area and in the chart area. The highlight is the top component in the list and is
selected by default. Following the details-on-demand paradigm [6], interaction is enhanced
using a tooltip that, when the mouse is positioned on a data point, shows its coordinate, its
measure value(s), and the component(s) it belongs to.

Figure 2: The visualization obtained for the intention in Example 7

Example 7. Figure 2 shows the visualization obtained when the following intention is formulated:
with SALES describe storeCost by month, category. On the top-left, the table area; on the right,
the chart area; on the bottom-left, the component area. Here a heat map and a bubble chart have
been selected. The top-interestingness component is a cluster, so a color has been assigned to each
component of clustering (i.e., to each cluster) and is uniformly used in all three areas. The highlight
(in green) is currently selected and is emphasized using a thicker border in all areas. A tooltip with
all the details about a single cell is also shown (in yellow).                                       □


6. Related work and conclusion
The idea of coupling data and analytical models was born in the 90’s with inductive databases,
where data were coupled with patterns meant as generalizations of the data. Later on, data-to-
model unification was addressed in MauveDB [7], which provides a language for specifying
model-based views of data using common statistical models.
   The coupling of the OLAP paradigm and data mining, to create an approach where concise
patterns are extracted from multidimensional data for the user’s evaluation, was the goal of some
approaches commonly labeled as OLAM [8]. In this context, k-means clustering is used by [9] to
dynamically create semantically-rich aggregates of facts other than those statically provided by
dimension hierarchies. Similarly, the shrink operator is proposed by [10] to compute small-size
approximations of a cube via agglomerative clustering. Other operators that enrich data with
knowledge extraction results are DIFF [11], which returns a set of tuples that most successfully
describe the difference of values between two cells of a cube, and RELAX [12], which verifies
whether a pattern observed at a certain level of detail is also present at a coarser level of
detail. Finally, [13] reuse the OLAP paradigm to explore prediction cubes, i.e., cubes where each
cell summarizes a predictive model trained on the data corresponding to that cell. The IAM
approach can be regarded as OLAM since, like the approaches mentioned above, it relies on
mining techniques to enhance the cube resulting from an OLAP query. However, while each of
the approaches above uses one single technique (e.g., clustering) to this end, the IAM leans on
multiple mining techniques to give users a wider variety of insights, using the interestingness
measure to select the most relevant ones.
   To the best of our knowledge, though some tools (e.g., Spotfire and Tableau) integrate OLAP
and analytics capabilities in the same environment, none of them allows users to formulate
queries at a higher level of abstraction than OLAP (as done in the IAM using intentions), nor do
they support the automated out-of-the-box enrichment of cubes with insights obtained by analytics
(as done in the IAM through enhanced cubes).
   In this paper we have given a proof-of-concept for the IAM vision by delivering an
implementation of the describe operator, relying on a visual metaphor to display enhanced cubes.
Our implementation uses a simple multidimensional engine [14, 15] that relies on the Oracle
11g DBMS to execute queries on a star schema; the mining models are imported from the
Scikit-Learn Python library. The web-based visualization is implemented in JavaScript and uses
the D3 library. The prototype can be accessed at http://semantic.csr.unibo.it/describe/.
   In [2], we have shown that our approach reduces the effort for formulating complex
analyses while ensuring that performance is compatible with the near-real-time requirements of
interactive sessions. Specifically, using the ASCII character length as an approximation for the
effort it takes to craft a query, we evaluated the saving in user’s effort when writing a describe
intention over the one necessary to obtain the same result using plain SQL and Python. We
considered a simple session including three intentions, where the by clause is progressively
enlarged and all the models are computed. Remarkably, it turned out that the total formulation
effort using SQL+Python is about two orders of magnitude larger than using describe intentions
(on average, about 5400 vs. 55 chars). For the efficiency test we used the FoodMart data
(github.com/julianhyde/foodmart-data-mysql) and the same session mentioned above. Table 1
shows the total execution time and its breakdown into the times necessary to query the base
cube, to compute the models, to measure the interestingness, and to generate the pivot table
returned to the browser. It turns out that at most 18 seconds are necessary to
retrieve and visualize an enhanced cube of more than 86000 cells, which is compatible
with the execution time of a standard OLAP query.
   The main directions for future research we wish to pursue are: (i) evaluate the usability of
the approach by conducting tests with real users, and (ii) extend the approach to operate with
dashboards of enhanced cubes.

Table 1
Execution times in seconds for three intentions with increasing cardinalities of 𝐶 (the tests were run
on an Intel Core(TM) i7-6700 CPU @ 3.40GHz with 8GB RAM)
                     Intention    |𝐶|    Query   Model   Interestingness   Pivot   Total
                        𝐼1        323     0.10    0.25         0.00        0.00     0.36
                        𝐼2       20525    0.22    5.90         0.36        0.36     6.83
                        𝐼3       86832    0.22    8.50         7.43        1.72    17.87


References
 [1] P. Vassiliadis, P. Marcel, S. Rizzi, Beyond roll-up’s and drill-down’s: An intentional analytics
     model to reinvent OLAP, Inf. Syst. 85 (2019) 68–91.
 [2] M. Francia, P. Marcel, V. Peralta, S. Rizzi, Enhancing cubes with models to describe
     multidimensional data, Inf. Syst. Frontiers 24 (2022) 31–48.
 [3] M. Francia, M. Golfarelli, P. Marcel, S. Rizzi, P. Vassiliadis, Assess queries for interactive
     analysis of data cubes, in: Proc. of EDBT, 2021, pp. 121–132.
 [4] M. Terrovitis, P. Vassiliadis, S. Skiadopoulos, E. Bertino, B. Catania, A. Maddalena, S. Rizzi,
     Modeling and language support for the management of pattern-bases, Data Knowl. Eng.
     62 (2007) 368–397.
 [5] P. Marcel, V. Peralta, P. Vassiliadis, A framework for learning cell interestingness from
     cube explorations, in: Proc. of ADBIS, 2019.
 [6] B. Shneiderman, The eyes have it: A task by data type taxonomy for information
     visualizations, in: Proc. of IEEE Symp. on Visual Languages, 1996, pp. 336–343.
 [7] A. Deshpande, S. Madden, MauveDB: supporting model-based user views in database
     systems, in: Proc. of SIGMOD, 2006, pp. 73–84.
 [8] J. Han, OLAP mining: Integration of OLAP with data mining, in: Proc. of Working Conf.
     on Database Semantics, 1997, pp. 3–20.
 [9] F. Bentayeb, C. Favre, RoK: Roll-up with the k-means clustering method for recommending
     OLAP queries, in: Proc. of DEXA, 2009, pp. 501–515.
[10] M. Golfarelli, S. Graziani, S. Rizzi, Shrink: An OLAP operation for balancing precision and
     size of pivot tables, Data Knowl. Eng. 93 (2014) 19–41.
[11] S. Sarawagi, Explaining differences in multidimensional aggregates, in: Proc. of VLDB,
     1999, pp. 42–53.
[12] G. Sathe, S. Sarawagi, Intelligent rollups in multidimensional OLAP data, in: Proc. of
     VLDB, 2001, pp. 531–540.
[13] B. Chen, L. Chen, Y. Lin, R. Ramakrishnan, Prediction cubes, in: Proc. of VLDB, 2005, pp.
     982–993.
[14] M. Francia, E. Gallinucci, M. Golfarelli, Towards conversational OLAP, in: Proc. of DOLAP,
     2020, pp. 6–15.
[15] M. Francia, E. Gallinucci, M. Golfarelli, COOL: A framework for conversational OLAP, Inf.
     Syst. 104 (2022) 101752.