Vol-3194/paper59


Wolfgang Fahl

Paper

Paper
edit
description  
id  Vol-3194/paper59
wikidataid  Q117344892→Q117344892
title  Inferring User Decision-Making Processes in Recommender Systems with Knowledge Graphs
pdfUrl  https://ceur-ws.org/Vol-3194/paper59.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/AnelliNSFM22
volume  Vol-3194→Vol-3194
session  →

Paper[edit]

Paper
edit
description  
id  Vol-3194/paper59
wikidataid  Q117344892→Q117344892
title  Inferring User Decision-Making Processes in Recommender Systems with Knowledge Graphs
pdfUrl  https://ceur-ws.org/Vol-3194/paper59.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/AnelliNSFM22
volume  Vol-3194→Vol-3194
session  →

Inferring User Decision-Making Processes in Recommender Systems with Knowledge Graphs[edit]

load PDF

Inferring User Decision-Making Processes
in Recommender Systems with Knowledge Graphs
Discussion Paper

Vito Walter Anelli1 , Tommaso Di Noia1 ,
Eugenio Di Sciascio1 , Antonio Ferrara1 and Alberto Carlo Maria Mancino1
1
    Polytechnic University of Bari, via Orabona, 4, 70125 Bari, Italy


                                         Abstract
                                         This paper proposes a sparse factorization approach, KGFlex, that represents each item feature as an em-
                                         bedding. With KGFlex, the user-item interactions are a factorized combination of the item features relevant
                                         to the user. An entropy-driven module drives the training considering only the feature involved in the
                                         user’s decision-making process. Extensive experiments confirm the approach’s effectiveness, considering
                                         the ranking accuracy, diversity, and induced bias. The public implementation of KGFlex is available at
                                         https://split.to/kgflex.

                                         Keywords
                                         Recommender systems, Information Retrieval, feature factorization, entropy, knowledge graphs




1. Introduction
At least once, everyone asked a friend to suggest a movie to watch. This natural and yet effective
behavior of obtaining recommendations from similar people is the foundation of Collaborative
Filtering (CF) recommendation techniques. In fact, it is their remarkable accuracy that helped
Recommender Systems getting famous. Popular examples of CF are Matrix Factorization [1],
Nearest Neighbors, and the recent Deep Learning models [2]. Despite the notable performance,
these techniques have some severe limitations: they need a considerable number of transactions
to work, and they fail to suggest fresh items. On the other side of the coin, there are Content-
Based Filtering (CBF) methods. At the cost of lower accuracy, not only are they interpretable,
but they also succeed in suggesting fresh items. However, these algorithms are affected by the
overspecialization problem, and they struggle to suggest items different from the user history.
To benefit from both approaches and alleviate the drawbacks, researchers integrated into Col-
laborative Filtering the side information used in content-based approaches such as tags [3],

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
$ vitowalter.anelli@poliba.it (V. W. Anelli); tommaso.dinoia@poliba.it (T. Di Noia); eugenio.disciascio@poliba.it
(E. Di Sciascio); antonio.ferrara@poliba.it (A. Ferrara); alberto.mancino@poliba.it (A. C. M. Mancino)
€ https://sisinflab.poliba.it/people/vito-walter-anelli/ (V. W. Anelli);
https://sisinflab.poliba.it/people/tommaso-di-noia/ (T. Di Noia);
https://sisinflab.poliba.it/people/eugenio-di-sciascio/ (E. Di Sciascio);
https://sisinflab.poliba.it/people/antonio-ferrara/ (A. Ferrara);
https://sisinflab.poliba.it/people/alberto-carlo-maria-mancino/ (A. C. M. Mancino)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
�demographic data [4], structured knowledge [5]. However, rarely do these models express
their full potential since they combine large and dense models with hundreds or thousands
of features, and they are computationally expensive. Among the various information sources,
Knowledge Graphs (𝒦𝒢𝑠) are gaining momentum. This work discusses KGFlex [6], a sparse
embedding model that extracts facts and knowledge from publicly available knowledge graphs
to describe the catalog items. The underlying factorization model shapes the user interactions
with item features through low-dimensionality embeddings. KGFlex examines the user-specific
decision-making process of consuming or not consuming an item to weight feature embeddings
employing an entropy-based strategy. Consequently, the user profile only comprises an individual
representation of each relevant feature. To evaluate the efficacy of KGFlex, we conduct extensive
experiments on two different publicly available datasets extracting content information from
DBpedia12 . The results show that KGFlex has competitive accuracy performance, and at the same
time, generates highly diversified recommendations with a low induced bias.


2. Knowledge Graph Sparse Embeddings
KGFlex exploits the knowledge encoded in a knowledge graph as side information to characterize
both items and users. One of the main assumptions is that users decide to enjoy an item based
on a subset of its characteristics, implying that not all the item features are equally important.
In the following, we show how KGFlex describes each user and item with a set of features. Taking
a cue from information theory KGFlex exploits the notion of information gain to measure the
relevance of a feature for a user in deciding to consume or not an item.

2.1. From Knowledge Graphs to Decision-Making
In KGFlex, items and users are characterized by sets of features extracted from a 𝒦𝒢, a knowledge
base of semantically linked machine-understandable data [7]. During the years, 𝒦𝒢𝑠 have gained
more and more success thanks to the Linked Data initiative [8]. Today we can benefit from 1,527
different 𝒦𝒢𝑠 connected in the so-called Linked Open Data Cloud3 .
   A knowledge graph 𝒦𝒢 can be represented as pairs of entities linked to each other by binary
                                                    𝜌
relations. Each connection in 𝒦𝒢 is denoted by 𝜎 − → 𝜔, where 𝜎 is a subject entity, 𝜌 is a directed
relation (predicate), and 𝜔 is an object entity. Hereinafter, we generalize the previous notion to
multi-hop predicates (i.e., considering chains of predicates that connect two entities at a higher
                                                                   𝜌1     𝜌2     𝜌𝑛
depth). Let 𝑛-hop predicate be defined as 𝜌 = ⟨𝜌1 ,...,𝜌𝑛 ⟩ if 𝜎 −→ 𝜔1 −→ ... −→ 𝜔𝑛 ∈ 𝒦𝒢. For
                                 𝜌
convenience, ℎ(𝜌) = 𝑛 for 𝜌 : 𝜎 −→ 𝜔𝑛 ∈ 𝒦𝒢 denotes the depth of the predicate chain, and when
                                    𝜌
no confusion arises we will use 𝜎 −→ 𝜔 to denote a generic chain with ℎ(𝜌) ≥ 1.

2.1.1. Extraction of Item and User Features from Knowledge Graph
Given a collection of items ℐ and a knowledge graph 𝒦𝒢, we assume each element in 𝑖 ∈ ℐ has
a mapping to a corresponding entity in 𝒦𝒢. Under this assumption, an item 𝑖 can be explored,
   1
     http://dbpedia.org
   2
     https://github.com/sisinflab/LinkedDatasets
   3
     https://lod-cloud.net/datasets
�                                   (𝑛)
at depth 𝑛, to identify the set ℱ𝑖       of the semantic features describing it:
                            (𝑛)                   𝜌
                          ℱ𝑖      = {⟨𝜌, 𝜔⟩ | 𝑖 −
                                                → 𝜔 ∈ 𝒦𝒢 ,ℎ(𝜌) ∈ {1,...,𝑛}}.                     (1)

At this stage, feature filtering, graph pruning, and semantic feature selection techniques [9] can
be applied to control the computational and memory load and improve system performance.
  We describe each user 𝑢 ∈ 𝒰 with the set ℱ𝑢 of the features representing the items ℐ𝑢 ⊆ ℐ
enjoyed by 𝑢:                                     ⋃︁ (𝑛)
                                          ℱ𝑢(𝑛) =    ℱ𝑖 .                                       (2)
                                                      𝑖∈ℐ𝑢

  Finally, we define the overall set ℱ (𝑛) of features in the system:
                                                  ⋃︁ (𝑛)
                                          ℱ (𝑛) = ℱ𝑖 .                                           (3)
                                                      𝑖∈ℐ

In the following, for convenience, the (𝑛) superscript is omitted whenever it is not relevant in
the context.

2.1.2. Information Gain of User Features
In information theory, entropy is used to measure the uncertainty of a random variable.
   In particular, the entropy 𝐻(𝑉 ) of a random variable 𝑉 with 𝑘 possible values in {𝑣1 ,...,𝑣𝑘 }
is defined as:
                                       ∑︁𝑘
                             𝐻(𝑉 ) = − 𝑃 (𝑉 = 𝑣𝑖 )log2 𝑃 (𝑉 = 𝑣𝑖 ).                            (4)
                                            𝑖=1
It is straightforward to check that a coin that always comes up heads has zero entropy, while a fair
coin equally likely to come up heads or tails when flipped has entropy 1. Notably, if 𝐵𝑞 is a binary
random variable that is true with probability 𝑞, we have 𝐻(𝐵𝑞 ) = −𝑞log2 𝑞−(1−𝑞)log2 (1−𝑞).
For instance, given a dataset 𝒟 of training samples in the form (x,𝑦), with x ∈ R𝐹 and 𝑦 ∈ {0,1} is
a binary variable that is true with probability 𝑞, its entropy is 𝐻(𝒟) = 𝐻(𝑦𝑞 ). As a consequence,
a dataset with a balanced number of positive and negative samples has entropy 1.
    In this context, the information gain 𝐼𝐺(𝒟,𝑥𝑑 ) measures the expected reduction in information
entropy obtained from the observation of one of the attributes 𝑥𝑑 in x:

                                     𝐼𝐺(𝒟,𝑥𝑑 ) = 𝐻(𝒟)−𝐻(𝒟|𝑥𝑑 ),                                  (5)

where 𝐻(𝒟|𝑥𝑑 ) is the expected entropy of 𝒟 conditioned on 𝑥𝑑 . If 𝑥𝑑 can assume 𝑘 distinct
values 𝑥𝑑,1 ,...𝑥𝑑,𝑘 with a categorical probability distribution,the dataset 𝒟 is partitioned into 𝑘
mutually exclusive subsets and the following conditioned entropy is determined:
                                            𝑘
                                           ∑︁
                           𝐻(𝒟|𝑥𝑑 ) =         𝑃 (𝑥𝑑 = 𝑥𝑑,𝑖 )𝐻(𝒟|𝑥𝑑 = 𝑥𝑑,𝑖 ).                     (6)
                                           𝑖=1

   Since the information gain defined in Eq. (5) returns a measure of the importance of a single
attribute in distinguishing positive from negative examples in a dataset, we build, for each user 𝑢,
�a balanced dataset 𝒟𝑢 with ⋃︀
                           all the consumed items from ℐ𝑢 and the same amount of negative items
randomly picked up from 𝑣∈𝒰 ,𝑣̸=𝑢 ℐ𝑣 ∖ℐ𝑢 . Each sample in 𝒟𝑢 is provided with a set of binary
variables corresponding to the features in ℱ𝑢 . Each variable indicates, for each item 𝑖 in 𝒟𝑢 , the
presence (𝑓 = 1) or the absence (𝑓 = 0) of the corresponding feature in the set ℱ𝑖 . According
to the definition, 𝐻(𝒟𝑢 ) = 1. Therefore, the information gain for each feature 𝑓 ∈ ℱ𝑢 can be
computed using the dataset 𝒟𝑢 . Let 𝑝𝑢𝑓 be the number of positive samples in 𝒟𝑢 for which 𝑓 = 1,
𝑛𝑢𝑓 the number of negative samples for which the same feature is present, and 𝑡𝑢𝑓 = 𝑝𝑢𝑓 +𝑛𝑢𝑓 .
Analogously, 𝑝𝑢¬𝑓 = |ℐ𝑢 |−𝑝𝑢𝑓 is the number of positive samples with 𝑓 = 0, 𝑛𝑢¬𝑓 = |ℐ𝑢 |−𝑛𝑢𝑓
is the number of negative samples with 𝑓 = 0, and 𝑡𝑢¬𝑓 = 𝑝𝑢¬𝑓 +𝑛𝑢¬𝑓 . Following Eqq. (5) and (6):

                        𝐼𝐺(𝒟𝑢 ,𝑓 ) = 1−𝐻(𝒟𝑢 |𝑓 = 1)−𝐻(𝒟𝑢 |𝑓 = 0),                                 (7)
                                          (︂                             )︂
                                    𝑡𝑢𝑓        𝑝𝑢𝑓      𝑝𝑢𝑓 𝑛𝑢𝑓      𝑛𝑢𝑓
                   𝐻(𝒟𝑢 |𝑓 = 1) =            −     log2    −    log2        ,                     (8)
                                   |𝒟𝑢 |       𝑡𝑢𝑓      𝑡𝑢𝑓 𝑡𝑢𝑓      𝑡𝑢𝑓
                                       (︂                                     )︂
                                 𝑡𝑢¬𝑓       𝑝𝑢¬𝑓       𝑝𝑢¬𝑓 𝑛𝑢¬𝑓      𝑛𝑢¬𝑓
                  𝐻(𝒟𝑢 |𝑓 = 0) =          −       log2     −     log2            .                (9)
                                 |𝒟𝑢 |       𝑡𝑢¬𝑓      𝑡𝑢¬𝑓 𝑡𝑢¬𝑓      𝑡𝑢¬𝑓
  We finally associate a weight 𝑘𝑢𝑓 = 𝐼𝐺(𝒟𝑢 ,𝑓 ) to each pair of user 𝑢 and feature 𝑓 to represent
the influence of a feature —in the view of the user— in the prediction of user-item interactions.

2.2. Sparse Interaction of Feature Embeddings for Prediction
KGFlex models the features in ℱ as collaboratively learned embeddings in a latent space. Since
KGFlex promotes the idea of having user fine-tuned versions of the same model, we have both
a global representation of the features in ℱ and a personal view, for each user 𝑢, of the fea-
tures in ℱ𝑢 ⊆ ℱ. Notably, the model is structured into two distinct parts. On the one hand,
KGFlex keeps a set 𝒢 of global trainable embeddings and biases shared among all the users, with
𝒢 = {(g𝑓 ∈ R𝐸 ,𝑏𝑓 ∈ R), ∀𝑓 ∈ ℱ}. On the other hand, each user in KGFlex also has his/her
personal representation of the features he/she interacted with, i.e., the features in ℱ𝑢 . These
embeddings are collected within the set 𝒫 𝑢 , defined as 𝒫 𝑢 = {p𝑢𝑓 ∈ R𝐸 , ∀𝑓 ∈ ℱ𝑢 }. Then, the
inner product between the personal representation p𝑓𝑢 and the global representation g𝑓 , plus a
bias value 𝑏𝑓 , estimates the affinity of user 𝑢 to feature 𝑓 . The sum of such affinities for all the
features in ℱ𝑢𝑖 = ℱ𝑢 ∩ℱ𝑖 , weighted according to the pre-computed entropy-based coefficients,
estimates the interaction 𝑥^ 𝑢𝑖 between user 𝑢 and item 𝑖:
                                            ∑︁
                                    𝑥
                                    ^ 𝑢𝑖 =      𝑘𝑢𝑓 (p𝑢𝑓 g𝑓 +𝑏𝑓 ).                                (10)
                                          𝑓 ∈ℱ𝑢𝑖

Eq. (10) encodes the strategy KGFlex exploits to handle thousands of model features. In fact,
it takes advantage of user profile to involve only a small subset of them in the estimate of the
user-item affinity.
    To learn the model parameters, KGFlex adopts Bayesian Personalized Ranking (BPR), the
most common pair-wise Learning to Rank strategy, based on a maximum posterior estimator.
Given∑︀a training set 𝒯 = {(𝑢,𝑖 ,𝑖 ) | 𝑖 ∈ ℐ𝑢 ∧ 𝑖 ∈ ℐ∖ℐ𝑢 ,∀𝑢 ∈ 𝒰}, BPR optimizes the loss+
                                 + −       +         −

𝐿 = (𝑢,𝑖+ ,𝑖− )∈𝒯 ln𝜎(𝑥       ^ 𝑢𝑖− ), with the assumption that a user 𝑢 prefers a consumed item 𝑖
                       ^ 𝑢𝑖+ −𝑥
�over a non-consumed item 𝑖− . The model parameters are learnt with an optimization algorithm
like SGD, as described in Rendle et al. [10].⎧To that aim, we derive:
                                            ⎪
                                            ⎪
                                            ⎪  𝑘𝑢𝑓 g𝑓 if 𝜃 = p𝑢𝑓 ,
                                  𝜕         ⎨𝑘 p𝑢 if 𝜃 = g ,
                                            ⎪
                                                𝑢𝑓 𝑓            𝑓
                                     𝑥
                                     ^ 𝑢𝑖 =                                              (11)
                                 𝜕𝜃         ⎪
                                            ⎪
                                            ⎪  𝑘𝑢𝑓      if 𝜃 = 𝑏𝑓 ,
                                                        else.
                                            ⎪
                                            ⎩0


3. Experimental Setup
In the following, we present a discussion about datasets, preprocessing, baselines, and evaluation
protocol that guided the experimentation of KGFlex.

3.1. Datasets and Filtering
The evaluation of the performance of KGFlex is conducted on two well-known datasets: Yahoo!
Movies and Facebook Books. The datasets have been binarized, retaining ratings of 3 or higher.
To ensure a fair comparison with the baselines, an iterative 10-core, and 5-core preprocessing
procedure is performed on the first and second datasets. The items have been described with a
set of semantic features retrieved through a KG exploration at depth 2 of the DBpedia 𝒦𝒢 using
a public DBpedia URI mapping. Some features (based on their 1-hop predicate) have not been
considered since they do not provide useful information [9]: dbo:wikiPageWikiLink, owl:sameAs,
rdf:type, gold:hypernym, rdfs:seeAlso, dbp:wordnet_type, dbo:wikiPageExternalLink, dbo:thumbnail,
prov:wasDerivedFrom, and dbp:wikiPageUsesTemplate. Therefore, we removed the features as-
sociated with less than ten items. Finally, we kept the user’s 100 most informative features from
the 1- and 2- hop exploration to reduce the computational costs.

3.2. Baselines, Evaluation Protocol and Metrics
To assess the effectiveness of KGFlex, we compare it with BPR-MF [10], a latent factor model based
on the same pair-wise optimization criterion used in KGFlex, a batch version of Rendle et al. [11]
MF, NeuMF [12], and kaHFM [5], a factorization-based model making use of knowledge graphs.
We have chosen the all unrated items protocol using the hold-out 80-20 splitting strategy, which
considers as candidate items all the items not rated by the user. All the models have been tested in
10 different configurations of hyperparameters with the Bayesian hyperparameter optimization
search. For the sake of reproducibility, we provide our code and a working configuration file
for the Elliot framework [13, 14]. Our goal is to assess the accuracy and beyond-accuracy
performance of KGFlex, along with its fairness properties with respect to the popularity of
the items. In detail, we have measured the recommendation accuracy with nDCG [11], also
used as the validation metric. Then, we have evaluated the diversity, adopting Item Coverage
(IC) [15] and Gini Index (Gini) [16]. Finally, three bias metrics have been used to evaluate how
the algorithms consider the items from the long-tail: ACLT [17], PopREO and PopRSP, specific
applications of RSP, and REO [18]. PopREO estimates the equal opportunity of items, encouraging
the True Positive Rate of popular and unpopular items to be the same. PopRSP measures statistical
parity, assessing whether the ranking probability distributions for popular and unpopular items
are the same in the recommendation.
�Table 1
Comparison of KGFlex with baselines (in boldface) and other reference algorithms. Among the baselines,
the best result is in boldface, the second-best result is underlined. For all the metrics, the cutoff is 10.
                                            a) Yahoo! Movies
                                 nDCG        IC      Gini    ACLT PopREO PopRSP
                Random        0.00960 1050 0.84956 5.52026              0.09847 0.00980
                Most Popular 0.15850    49 0.01263 0.00000              1.00000 1.00000
                VSM           0.04777 370 0.05245 3.11768               0.49588 0.45751
                Item-kNN      0.30739 745 0.15826 0.98842               0.70585 0.83466
                MultiVAE      0.23696 399 0.09136 0.23473               0.85433 0.96127
                BPR-MF        0.18571 151 0.02191 0.00064               0.99543 0.99989
                MF            0.28971 455 0.09024 0.08232               0.87345 0.98645
                NeuMF         0.09184   50 0.01134 0.00064              1.00000 0.99989
                kaHFM        0.30055 757 0.16591 0.46238                0.76103 0.92339
                KGFlex        0.24640 851 0.28015 2.14469              0.44768 0.63355
                                          b) Facebook Books
                                  nDCG      IC      Gini     ACLT PopREO PopRSP
                Random        0.00690 782 0.86167 5.26045              0.09794 0.00749
                Most Popular 0.09393 16 0.01265 0.00000                1.00000 1.00000
                VSM           0.03617 523 0.18874 3.80558              0.22616 0.29958
                Item-kNN      0.12903 769 0.37520 2.22524              0.48852 0.59861
                MultiVAE      0.11914 620 0.18279 0.46368              0.77695 0.91818
                BPR-MF        0.09473 17 0.01318 0.00000               1.00000 1.00000
                MF            0.09557 87 0.02376 0.00000               1.00000 1.00000
                NeuMF         0.07142 17 0.01245 0.00000               1.00000 1.00000
                kaHFM        0.12667 540 0.13866 0.32942               0.87663 0.94197
                KGFlex        0.08526 606 0.30703 3.02641             0.15210 0.44852

3.3. Results
3.3.1. Evaluating the Overall Performance
Table 1a depicts the evaluation outcome for the aforementioned metrics with a cutoff of 10 on the
Yahoo! Movies dataset. KGFlex shows satisfactory accuracy results, being outperformed only by
kaHFM and MF. It is noteworthy that KGFlex significantly outperforms BPR-MF, although both
are learned with a pair-wise BPR optimization, thus underlining the beneficial role of the extracted
knowledge. Moreover, examining the item coverage and Gini values, we note the high degree of
personalization provided by KGFlex. We link this result to the personalized view of the knowledge
granted by the framework. Moreover, in KGFlex the collaborative signal on explicit user interests
ensures to recommend diverse items among the ones sharing characteristics of interest for the
user. The aforementioned behavior is not confirmed in Facebook Books (see Table 1b). Indeed, the
accuracy results seem to remain below the performance of other factorization-based approaches.
However, the diversity results show how BPR-MF, MF, and NeuMF may have been flooded by
popularity signal, which led them to perform poorly regarding the item coverage and Gini metrics.
Instead, KGFlex does not suffer from this problem and approaches the superior performance of
Item-kNN in terms of diversity.
�                       a) Yahoo! Movies                              b) Facebook Books
                                                          0.4
            0.6                                           0.3
    HR@10




                                                  HR@10
            0.4                                           0.2
            0.2                                           0.1
              0                                             0
                  4      6           8       10                  4        6         8         10
                             SE@10                                        SE@10
              Random     Most Popular     VSM             Item-kNN    MultiVAE
              BPR-MF     MF               NeuMF           kaHFM       KGFlex
Figure 1: Accuracy vs. distributional diversity. The plots show the value of HR@10 against SE@10: the
closer to the top-right corner the better.

3.3.2. Investigating the Accuracy/Personalization Trade-Off
What we have analytically observed is substantially confirmed in Figure 1. These graphs show
the joint behavior of KGFlex on accuracy and distributional diversity by analyzing the value
of Hit Ratio (HR) with respect to the Shannon Entropy (SE) statistics, which measures how di-
versely distributed are the recommended items. Among factorization-based approaches, KGFlex
approaches the right-top margin to a greater extent, remarking its capability of providing highly
personalized recommendations, probably due to the joint operation of the global and the personal
views of the same features. The kaHFM model usually is the second-best model, while the other
approaches seem to perform very poorly in at least one dimension or do not have a stable position
when varying the dataset.

3.3.3. Analyzing the Algorithmic and Popularity Bias
Oftentimes, unpopular items are not recommended and remain underrepresented [19], thus
causing a fairness issue for items and an inappropriate recommendation for users who do not prefer
very popular items. While there is a wide range of approaches in literature aiming to reduce the
recommendation biases [20, 21, 22, 23, 24, 25, 26] , we study whether KGFlex is inherently resilient
to algorithmic bias. From Tables 1a and 1b, it is noteworthy that KGFlex always outperforms all the
other factorization-based approaches and generally outperforms the other approaches. The value
of ACLT (the higher the better) is comparable with the value obtained by VSM. This result is further
supported by the values of PopREO and PopRSP (the smaller the better). Concerning those metrics,
KGFlex and VSM continue to grant the less biased recommendations. Interestingly, while both
exploit the same optimization criterion, we notice how KGFlex consistently improves BPR-MF,
which is known to be vulnerable to imbalanced data and to produce biased recommendations [18].

4. Conclusion
We introduced KGFlex, a promising knowledge-aware approach to generate recommendations
from implicit feedback. KGFlex has demonstrated its ability to take the best from content-based
and factorization-based recommendation approaches. Moreover, thanks to the high degree of
expressiveness provided by the personalized representation of content information, KGFlex
guarantees satisfactory and diverse recommendations and resilience to algorithmic bias.
�References
 [1] Y. Koren, R. M. Bell, C. Volinsky, Matrix factorization techniques for recommender systems,
     IEEE Computer 42 (2009) 30–37.
 [2] S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: A survey and new
     perspectives, ACM Comput. Surv. 52 (2019) 5:1–5:38. URL: https://doi.org/10.1145/3285029.
     doi:10.1145/3285029.
 [3] Y. Zhu, Z. Guan, S. Tan, H. Liu, D. Cai, X. He, Heterogeneous hypergraph embedding for
     document recommendation, Neurocomputing 216 (2016) 150–162.
 [4] W. X. Zhao, S. Li, Y. He, L. Wang, J. Wen, X. Li, Exploring demographic information in social
     media for product recommendation, Knowl. Inf. Syst. 49 (2016) 61–89.
 [5] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, J. Trotta, How to make latent factors
     interpretable by feeding factorization machines with knowledge graphs, in: ISWC (1),
     volume 11778 of Lecture Notes in Computer Science, Springer, 2019, pp. 38–56.
 [6] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ferrara, A. C. M. Mancino, Sparse feature factoriza-
     tion for recommender systems with knowledge graphs, in: RecSys, ACM, 2021, pp. 154–165.
 [7] T. Berners-Lee, J. Hendler, O. Lassila,             The semantic web,         Scientific Amer-
     ican 284 (2001) 34–43. URL: http://www.sciam.com/article.cfm?articleID=
     00048144-10D2-1C70-84A9809EC588EF21.
 [8] T. Heath, C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Synthesis
     Lectures on the Semantic Web, Morgan & Claypool Publishers, 2011. URL: https://doi.org/
     10.2200/S00334ED1V01Y201102WBE001. doi:10.2200/S00334ED1V01Y201102WBE001.
 [9] T. D. Noia, C. Magarelli, A. Maurino, M. Palmonari, A. Rula, Using ontology-based data
     summarization to develop semantics-aware recommender systems, in: ESWC, volume
     10843 of Lecture Notes in Computer Science, Springer, 2018, pp. 128–144.
[10] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: bayesian personalized
     ranking from implicit feedback, in: UAI, AUAI Press, 2009, pp. 452–461.
[11] S. Rendle, W. Krichene, L. Zhang, J. R. Anderson, Neural collaborative filtering vs. matrix
     factorization revisited, in: RecSys, ACM, 2020, pp. 240–248.
[12] X. He, T. Chua, Neural factorization machines for sparse predictive analytics, in: SIGIR,
     ACM, 2017, pp. 355–364.
[13] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D.
     Noia, Elliot: A comprehensive and rigorous framework for reproducible recommender
     systems evaluation, in: SIGIR, ACM, 2021, pp. 2405–2414.
[14] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D.
     Noia, V-elliot: Design, evaluate and tune visual recommender systems, in: RecSys, ACM,
     2021, pp. 768–771.
[15] G. Adomavicius, Y. Kwon, Improving aggregate recommendation diversity using
     ranking-based techniques, IEEE Trans. Knowl. Data Eng. 24 (2012) 896–911.
[16] P. Castells, N. J. Hurley, S. Vargas, Novelty and diversity in recommender systems, in:
     F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer, 2015,
     pp. 881–918. URL: https://doi.org/10.1007/978-1-4899-7637-6_26.
[17] H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender systems
     with personalized re-ranking, in: FLAIRS Conference, AAAI Press, 2019, pp. 413–418.
�[18] Z. Zhu, J. Wang, J. Caverlee, Measuring and mitigating item under-recommendation bias
     in personalized ranking systems, in: SIGIR, ACM, 2020, pp. 449–458.
[19] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity
     bias in recommendation, in: RMSE@RecSys, volume 2440 of CEUR Workshop Proceedings,
     CEUR-WS.org, 2019.
[20] C. Zhou, J. Ma, J. Zhang, J. Zhou, H. Yang, Contrastive learning for debiased candidate
     generation in large-scale recommender systems, in: KDD, ACM, 2021, pp. 3985–3995.
[21] R. Borges, K. Stefanidis, On mitigating popularity bias in recommendations via variational
     autoencoders, in: SAC, ACM, 2021, pp. 1383–1389.
[22] T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, T. Joachims, Recommendations as
     treatments: Debiasing learning and evaluation, in: ICML, volume 48 of JMLR Workshop
     and Conference Proceedings, JMLR.org, 2016, pp. 1670–1679.
[23] D. Jannach, L. Lerche, I. Kamehkhosh, M. Jugovac, What recommenders recommend: an
     analysis of recommendation biases and possible countermeasures, User Model. User Adapt.
     Interact. 25 (2015) 427–491.
[24] M. D. Ekstrand, M. Tian, I. M. Azpiazu, J. D. Ekstrand, O. Anuyah, D. McNeill, M. S. Pera,
     All the cool kids, how do they fit in?: Popularity and demographic biases in recommender
     evaluation and effectiveness, in: FAT, volume 81 of Proceedings of Machine Learning
     Research, PMLR, 2018, pp. 172–186.
[25] L. Boratto, G. Fenu, M. Marras, Connecting user and item perspectives in popularity
     debiasing for collaborative recommendation, Inf. Process. Manag. 58 (2021) 102387.
[26] Y. Zhang, F. Feng, X. He, T. Wei, C. Song, G. Ling, Y. Zhang, Causal intervention for
     leveraging popularity bias in recommendation, in: SIGIR, ACM, 2021, pp. 11–20.
�

Inferring User Decision-Making Processes in Recommender Systems with Knowledge Graphs[edit]

load PDF

Inferring User Decision-Making Processes
in Recommender Systems with Knowledge Graphs
Discussion Paper

Vito Walter Anelli1 , Tommaso Di Noia1 ,
Eugenio Di Sciascio1 , Antonio Ferrara1 and Alberto Carlo Maria Mancino1
1
    Polytechnic University of Bari, via Orabona, 4, 70125 Bari, Italy


                                         Abstract
                                         This paper proposes a sparse factorization approach, KGFlex, that represents each item feature as an em-
                                         bedding. With KGFlex, the user-item interactions are a factorized combination of the item features relevant
                                         to the user. An entropy-driven module drives the training considering only the feature involved in the
                                         user’s decision-making process. Extensive experiments confirm the approach’s effectiveness, considering
                                         the ranking accuracy, diversity, and induced bias. The public implementation of KGFlex is available at
                                         https://split.to/kgflex.

                                         Keywords
                                         Recommender systems, Information Retrieval, feature factorization, entropy, knowledge graphs




1. Introduction
At least once, everyone asked a friend to suggest a movie to watch. This natural and yet effective
behavior of obtaining recommendations from similar people is the foundation of Collaborative
Filtering (CF) recommendation techniques. In fact, it is their remarkable accuracy that helped
Recommender Systems getting famous. Popular examples of CF are Matrix Factorization [1],
Nearest Neighbors, and the recent Deep Learning models [2]. Despite the notable performance,
these techniques have some severe limitations: they need a considerable number of transactions
to work, and they fail to suggest fresh items. On the other side of the coin, there are Content-
Based Filtering (CBF) methods. At the cost of lower accuracy, not only are they interpretable,
but they also succeed in suggesting fresh items. However, these algorithms are affected by the
overspecialization problem, and they struggle to suggest items different from the user history.
To benefit from both approaches and alleviate the drawbacks, researchers integrated into Col-
laborative Filtering the side information used in content-based approaches such as tags [3],

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
$ vitowalter.anelli@poliba.it (V. W. Anelli); tommaso.dinoia@poliba.it (T. Di Noia); eugenio.disciascio@poliba.it
(E. Di Sciascio); antonio.ferrara@poliba.it (A. Ferrara); alberto.mancino@poliba.it (A. C. M. Mancino)
€ https://sisinflab.poliba.it/people/vito-walter-anelli/ (V. W. Anelli);
https://sisinflab.poliba.it/people/tommaso-di-noia/ (T. Di Noia);
https://sisinflab.poliba.it/people/eugenio-di-sciascio/ (E. Di Sciascio);
https://sisinflab.poliba.it/people/antonio-ferrara/ (A. Ferrara);
https://sisinflab.poliba.it/people/alberto-carlo-maria-mancino/ (A. C. M. Mancino)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
�demographic data [4], structured knowledge [5]. However, rarely do these models express
their full potential since they combine large and dense models with hundreds or thousands
of features, and they are computationally expensive. Among the various information sources,
Knowledge Graphs (𝒦𝒢𝑠) are gaining momentum. This work discusses KGFlex [6], a sparse
embedding model that extracts facts and knowledge from publicly available knowledge graphs
to describe the catalog items. The underlying factorization model shapes the user interactions
with item features through low-dimensionality embeddings. KGFlex examines the user-specific
decision-making process of consuming or not consuming an item to weight feature embeddings
employing an entropy-based strategy. Consequently, the user profile only comprises an individual
representation of each relevant feature. To evaluate the efficacy of KGFlex, we conduct extensive
experiments on two different publicly available datasets extracting content information from
DBpedia12 . The results show that KGFlex has competitive accuracy performance, and at the same
time, generates highly diversified recommendations with a low induced bias.


2. Knowledge Graph Sparse Embeddings
KGFlex exploits the knowledge encoded in a knowledge graph as side information to characterize
both items and users. One of the main assumptions is that users decide to enjoy an item based
on a subset of its characteristics, implying that not all the item features are equally important.
In the following, we show how KGFlex describes each user and item with a set of features. Taking
a cue from information theory KGFlex exploits the notion of information gain to measure the
relevance of a feature for a user in deciding to consume or not an item.

2.1. From Knowledge Graphs to Decision-Making
In KGFlex, items and users are characterized by sets of features extracted from a 𝒦𝒢, a knowledge
base of semantically linked machine-understandable data [7]. During the years, 𝒦𝒢𝑠 have gained
more and more success thanks to the Linked Data initiative [8]. Today we can benefit from 1,527
different 𝒦𝒢𝑠 connected in the so-called Linked Open Data Cloud3 .
   A knowledge graph 𝒦𝒢 can be represented as pairs of entities linked to each other by binary
                                                    𝜌
relations. Each connection in 𝒦𝒢 is denoted by 𝜎 − → 𝜔, where 𝜎 is a subject entity, 𝜌 is a directed
relation (predicate), and 𝜔 is an object entity. Hereinafter, we generalize the previous notion to
multi-hop predicates (i.e., considering chains of predicates that connect two entities at a higher
                                                                   𝜌1     𝜌2     𝜌𝑛
depth). Let 𝑛-hop predicate be defined as 𝜌 = ⟨𝜌1 ,...,𝜌𝑛 ⟩ if 𝜎 −→ 𝜔1 −→ ... −→ 𝜔𝑛 ∈ 𝒦𝒢. For
                                 𝜌
convenience, ℎ(𝜌) = 𝑛 for 𝜌 : 𝜎 −→ 𝜔𝑛 ∈ 𝒦𝒢 denotes the depth of the predicate chain, and when
                                    𝜌
no confusion arises we will use 𝜎 −→ 𝜔 to denote a generic chain with ℎ(𝜌) ≥ 1.

2.1.1. Extraction of Item and User Features from Knowledge Graph
Given a collection of items ℐ and a knowledge graph 𝒦𝒢, we assume each element in 𝑖 ∈ ℐ has
a mapping to a corresponding entity in 𝒦𝒢. Under this assumption, an item 𝑖 can be explored,
   1
     http://dbpedia.org
   2
     https://github.com/sisinflab/LinkedDatasets
   3
     https://lod-cloud.net/datasets
�                                   (𝑛)
at depth 𝑛, to identify the set ℱ𝑖       of the semantic features describing it:
                            (𝑛)                   𝜌
                          ℱ𝑖      = {⟨𝜌, 𝜔⟩ | 𝑖 −
                                                → 𝜔 ∈ 𝒦𝒢 ,ℎ(𝜌) ∈ {1,...,𝑛}}.                     (1)

At this stage, feature filtering, graph pruning, and semantic feature selection techniques [9] can
be applied to control the computational and memory load and improve system performance.
  We describe each user 𝑢 ∈ 𝒰 with the set ℱ𝑢 of the features representing the items ℐ𝑢 ⊆ ℐ
enjoyed by 𝑢:                                     ⋃︁ (𝑛)
                                          ℱ𝑢(𝑛) =    ℱ𝑖 .                                       (2)
                                                      𝑖∈ℐ𝑢

  Finally, we define the overall set ℱ (𝑛) of features in the system:
                                                  ⋃︁ (𝑛)
                                          ℱ (𝑛) = ℱ𝑖 .                                           (3)
                                                      𝑖∈ℐ

In the following, for convenience, the (𝑛) superscript is omitted whenever it is not relevant in
the context.

2.1.2. Information Gain of User Features
In information theory, entropy is used to measure the uncertainty of a random variable.
   In particular, the entropy 𝐻(𝑉 ) of a random variable 𝑉 with 𝑘 possible values in {𝑣1 ,...,𝑣𝑘 }
is defined as:
                                       ∑︁𝑘
                             𝐻(𝑉 ) = − 𝑃 (𝑉 = 𝑣𝑖 )log2 𝑃 (𝑉 = 𝑣𝑖 ).                            (4)
                                            𝑖=1
It is straightforward to check that a coin that always comes up heads has zero entropy, while a fair
coin equally likely to come up heads or tails when flipped has entropy 1. Notably, if 𝐵𝑞 is a binary
random variable that is true with probability 𝑞, we have 𝐻(𝐵𝑞 ) = −𝑞log2 𝑞−(1−𝑞)log2 (1−𝑞).
For instance, given a dataset 𝒟 of training samples in the form (x,𝑦), with x ∈ R𝐹 and 𝑦 ∈ {0,1} is
a binary variable that is true with probability 𝑞, its entropy is 𝐻(𝒟) = 𝐻(𝑦𝑞 ). As a consequence,
a dataset with a balanced number of positive and negative samples has entropy 1.
    In this context, the information gain 𝐼𝐺(𝒟,𝑥𝑑 ) measures the expected reduction in information
entropy obtained from the observation of one of the attributes 𝑥𝑑 in x:

                                     𝐼𝐺(𝒟,𝑥𝑑 ) = 𝐻(𝒟)−𝐻(𝒟|𝑥𝑑 ),                                  (5)

where 𝐻(𝒟|𝑥𝑑 ) is the expected entropy of 𝒟 conditioned on 𝑥𝑑 . If 𝑥𝑑 can assume 𝑘 distinct
values 𝑥𝑑,1 ,...𝑥𝑑,𝑘 with a categorical probability distribution,the dataset 𝒟 is partitioned into 𝑘
mutually exclusive subsets and the following conditioned entropy is determined:
                                            𝑘
                                           ∑︁
                           𝐻(𝒟|𝑥𝑑 ) =         𝑃 (𝑥𝑑 = 𝑥𝑑,𝑖 )𝐻(𝒟|𝑥𝑑 = 𝑥𝑑,𝑖 ).                     (6)
                                           𝑖=1

   Since the information gain defined in Eq. (5) returns a measure of the importance of a single
attribute in distinguishing positive from negative examples in a dataset, we build, for each user 𝑢,
�a balanced dataset 𝒟𝑢 with ⋃︀
                           all the consumed items from ℐ𝑢 and the same amount of negative items
randomly picked up from 𝑣∈𝒰 ,𝑣̸=𝑢 ℐ𝑣 ∖ℐ𝑢 . Each sample in 𝒟𝑢 is provided with a set of binary
variables corresponding to the features in ℱ𝑢 . Each variable indicates, for each item 𝑖 in 𝒟𝑢 , the
presence (𝑓 = 1) or the absence (𝑓 = 0) of the corresponding feature in the set ℱ𝑖 . According
to the definition, 𝐻(𝒟𝑢 ) = 1. Therefore, the information gain for each feature 𝑓 ∈ ℱ𝑢 can be
computed using the dataset 𝒟𝑢 . Let 𝑝𝑢𝑓 be the number of positive samples in 𝒟𝑢 for which 𝑓 = 1,
𝑛𝑢𝑓 the number of negative samples for which the same feature is present, and 𝑡𝑢𝑓 = 𝑝𝑢𝑓 +𝑛𝑢𝑓 .
Analogously, 𝑝𝑢¬𝑓 = |ℐ𝑢 |−𝑝𝑢𝑓 is the number of positive samples with 𝑓 = 0, 𝑛𝑢¬𝑓 = |ℐ𝑢 |−𝑛𝑢𝑓
is the number of negative samples with 𝑓 = 0, and 𝑡𝑢¬𝑓 = 𝑝𝑢¬𝑓 +𝑛𝑢¬𝑓 . Following Eqq. (5) and (6):

                        𝐼𝐺(𝒟𝑢 ,𝑓 ) = 1−𝐻(𝒟𝑢 |𝑓 = 1)−𝐻(𝒟𝑢 |𝑓 = 0),                                 (7)
                                          (︂                             )︂
                                    𝑡𝑢𝑓        𝑝𝑢𝑓      𝑝𝑢𝑓 𝑛𝑢𝑓      𝑛𝑢𝑓
                   𝐻(𝒟𝑢 |𝑓 = 1) =            −     log2    −    log2        ,                     (8)
                                   |𝒟𝑢 |       𝑡𝑢𝑓      𝑡𝑢𝑓 𝑡𝑢𝑓      𝑡𝑢𝑓
                                       (︂                                     )︂
                                 𝑡𝑢¬𝑓       𝑝𝑢¬𝑓       𝑝𝑢¬𝑓 𝑛𝑢¬𝑓      𝑛𝑢¬𝑓
                  𝐻(𝒟𝑢 |𝑓 = 0) =          −       log2     −     log2            .                (9)
                                 |𝒟𝑢 |       𝑡𝑢¬𝑓      𝑡𝑢¬𝑓 𝑡𝑢¬𝑓      𝑡𝑢¬𝑓
  We finally associate a weight 𝑘𝑢𝑓 = 𝐼𝐺(𝒟𝑢 ,𝑓 ) to each pair of user 𝑢 and feature 𝑓 to represent
the influence of a feature —in the view of the user— in the prediction of user-item interactions.

2.2. Sparse Interaction of Feature Embeddings for Prediction
KGFlex models the features in ℱ as collaboratively learned embeddings in a latent space. Since
KGFlex promotes the idea of having user fine-tuned versions of the same model, we have both
a global representation of the features in ℱ and a personal view, for each user 𝑢, of the fea-
tures in ℱ𝑢 ⊆ ℱ. Notably, the model is structured into two distinct parts. On the one hand,
KGFlex keeps a set 𝒢 of global trainable embeddings and biases shared among all the users, with
𝒢 = {(g𝑓 ∈ R𝐸 ,𝑏𝑓 ∈ R), ∀𝑓 ∈ ℱ}. On the other hand, each user in KGFlex also has his/her
personal representation of the features he/she interacted with, i.e., the features in ℱ𝑢 . These
embeddings are collected within the set 𝒫 𝑢 , defined as 𝒫 𝑢 = {p𝑢𝑓 ∈ R𝐸 , ∀𝑓 ∈ ℱ𝑢 }. Then, the
inner product between the personal representation p𝑓𝑢 and the global representation g𝑓 , plus a
bias value 𝑏𝑓 , estimates the affinity of user 𝑢 to feature 𝑓 . The sum of such affinities for all the
features in ℱ𝑢𝑖 = ℱ𝑢 ∩ℱ𝑖 , weighted according to the pre-computed entropy-based coefficients,
estimates the interaction 𝑥^ 𝑢𝑖 between user 𝑢 and item 𝑖:
                                            ∑︁
                                    𝑥
                                    ^ 𝑢𝑖 =      𝑘𝑢𝑓 (p𝑢𝑓 g𝑓 +𝑏𝑓 ).                                (10)
                                          𝑓 ∈ℱ𝑢𝑖

Eq. (10) encodes the strategy KGFlex exploits to handle thousands of model features. In fact,
it takes advantage of user profile to involve only a small subset of them in the estimate of the
user-item affinity.
    To learn the model parameters, KGFlex adopts Bayesian Personalized Ranking (BPR), the
most common pair-wise Learning to Rank strategy, based on a maximum posterior estimator.
Given∑︀a training set 𝒯 = {(𝑢,𝑖 ,𝑖 ) | 𝑖 ∈ ℐ𝑢 ∧ 𝑖 ∈ ℐ∖ℐ𝑢 ,∀𝑢 ∈ 𝒰}, BPR optimizes the loss+
                                 + −       +         −

𝐿 = (𝑢,𝑖+ ,𝑖− )∈𝒯 ln𝜎(𝑥       ^ 𝑢𝑖− ), with the assumption that a user 𝑢 prefers a consumed item 𝑖
                       ^ 𝑢𝑖+ −𝑥
�over a non-consumed item 𝑖− . The model parameters are learnt with an optimization algorithm
like SGD, as described in Rendle et al. [10].⎧To that aim, we derive:
                                            ⎪
                                            ⎪
                                            ⎪  𝑘𝑢𝑓 g𝑓 if 𝜃 = p𝑢𝑓 ,
                                  𝜕         ⎨𝑘 p𝑢 if 𝜃 = g ,
                                            ⎪
                                                𝑢𝑓 𝑓            𝑓
                                     𝑥
                                     ^ 𝑢𝑖 =                                              (11)
                                 𝜕𝜃         ⎪
                                            ⎪
                                            ⎪  𝑘𝑢𝑓      if 𝜃 = 𝑏𝑓 ,
                                                        else.
                                            ⎪
                                            ⎩0


3. Experimental Setup
In the following, we present a discussion about datasets, preprocessing, baselines, and evaluation
protocol that guided the experimentation of KGFlex.

3.1. Datasets and Filtering
The evaluation of the performance of KGFlex is conducted on two well-known datasets: Yahoo!
Movies and Facebook Books. The datasets have been binarized, retaining ratings of 3 or higher.
To ensure a fair comparison with the baselines, an iterative 10-core, and 5-core preprocessing
procedure is performed on the first and second datasets. The items have been described with a
set of semantic features retrieved through a KG exploration at depth 2 of the DBpedia 𝒦𝒢 using
a public DBpedia URI mapping. Some features (based on their 1-hop predicate) have not been
considered since they do not provide useful information [9]: dbo:wikiPageWikiLink, owl:sameAs,
rdf:type, gold:hypernym, rdfs:seeAlso, dbp:wordnet_type, dbo:wikiPageExternalLink, dbo:thumbnail,
prov:wasDerivedFrom, and dbp:wikiPageUsesTemplate. Therefore, we removed the features as-
sociated with less than ten items. Finally, we kept the user’s 100 most informative features from
the 1- and 2- hop exploration to reduce the computational costs.

3.2. Baselines, Evaluation Protocol and Metrics
To assess the effectiveness of KGFlex, we compare it with BPR-MF [10], a latent factor model based
on the same pair-wise optimization criterion used in KGFlex, a batch version of Rendle et al. [11]
MF, NeuMF [12], and kaHFM [5], a factorization-based model making use of knowledge graphs.
We have chosen the all unrated items protocol using the hold-out 80-20 splitting strategy, which
considers as candidate items all the items not rated by the user. All the models have been tested in
10 different configurations of hyperparameters with the Bayesian hyperparameter optimization
search. For the sake of reproducibility, we provide our code and a working configuration file
for the Elliot framework [13, 14]. Our goal is to assess the accuracy and beyond-accuracy
performance of KGFlex, along with its fairness properties with respect to the popularity of
the items. In detail, we have measured the recommendation accuracy with nDCG [11], also
used as the validation metric. Then, we have evaluated the diversity, adopting Item Coverage
(IC) [15] and Gini Index (Gini) [16]. Finally, three bias metrics have been used to evaluate how
the algorithms consider the items from the long-tail: ACLT [17], PopREO and PopRSP, specific
applications of RSP, and REO [18]. PopREO estimates the equal opportunity of items, encouraging
the True Positive Rate of popular and unpopular items to be the same. PopRSP measures statistical
parity, assessing whether the ranking probability distributions for popular and unpopular items
are the same in the recommendation.
�Table 1
Comparison of KGFlex with baselines (in boldface) and other reference algorithms. Among the baselines,
the best result is in boldface, the second-best result is underlined. For all the metrics, the cutoff is 10.
                                            a) Yahoo! Movies
                                 nDCG        IC      Gini    ACLT PopREO PopRSP
                Random        0.00960 1050 0.84956 5.52026              0.09847 0.00980
                Most Popular 0.15850    49 0.01263 0.00000              1.00000 1.00000
                VSM           0.04777 370 0.05245 3.11768               0.49588 0.45751
                Item-kNN      0.30739 745 0.15826 0.98842               0.70585 0.83466
                MultiVAE      0.23696 399 0.09136 0.23473               0.85433 0.96127
                BPR-MF        0.18571 151 0.02191 0.00064               0.99543 0.99989
                MF            0.28971 455 0.09024 0.08232               0.87345 0.98645
                NeuMF         0.09184   50 0.01134 0.00064              1.00000 0.99989
                kaHFM        0.30055 757 0.16591 0.46238                0.76103 0.92339
                KGFlex        0.24640 851 0.28015 2.14469              0.44768 0.63355
                                          b) Facebook Books
                                  nDCG      IC      Gini     ACLT PopREO PopRSP
                Random        0.00690 782 0.86167 5.26045              0.09794 0.00749
                Most Popular 0.09393 16 0.01265 0.00000                1.00000 1.00000
                VSM           0.03617 523 0.18874 3.80558              0.22616 0.29958
                Item-kNN      0.12903 769 0.37520 2.22524              0.48852 0.59861
                MultiVAE      0.11914 620 0.18279 0.46368              0.77695 0.91818
                BPR-MF        0.09473 17 0.01318 0.00000               1.00000 1.00000
                MF            0.09557 87 0.02376 0.00000               1.00000 1.00000
                NeuMF         0.07142 17 0.01245 0.00000               1.00000 1.00000
                kaHFM        0.12667 540 0.13866 0.32942               0.87663 0.94197
                KGFlex        0.08526 606 0.30703 3.02641             0.15210 0.44852

3.3. Results
3.3.1. Evaluating the Overall Performance
Table 1a depicts the evaluation outcome for the aforementioned metrics with a cutoff of 10 on the
Yahoo! Movies dataset. KGFlex shows satisfactory accuracy results, being outperformed only by
kaHFM and MF. It is noteworthy that KGFlex significantly outperforms BPR-MF, although both
are learned with a pair-wise BPR optimization, thus underlining the beneficial role of the extracted
knowledge. Moreover, examining the item coverage and Gini values, we note the high degree of
personalization provided by KGFlex. We link this result to the personalized view of the knowledge
granted by the framework. Moreover, in KGFlex the collaborative signal on explicit user interests
ensures to recommend diverse items among the ones sharing characteristics of interest for the
user. The aforementioned behavior is not confirmed in Facebook Books (see Table 1b). Indeed, the
accuracy results seem to remain below the performance of other factorization-based approaches.
However, the diversity results show how BPR-MF, MF, and NeuMF may have been flooded by
popularity signal, which led them to perform poorly regarding the item coverage and Gini metrics.
Instead, KGFlex does not suffer from this problem and approaches the superior performance of
Item-kNN in terms of diversity.
�                       a) Yahoo! Movies                              b) Facebook Books
                                                          0.4
            0.6                                           0.3
    HR@10




                                                  HR@10
            0.4                                           0.2
            0.2                                           0.1
              0                                             0
                  4      6           8       10                  4        6         8         10
                             SE@10                                        SE@10
              Random     Most Popular     VSM             Item-kNN    MultiVAE
              BPR-MF     MF               NeuMF           kaHFM       KGFlex
Figure 1: Accuracy vs. distributional diversity. The plots show the value of HR@10 against SE@10: the
closer to the top-right corner the better.

3.3.2. Investigating the Accuracy/Personalization Trade-Off
What we have analytically observed is substantially confirmed in Figure 1. These graphs show
the joint behavior of KGFlex on accuracy and distributional diversity by analyzing the value
of Hit Ratio (HR) with respect to the Shannon Entropy (SE) statistics, which measures how di-
versely distributed are the recommended items. Among factorization-based approaches, KGFlex
approaches the right-top margin to a greater extent, remarking its capability of providing highly
personalized recommendations, probably due to the joint operation of the global and the personal
views of the same features. The kaHFM model usually is the second-best model, while the other
approaches seem to perform very poorly in at least one dimension or do not have a stable position
when varying the dataset.

3.3.3. Analyzing the Algorithmic and Popularity Bias
Oftentimes, unpopular items are not recommended and remain underrepresented [19], thus
causing a fairness issue for items and an inappropriate recommendation for users who do not prefer
very popular items. While there is a wide range of approaches in literature aiming to reduce the
recommendation biases [20, 21, 22, 23, 24, 25, 26] , we study whether KGFlex is inherently resilient
to algorithmic bias. From Tables 1a and 1b, it is noteworthy that KGFlex always outperforms all the
other factorization-based approaches and generally outperforms the other approaches. The value
of ACLT (the higher the better) is comparable with the value obtained by VSM. This result is further
supported by the values of PopREO and PopRSP (the smaller the better). Concerning those metrics,
KGFlex and VSM continue to grant the less biased recommendations. Interestingly, while both
exploit the same optimization criterion, we notice how KGFlex consistently improves BPR-MF,
which is known to be vulnerable to imbalanced data and to produce biased recommendations [18].

4. Conclusion
We introduced KGFlex, a promising knowledge-aware approach to generate recommendations
from implicit feedback. KGFlex has demonstrated its ability to take the best from content-based
and factorization-based recommendation approaches. Moreover, thanks to the high degree of
expressiveness provided by the personalized representation of content information, KGFlex
guarantees satisfactory and diverse recommendations and resilience to algorithmic bias.
�References
 [1] Y. Koren, R. M. Bell, C. Volinsky, Matrix factorization techniques for recommender systems,
     IEEE Computer 42 (2009) 30–37.
 [2] S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: A survey and new
     perspectives, ACM Comput. Surv. 52 (2019) 5:1–5:38. URL: https://doi.org/10.1145/3285029.
     doi:10.1145/3285029.
 [3] Y. Zhu, Z. Guan, S. Tan, H. Liu, D. Cai, X. He, Heterogeneous hypergraph embedding for
     document recommendation, Neurocomputing 216 (2016) 150–162.
 [4] W. X. Zhao, S. Li, Y. He, L. Wang, J. Wen, X. Li, Exploring demographic information in social
     media for product recommendation, Knowl. Inf. Syst. 49 (2016) 61–89.
 [5] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, J. Trotta, How to make latent factors
     interpretable by feeding factorization machines with knowledge graphs, in: ISWC (1),
     volume 11778 of Lecture Notes in Computer Science, Springer, 2019, pp. 38–56.
 [6] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ferrara, A. C. M. Mancino, Sparse feature factoriza-
     tion for recommender systems with knowledge graphs, in: RecSys, ACM, 2021, pp. 154–165.
 [7] T. Berners-Lee, J. Hendler, O. Lassila,             The semantic web,         Scientific Amer-
     ican 284 (2001) 34–43. URL: http://www.sciam.com/article.cfm?articleID=
     00048144-10D2-1C70-84A9809EC588EF21.
 [8] T. Heath, C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Synthesis
     Lectures on the Semantic Web, Morgan & Claypool Publishers, 2011. URL: https://doi.org/
     10.2200/S00334ED1V01Y201102WBE001. doi:10.2200/S00334ED1V01Y201102WBE001.
 [9] T. D. Noia, C. Magarelli, A. Maurino, M. Palmonari, A. Rula, Using ontology-based data
     summarization to develop semantics-aware recommender systems, in: ESWC, volume
     10843 of Lecture Notes in Computer Science, Springer, 2018, pp. 128–144.
[10] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: bayesian personalized
     ranking from implicit feedback, in: UAI, AUAI Press, 2009, pp. 452–461.
[11] S. Rendle, W. Krichene, L. Zhang, J. R. Anderson, Neural collaborative filtering vs. matrix
     factorization revisited, in: RecSys, ACM, 2020, pp. 240–248.
[12] X. He, T. Chua, Neural factorization machines for sparse predictive analytics, in: SIGIR,
     ACM, 2017, pp. 355–364.
[13] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D.
     Noia, Elliot: A comprehensive and rigorous framework for reproducible recommender
     systems evaluation, in: SIGIR, ACM, 2021, pp. 2405–2414.
[14] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D.
     Noia, V-elliot: Design, evaluate and tune visual recommender systems, in: RecSys, ACM,
     2021, pp. 768–771.
[15] G. Adomavicius, Y. Kwon, Improving aggregate recommendation diversity using
     ranking-based techniques, IEEE Trans. Knowl. Data Eng. 24 (2012) 896–911.
[16] P. Castells, N. J. Hurley, S. Vargas, Novelty and diversity in recommender systems, in:
     F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer, 2015,
     pp. 881–918. URL: https://doi.org/10.1007/978-1-4899-7637-6_26.
[17] H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender systems
     with personalized re-ranking, in: FLAIRS Conference, AAAI Press, 2019, pp. 413–418.
�[18] Z. Zhu, J. Wang, J. Caverlee, Measuring and mitigating item under-recommendation bias
     in personalized ranking systems, in: SIGIR, ACM, 2020, pp. 449–458.
[19] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity
     bias in recommendation, in: RMSE@RecSys, volume 2440 of CEUR Workshop Proceedings,
     CEUR-WS.org, 2019.
[20] C. Zhou, J. Ma, J. Zhang, J. Zhou, H. Yang, Contrastive learning for debiased candidate
     generation in large-scale recommender systems, in: KDD, ACM, 2021, pp. 3985–3995.
[21] R. Borges, K. Stefanidis, On mitigating popularity bias in recommendations via variational
     autoencoders, in: SAC, ACM, 2021, pp. 1383–1389.
[22] T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, T. Joachims, Recommendations as
     treatments: Debiasing learning and evaluation, in: ICML, volume 48 of JMLR Workshop
     and Conference Proceedings, JMLR.org, 2016, pp. 1670–1679.
[23] D. Jannach, L. Lerche, I. Kamehkhosh, M. Jugovac, What recommenders recommend: an
     analysis of recommendation biases and possible countermeasures, User Model. User Adapt.
     Interact. 25 (2015) 427–491.
[24] M. D. Ekstrand, M. Tian, I. M. Azpiazu, J. D. Ekstrand, O. Anuyah, D. McNeill, M. S. Pera,
     All the cool kids, how do they fit in?: Popularity and demographic biases in recommender
     evaluation and effectiveness, in: FAT, volume 81 of Proceedings of Machine Learning
     Research, PMLR, 2018, pp. 172–186.
[25] L. Boratto, G. Fenu, M. Marras, Connecting user and item perspectives in popularity
     debiasing for collaborative recommendation, Inf. Process. Manag. 58 (2021) 102387.
[26] Y. Zhang, F. Feng, X. He, T. Wei, C. Song, G. Ling, Y. Zhang, Causal intervention for
     leveraging popularity bias in recommendation, in: SIGIR, ACM, 2021, pp. 11–20.
�
🖨 🚪