Paper | |
---|---|
edit | |
description | |
id | Vol-3194/paper59 |
wikidataid | Q117344892→Q117344892 |
title | Inferring User Decision-Making Processes in Recommender Systems with Knowledge Graphs |
pdfUrl | https://ceur-ws.org/Vol-3194/paper59.pdf |
dblpUrl | https://dblp.org/rec/conf/sebd/AnelliNSFM22 |
volume | Vol-3194→Vol-3194 |
session | → |
Paper | |
---|---|
edit | |
description | |
id | Vol-3194/paper59 |
wikidataid | Q117344892→Q117344892 |
title | Inferring User Decision-Making Processes in Recommender Systems with Knowledge Graphs |
pdfUrl | https://ceur-ws.org/Vol-3194/paper59.pdf |
dblpUrl | https://dblp.org/rec/conf/sebd/AnelliNSFM22 |
volume | Vol-3194→Vol-3194 |
session | → |
Inferring User Decision-Making Processes in Recommender Systems with Knowledge Graphs Discussion Paper Vito Walter Anelli1 , Tommaso Di Noia1 , Eugenio Di Sciascio1 , Antonio Ferrara1 and Alberto Carlo Maria Mancino1 1 Polytechnic University of Bari, via Orabona, 4, 70125 Bari, Italy Abstract This paper proposes a sparse factorization approach, KGFlex, that represents each item feature as an em- bedding. With KGFlex, the user-item interactions are a factorized combination of the item features relevant to the user. An entropy-driven module drives the training considering only the feature involved in the user’s decision-making process. Extensive experiments confirm the approach’s effectiveness, considering the ranking accuracy, diversity, and induced bias. The public implementation of KGFlex is available at https://split.to/kgflex. Keywords Recommender systems, Information Retrieval, feature factorization, entropy, knowledge graphs 1. Introduction At least once, everyone asked a friend to suggest a movie to watch. This natural and yet effective behavior of obtaining recommendations from similar people is the foundation of Collaborative Filtering (CF) recommendation techniques. In fact, it is their remarkable accuracy that helped Recommender Systems getting famous. Popular examples of CF are Matrix Factorization [1], Nearest Neighbors, and the recent Deep Learning models [2]. Despite the notable performance, these techniques have some severe limitations: they need a considerable number of transactions to work, and they fail to suggest fresh items. On the other side of the coin, there are Content- Based Filtering (CBF) methods. At the cost of lower accuracy, not only are they interpretable, but they also succeed in suggesting fresh items. However, these algorithms are affected by the overspecialization problem, and they struggle to suggest items different from the user history. To benefit from both approaches and alleviate the drawbacks, researchers integrated into Col- laborative Filtering the side information used in content-based approaches such as tags [3], SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy $ vitowalter.anelli@poliba.it (V. W. Anelli); tommaso.dinoia@poliba.it (T. Di Noia); eugenio.disciascio@poliba.it (E. Di Sciascio); antonio.ferrara@poliba.it (A. Ferrara); alberto.mancino@poliba.it (A. C. M. Mancino) https://sisinflab.poliba.it/people/vito-walter-anelli/ (V. W. Anelli); https://sisinflab.poliba.it/people/tommaso-di-noia/ (T. Di Noia); https://sisinflab.poliba.it/people/eugenio-di-sciascio/ (E. Di Sciascio); https://sisinflab.poliba.it/people/antonio-ferrara/ (A. Ferrara); https://sisinflab.poliba.it/people/alberto-carlo-maria-mancino/ (A. C. M. Mancino) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) �demographic data [4], structured knowledge [5]. However, rarely do these models express their full potential since they combine large and dense models with hundreds or thousands of features, and they are computationally expensive. Among the various information sources, Knowledge Graphs (𝒦𝒢𝑠) are gaining momentum. This work discusses KGFlex [6], a sparse embedding model that extracts facts and knowledge from publicly available knowledge graphs to describe the catalog items. The underlying factorization model shapes the user interactions with item features through low-dimensionality embeddings. KGFlex examines the user-specific decision-making process of consuming or not consuming an item to weight feature embeddings employing an entropy-based strategy. Consequently, the user profile only comprises an individual representation of each relevant feature. To evaluate the efficacy of KGFlex, we conduct extensive experiments on two different publicly available datasets extracting content information from DBpedia12 . The results show that KGFlex has competitive accuracy performance, and at the same time, generates highly diversified recommendations with a low induced bias. 2. Knowledge Graph Sparse Embeddings KGFlex exploits the knowledge encoded in a knowledge graph as side information to characterize both items and users. One of the main assumptions is that users decide to enjoy an item based on a subset of its characteristics, implying that not all the item features are equally important. In the following, we show how KGFlex describes each user and item with a set of features. Taking a cue from information theory KGFlex exploits the notion of information gain to measure the relevance of a feature for a user in deciding to consume or not an item. 2.1. From Knowledge Graphs to Decision-Making In KGFlex, items and users are characterized by sets of features extracted from a 𝒦𝒢, a knowledge base of semantically linked machine-understandable data [7]. During the years, 𝒦𝒢𝑠 have gained more and more success thanks to the Linked Data initiative [8]. Today we can benefit from 1,527 different 𝒦𝒢𝑠 connected in the so-called Linked Open Data Cloud3 . A knowledge graph 𝒦𝒢 can be represented as pairs of entities linked to each other by binary 𝜌 relations. Each connection in 𝒦𝒢 is denoted by 𝜎 − → 𝜔, where 𝜎 is a subject entity, 𝜌 is a directed relation (predicate), and 𝜔 is an object entity. Hereinafter, we generalize the previous notion to multi-hop predicates (i.e., considering chains of predicates that connect two entities at a higher 𝜌1 𝜌2 𝜌𝑛 depth). Let 𝑛-hop predicate be defined as 𝜌 = ⟨𝜌1 ,...,𝜌𝑛 ⟩ if 𝜎 −→ 𝜔1 −→ ... −→ 𝜔𝑛 ∈ 𝒦𝒢. For 𝜌 convenience, ℎ(𝜌) = 𝑛 for 𝜌 : 𝜎 −→ 𝜔𝑛 ∈ 𝒦𝒢 denotes the depth of the predicate chain, and when 𝜌 no confusion arises we will use 𝜎 −→ 𝜔 to denote a generic chain with ℎ(𝜌) ≥ 1. 2.1.1. Extraction of Item and User Features from Knowledge Graph Given a collection of items ℐ and a knowledge graph 𝒦𝒢, we assume each element in 𝑖 ∈ ℐ has a mapping to a corresponding entity in 𝒦𝒢. Under this assumption, an item 𝑖 can be explored, 1 http://dbpedia.org 2 https://github.com/sisinflab/LinkedDatasets 3 https://lod-cloud.net/datasets � (𝑛) at depth 𝑛, to identify the set ℱ𝑖 of the semantic features describing it: (𝑛) 𝜌 ℱ𝑖 = {⟨𝜌, 𝜔⟩ | 𝑖 − → 𝜔 ∈ 𝒦𝒢 ,ℎ(𝜌) ∈ {1,...,𝑛}}. (1) At this stage, feature filtering, graph pruning, and semantic feature selection techniques [9] can be applied to control the computational and memory load and improve system performance. We describe each user 𝑢 ∈ 𝒰 with the set ℱ𝑢 of the features representing the items ℐ𝑢 ⊆ ℐ enjoyed by 𝑢: ⋃︁ (𝑛) ℱ𝑢(𝑛) = ℱ𝑖 . (2) 𝑖∈ℐ𝑢 Finally, we define the overall set ℱ (𝑛) of features in the system: ⋃︁ (𝑛) ℱ (𝑛) = ℱ𝑖 . (3) 𝑖∈ℐ In the following, for convenience, the (𝑛) superscript is omitted whenever it is not relevant in the context. 2.1.2. Information Gain of User Features In information theory, entropy is used to measure the uncertainty of a random variable. In particular, the entropy 𝐻(𝑉 ) of a random variable 𝑉 with 𝑘 possible values in {𝑣1 ,...,𝑣𝑘 } is defined as: ∑︁𝑘 𝐻(𝑉 ) = − 𝑃 (𝑉 = 𝑣𝑖 )log2 𝑃 (𝑉 = 𝑣𝑖 ). (4) 𝑖=1 It is straightforward to check that a coin that always comes up heads has zero entropy, while a fair coin equally likely to come up heads or tails when flipped has entropy 1. Notably, if 𝐵𝑞 is a binary random variable that is true with probability 𝑞, we have 𝐻(𝐵𝑞 ) = −𝑞log2 𝑞−(1−𝑞)log2 (1−𝑞). For instance, given a dataset 𝒟 of training samples in the form (x,𝑦), with x ∈ R𝐹 and 𝑦 ∈ {0,1} is a binary variable that is true with probability 𝑞, its entropy is 𝐻(𝒟) = 𝐻(𝑦𝑞 ). As a consequence, a dataset with a balanced number of positive and negative samples has entropy 1. In this context, the information gain 𝐼𝐺(𝒟,𝑥𝑑 ) measures the expected reduction in information entropy obtained from the observation of one of the attributes 𝑥𝑑 in x: 𝐼𝐺(𝒟,𝑥𝑑 ) = 𝐻(𝒟)−𝐻(𝒟|𝑥𝑑 ), (5) where 𝐻(𝒟|𝑥𝑑 ) is the expected entropy of 𝒟 conditioned on 𝑥𝑑 . If 𝑥𝑑 can assume 𝑘 distinct values 𝑥𝑑,1 ,...𝑥𝑑,𝑘 with a categorical probability distribution,the dataset 𝒟 is partitioned into 𝑘 mutually exclusive subsets and the following conditioned entropy is determined: 𝑘 ∑︁ 𝐻(𝒟|𝑥𝑑 ) = 𝑃 (𝑥𝑑 = 𝑥𝑑,𝑖 )𝐻(𝒟|𝑥𝑑 = 𝑥𝑑,𝑖 ). (6) 𝑖=1 Since the information gain defined in Eq. (5) returns a measure of the importance of a single attribute in distinguishing positive from negative examples in a dataset, we build, for each user 𝑢, �a balanced dataset 𝒟𝑢 with ⋃︀ all the consumed items from ℐ𝑢 and the same amount of negative items randomly picked up from 𝑣∈𝒰 ,𝑣̸=𝑢 ℐ𝑣 ∖ℐ𝑢 . Each sample in 𝒟𝑢 is provided with a set of binary variables corresponding to the features in ℱ𝑢 . Each variable indicates, for each item 𝑖 in 𝒟𝑢 , the presence (𝑓 = 1) or the absence (𝑓 = 0) of the corresponding feature in the set ℱ𝑖 . According to the definition, 𝐻(𝒟𝑢 ) = 1. Therefore, the information gain for each feature 𝑓 ∈ ℱ𝑢 can be computed using the dataset 𝒟𝑢 . Let 𝑝𝑢𝑓 be the number of positive samples in 𝒟𝑢 for which 𝑓 = 1, 𝑛𝑢𝑓 the number of negative samples for which the same feature is present, and 𝑡𝑢𝑓 = 𝑝𝑢𝑓 +𝑛𝑢𝑓 . Analogously, 𝑝𝑢¬𝑓 = |ℐ𝑢 |−𝑝𝑢𝑓 is the number of positive samples with 𝑓 = 0, 𝑛𝑢¬𝑓 = |ℐ𝑢 |−𝑛𝑢𝑓 is the number of negative samples with 𝑓 = 0, and 𝑡𝑢¬𝑓 = 𝑝𝑢¬𝑓 +𝑛𝑢¬𝑓 . Following Eqq. (5) and (6): 𝐼𝐺(𝒟𝑢 ,𝑓 ) = 1−𝐻(𝒟𝑢 |𝑓 = 1)−𝐻(𝒟𝑢 |𝑓 = 0), (7) (︂ )︂ 𝑡𝑢𝑓 𝑝𝑢𝑓 𝑝𝑢𝑓 𝑛𝑢𝑓 𝑛𝑢𝑓 𝐻(𝒟𝑢 |𝑓 = 1) = − log2 − log2 , (8) |𝒟𝑢 | 𝑡𝑢𝑓 𝑡𝑢𝑓 𝑡𝑢𝑓 𝑡𝑢𝑓 (︂ )︂ 𝑡𝑢¬𝑓 𝑝𝑢¬𝑓 𝑝𝑢¬𝑓 𝑛𝑢¬𝑓 𝑛𝑢¬𝑓 𝐻(𝒟𝑢 |𝑓 = 0) = − log2 − log2 . (9) |𝒟𝑢 | 𝑡𝑢¬𝑓 𝑡𝑢¬𝑓 𝑡𝑢¬𝑓 𝑡𝑢¬𝑓 We finally associate a weight 𝑘𝑢𝑓 = 𝐼𝐺(𝒟𝑢 ,𝑓 ) to each pair of user 𝑢 and feature 𝑓 to represent the influence of a feature —in the view of the user— in the prediction of user-item interactions. 2.2. Sparse Interaction of Feature Embeddings for Prediction KGFlex models the features in ℱ as collaboratively learned embeddings in a latent space. Since KGFlex promotes the idea of having user fine-tuned versions of the same model, we have both a global representation of the features in ℱ and a personal view, for each user 𝑢, of the fea- tures in ℱ𝑢 ⊆ ℱ. Notably, the model is structured into two distinct parts. On the one hand, KGFlex keeps a set 𝒢 of global trainable embeddings and biases shared among all the users, with 𝒢 = {(g𝑓 ∈ R𝐸 ,𝑏𝑓 ∈ R), ∀𝑓 ∈ ℱ}. On the other hand, each user in KGFlex also has his/her personal representation of the features he/she interacted with, i.e., the features in ℱ𝑢 . These embeddings are collected within the set 𝒫 𝑢 , defined as 𝒫 𝑢 = {p𝑢𝑓 ∈ R𝐸 , ∀𝑓 ∈ ℱ𝑢 }. Then, the inner product between the personal representation p𝑓𝑢 and the global representation g𝑓 , plus a bias value 𝑏𝑓 , estimates the affinity of user 𝑢 to feature 𝑓 . The sum of such affinities for all the features in ℱ𝑢𝑖 = ℱ𝑢 ∩ℱ𝑖 , weighted according to the pre-computed entropy-based coefficients, estimates the interaction 𝑥^ 𝑢𝑖 between user 𝑢 and item 𝑖: ∑︁ 𝑥 ^ 𝑢𝑖 = 𝑘𝑢𝑓 (p𝑢𝑓 g𝑓 +𝑏𝑓 ). (10) 𝑓 ∈ℱ𝑢𝑖 Eq. (10) encodes the strategy KGFlex exploits to handle thousands of model features. In fact, it takes advantage of user profile to involve only a small subset of them in the estimate of the user-item affinity. To learn the model parameters, KGFlex adopts Bayesian Personalized Ranking (BPR), the most common pair-wise Learning to Rank strategy, based on a maximum posterior estimator. Given∑︀a training set 𝒯 = {(𝑢,𝑖 ,𝑖 ) | 𝑖 ∈ ℐ𝑢 ∧ 𝑖 ∈ ℐ∖ℐ𝑢 ,∀𝑢 ∈ 𝒰}, BPR optimizes the loss+ + − + − 𝐿 = (𝑢,𝑖+ ,𝑖− )∈𝒯 ln𝜎(𝑥 ^ 𝑢𝑖− ), with the assumption that a user 𝑢 prefers a consumed item 𝑖 ^ 𝑢𝑖+ −𝑥 �over a non-consumed item 𝑖− . The model parameters are learnt with an optimization algorithm like SGD, as described in Rendle et al. [10].⎧To that aim, we derive: ⎪ ⎪ ⎪ 𝑘𝑢𝑓 g𝑓 if 𝜃 = p𝑢𝑓 , 𝜕 ⎨𝑘 p𝑢 if 𝜃 = g , ⎪ 𝑢𝑓 𝑓 𝑓 𝑥 ^ 𝑢𝑖 = (11) 𝜕𝜃 ⎪ ⎪ ⎪ 𝑘𝑢𝑓 if 𝜃 = 𝑏𝑓 , else. ⎪ ⎩0 3. Experimental Setup In the following, we present a discussion about datasets, preprocessing, baselines, and evaluation protocol that guided the experimentation of KGFlex. 3.1. Datasets and Filtering The evaluation of the performance of KGFlex is conducted on two well-known datasets: Yahoo! Movies and Facebook Books. The datasets have been binarized, retaining ratings of 3 or higher. To ensure a fair comparison with the baselines, an iterative 10-core, and 5-core preprocessing procedure is performed on the first and second datasets. The items have been described with a set of semantic features retrieved through a KG exploration at depth 2 of the DBpedia 𝒦𝒢 using a public DBpedia URI mapping. Some features (based on their 1-hop predicate) have not been considered since they do not provide useful information [9]: dbo:wikiPageWikiLink, owl:sameAs, rdf:type, gold:hypernym, rdfs:seeAlso, dbp:wordnet_type, dbo:wikiPageExternalLink, dbo:thumbnail, prov:wasDerivedFrom, and dbp:wikiPageUsesTemplate. Therefore, we removed the features as- sociated with less than ten items. Finally, we kept the user’s 100 most informative features from the 1- and 2- hop exploration to reduce the computational costs. 3.2. Baselines, Evaluation Protocol and Metrics To assess the effectiveness of KGFlex, we compare it with BPR-MF [10], a latent factor model based on the same pair-wise optimization criterion used in KGFlex, a batch version of Rendle et al. [11] MF, NeuMF [12], and kaHFM [5], a factorization-based model making use of knowledge graphs. We have chosen the all unrated items protocol using the hold-out 80-20 splitting strategy, which considers as candidate items all the items not rated by the user. All the models have been tested in 10 different configurations of hyperparameters with the Bayesian hyperparameter optimization search. For the sake of reproducibility, we provide our code and a working configuration file for the Elliot framework [13, 14]. Our goal is to assess the accuracy and beyond-accuracy performance of KGFlex, along with its fairness properties with respect to the popularity of the items. In detail, we have measured the recommendation accuracy with nDCG [11], also used as the validation metric. Then, we have evaluated the diversity, adopting Item Coverage (IC) [15] and Gini Index (Gini) [16]. Finally, three bias metrics have been used to evaluate how the algorithms consider the items from the long-tail: ACLT [17], PopREO and PopRSP, specific applications of RSP, and REO [18]. PopREO estimates the equal opportunity of items, encouraging the True Positive Rate of popular and unpopular items to be the same. PopRSP measures statistical parity, assessing whether the ranking probability distributions for popular and unpopular items are the same in the recommendation. �Table 1 Comparison of KGFlex with baselines (in boldface) and other reference algorithms. Among the baselines, the best result is in boldface, the second-best result is underlined. For all the metrics, the cutoff is 10. a) Yahoo! Movies nDCG IC Gini ACLT PopREO PopRSP Random 0.00960 1050 0.84956 5.52026 0.09847 0.00980 Most Popular 0.15850 49 0.01263 0.00000 1.00000 1.00000 VSM 0.04777 370 0.05245 3.11768 0.49588 0.45751 Item-kNN 0.30739 745 0.15826 0.98842 0.70585 0.83466 MultiVAE 0.23696 399 0.09136 0.23473 0.85433 0.96127 BPR-MF 0.18571 151 0.02191 0.00064 0.99543 0.99989 MF 0.28971 455 0.09024 0.08232 0.87345 0.98645 NeuMF 0.09184 50 0.01134 0.00064 1.00000 0.99989 kaHFM 0.30055 757 0.16591 0.46238 0.76103 0.92339 KGFlex 0.24640 851 0.28015 2.14469 0.44768 0.63355 b) Facebook Books nDCG IC Gini ACLT PopREO PopRSP Random 0.00690 782 0.86167 5.26045 0.09794 0.00749 Most Popular 0.09393 16 0.01265 0.00000 1.00000 1.00000 VSM 0.03617 523 0.18874 3.80558 0.22616 0.29958 Item-kNN 0.12903 769 0.37520 2.22524 0.48852 0.59861 MultiVAE 0.11914 620 0.18279 0.46368 0.77695 0.91818 BPR-MF 0.09473 17 0.01318 0.00000 1.00000 1.00000 MF 0.09557 87 0.02376 0.00000 1.00000 1.00000 NeuMF 0.07142 17 0.01245 0.00000 1.00000 1.00000 kaHFM 0.12667 540 0.13866 0.32942 0.87663 0.94197 KGFlex 0.08526 606 0.30703 3.02641 0.15210 0.44852 3.3. Results 3.3.1. Evaluating the Overall Performance Table 1a depicts the evaluation outcome for the aforementioned metrics with a cutoff of 10 on the Yahoo! Movies dataset. KGFlex shows satisfactory accuracy results, being outperformed only by kaHFM and MF. It is noteworthy that KGFlex significantly outperforms BPR-MF, although both are learned with a pair-wise BPR optimization, thus underlining the beneficial role of the extracted knowledge. Moreover, examining the item coverage and Gini values, we note the high degree of personalization provided by KGFlex. We link this result to the personalized view of the knowledge granted by the framework. Moreover, in KGFlex the collaborative signal on explicit user interests ensures to recommend diverse items among the ones sharing characteristics of interest for the user. The aforementioned behavior is not confirmed in Facebook Books (see Table 1b). Indeed, the accuracy results seem to remain below the performance of other factorization-based approaches. However, the diversity results show how BPR-MF, MF, and NeuMF may have been flooded by popularity signal, which led them to perform poorly regarding the item coverage and Gini metrics. Instead, KGFlex does not suffer from this problem and approaches the superior performance of Item-kNN in terms of diversity. � a) Yahoo! Movies b) Facebook Books 0.4 0.6 0.3 HR@10 HR@10 0.4 0.2 0.2 0.1 0 0 4 6 8 10 4 6 8 10 SE@10 SE@10 Random Most Popular VSM Item-kNN MultiVAE BPR-MF MF NeuMF kaHFM KGFlex Figure 1: Accuracy vs. distributional diversity. The plots show the value of HR@10 against SE@10: the closer to the top-right corner the better. 3.3.2. Investigating the Accuracy/Personalization Trade-Off What we have analytically observed is substantially confirmed in Figure 1. These graphs show the joint behavior of KGFlex on accuracy and distributional diversity by analyzing the value of Hit Ratio (HR) with respect to the Shannon Entropy (SE) statistics, which measures how di- versely distributed are the recommended items. Among factorization-based approaches, KGFlex approaches the right-top margin to a greater extent, remarking its capability of providing highly personalized recommendations, probably due to the joint operation of the global and the personal views of the same features. The kaHFM model usually is the second-best model, while the other approaches seem to perform very poorly in at least one dimension or do not have a stable position when varying the dataset. 3.3.3. Analyzing the Algorithmic and Popularity Bias Oftentimes, unpopular items are not recommended and remain underrepresented [19], thus causing a fairness issue for items and an inappropriate recommendation for users who do not prefer very popular items. While there is a wide range of approaches in literature aiming to reduce the recommendation biases [20, 21, 22, 23, 24, 25, 26] , we study whether KGFlex is inherently resilient to algorithmic bias. From Tables 1a and 1b, it is noteworthy that KGFlex always outperforms all the other factorization-based approaches and generally outperforms the other approaches. The value of ACLT (the higher the better) is comparable with the value obtained by VSM. This result is further supported by the values of PopREO and PopRSP (the smaller the better). Concerning those metrics, KGFlex and VSM continue to grant the less biased recommendations. Interestingly, while both exploit the same optimization criterion, we notice how KGFlex consistently improves BPR-MF, which is known to be vulnerable to imbalanced data and to produce biased recommendations [18]. 4. Conclusion We introduced KGFlex, a promising knowledge-aware approach to generate recommendations from implicit feedback. KGFlex has demonstrated its ability to take the best from content-based and factorization-based recommendation approaches. Moreover, thanks to the high degree of expressiveness provided by the personalized representation of content information, KGFlex guarantees satisfactory and diverse recommendations and resilience to algorithmic bias. �References [1] Y. Koren, R. M. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, IEEE Computer 42 (2009) 30–37. [2] S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: A survey and new perspectives, ACM Comput. Surv. 52 (2019) 5:1–5:38. URL: https://doi.org/10.1145/3285029. doi:10.1145/3285029. [3] Y. Zhu, Z. Guan, S. Tan, H. Liu, D. Cai, X. He, Heterogeneous hypergraph embedding for document recommendation, Neurocomputing 216 (2016) 150–162. [4] W. X. Zhao, S. Li, Y. He, L. Wang, J. Wen, X. Li, Exploring demographic information in social media for product recommendation, Knowl. Inf. Syst. 49 (2016) 61–89. [5] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, J. Trotta, How to make latent factors interpretable by feeding factorization machines with knowledge graphs, in: ISWC (1), volume 11778 of Lecture Notes in Computer Science, Springer, 2019, pp. 38–56. [6] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ferrara, A. C. M. Mancino, Sparse feature factoriza- tion for recommender systems with knowledge graphs, in: RecSys, ACM, 2021, pp. 154–165. [7] T. Berners-Lee, J. Hendler, O. Lassila, The semantic web, Scientific Amer- ican 284 (2001) 34–43. URL: http://www.sciam.com/article.cfm?articleID= 00048144-10D2-1C70-84A9809EC588EF21. [8] T. Heath, C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Synthesis Lectures on the Semantic Web, Morgan & Claypool Publishers, 2011. URL: https://doi.org/ 10.2200/S00334ED1V01Y201102WBE001. doi:10.2200/S00334ED1V01Y201102WBE001. [9] T. D. Noia, C. Magarelli, A. Maurino, M. Palmonari, A. Rula, Using ontology-based data summarization to develop semantics-aware recommender systems, in: ESWC, volume 10843 of Lecture Notes in Computer Science, Springer, 2018, pp. 128–144. [10] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: bayesian personalized ranking from implicit feedback, in: UAI, AUAI Press, 2009, pp. 452–461. [11] S. Rendle, W. Krichene, L. Zhang, J. R. Anderson, Neural collaborative filtering vs. matrix factorization revisited, in: RecSys, ACM, 2020, pp. 240–248. [12] X. He, T. Chua, Neural factorization machines for sparse predictive analytics, in: SIGIR, ACM, 2017, pp. 355–364. [13] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D. Noia, Elliot: A comprehensive and rigorous framework for reproducible recommender systems evaluation, in: SIGIR, ACM, 2021, pp. 2405–2414. [14] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D. Noia, V-elliot: Design, evaluate and tune visual recommender systems, in: RecSys, ACM, 2021, pp. 768–771. [15] G. Adomavicius, Y. Kwon, Improving aggregate recommendation diversity using ranking-based techniques, IEEE Trans. Knowl. Data Eng. 24 (2012) 896–911. [16] P. Castells, N. J. Hurley, S. Vargas, Novelty and diversity in recommender systems, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer, 2015, pp. 881–918. URL: https://doi.org/10.1007/978-1-4899-7637-6_26. [17] H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender systems with personalized re-ranking, in: FLAIRS Conference, AAAI Press, 2019, pp. 413–418. �[18] Z. Zhu, J. Wang, J. Caverlee, Measuring and mitigating item under-recommendation bias in personalized ranking systems, in: SIGIR, ACM, 2020, pp. 449–458. [19] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity bias in recommendation, in: RMSE@RecSys, volume 2440 of CEUR Workshop Proceedings, CEUR-WS.org, 2019. [20] C. Zhou, J. Ma, J. Zhang, J. Zhou, H. Yang, Contrastive learning for debiased candidate generation in large-scale recommender systems, in: KDD, ACM, 2021, pp. 3985–3995. [21] R. Borges, K. Stefanidis, On mitigating popularity bias in recommendations via variational autoencoders, in: SAC, ACM, 2021, pp. 1383–1389. [22] T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, T. Joachims, Recommendations as treatments: Debiasing learning and evaluation, in: ICML, volume 48 of JMLR Workshop and Conference Proceedings, JMLR.org, 2016, pp. 1670–1679. [23] D. Jannach, L. Lerche, I. Kamehkhosh, M. Jugovac, What recommenders recommend: an analysis of recommendation biases and possible countermeasures, User Model. User Adapt. Interact. 25 (2015) 427–491. [24] M. D. Ekstrand, M. Tian, I. M. Azpiazu, J. D. Ekstrand, O. Anuyah, D. McNeill, M. S. Pera, All the cool kids, how do they fit in?: Popularity and demographic biases in recommender evaluation and effectiveness, in: FAT, volume 81 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 172–186. [25] L. Boratto, G. Fenu, M. Marras, Connecting user and item perspectives in popularity debiasing for collaborative recommendation, Inf. Process. Manag. 58 (2021) 102387. [26] Y. Zhang, F. Feng, X. He, T. Wei, C. Song, G. Ling, Y. Zhang, Causal intervention for leveraging popularity bias in recommendation, in: SIGIR, ACM, 2021, pp. 11–20. �
Inferring User Decision-Making Processes in Recommender Systems with Knowledge Graphs Discussion Paper Vito Walter Anelli1 , Tommaso Di Noia1 , Eugenio Di Sciascio1 , Antonio Ferrara1 and Alberto Carlo Maria Mancino1 1 Polytechnic University of Bari, via Orabona, 4, 70125 Bari, Italy Abstract This paper proposes a sparse factorization approach, KGFlex, that represents each item feature as an em- bedding. With KGFlex, the user-item interactions are a factorized combination of the item features relevant to the user. An entropy-driven module drives the training considering only the feature involved in the user’s decision-making process. Extensive experiments confirm the approach’s effectiveness, considering the ranking accuracy, diversity, and induced bias. The public implementation of KGFlex is available at https://split.to/kgflex. Keywords Recommender systems, Information Retrieval, feature factorization, entropy, knowledge graphs 1. Introduction At least once, everyone asked a friend to suggest a movie to watch. This natural and yet effective behavior of obtaining recommendations from similar people is the foundation of Collaborative Filtering (CF) recommendation techniques. In fact, it is their remarkable accuracy that helped Recommender Systems getting famous. Popular examples of CF are Matrix Factorization [1], Nearest Neighbors, and the recent Deep Learning models [2]. Despite the notable performance, these techniques have some severe limitations: they need a considerable number of transactions to work, and they fail to suggest fresh items. On the other side of the coin, there are Content- Based Filtering (CBF) methods. At the cost of lower accuracy, not only are they interpretable, but they also succeed in suggesting fresh items. However, these algorithms are affected by the overspecialization problem, and they struggle to suggest items different from the user history. To benefit from both approaches and alleviate the drawbacks, researchers integrated into Col- laborative Filtering the side information used in content-based approaches such as tags [3], SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy $ vitowalter.anelli@poliba.it (V. W. Anelli); tommaso.dinoia@poliba.it (T. Di Noia); eugenio.disciascio@poliba.it (E. Di Sciascio); antonio.ferrara@poliba.it (A. Ferrara); alberto.mancino@poliba.it (A. C. M. Mancino) https://sisinflab.poliba.it/people/vito-walter-anelli/ (V. W. Anelli); https://sisinflab.poliba.it/people/tommaso-di-noia/ (T. Di Noia); https://sisinflab.poliba.it/people/eugenio-di-sciascio/ (E. Di Sciascio); https://sisinflab.poliba.it/people/antonio-ferrara/ (A. Ferrara); https://sisinflab.poliba.it/people/alberto-carlo-maria-mancino/ (A. C. M. Mancino) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) �demographic data [4], structured knowledge [5]. However, rarely do these models express their full potential since they combine large and dense models with hundreds or thousands of features, and they are computationally expensive. Among the various information sources, Knowledge Graphs (𝒦𝒢𝑠) are gaining momentum. This work discusses KGFlex [6], a sparse embedding model that extracts facts and knowledge from publicly available knowledge graphs to describe the catalog items. The underlying factorization model shapes the user interactions with item features through low-dimensionality embeddings. KGFlex examines the user-specific decision-making process of consuming or not consuming an item to weight feature embeddings employing an entropy-based strategy. Consequently, the user profile only comprises an individual representation of each relevant feature. To evaluate the efficacy of KGFlex, we conduct extensive experiments on two different publicly available datasets extracting content information from DBpedia12 . The results show that KGFlex has competitive accuracy performance, and at the same time, generates highly diversified recommendations with a low induced bias. 2. Knowledge Graph Sparse Embeddings KGFlex exploits the knowledge encoded in a knowledge graph as side information to characterize both items and users. One of the main assumptions is that users decide to enjoy an item based on a subset of its characteristics, implying that not all the item features are equally important. In the following, we show how KGFlex describes each user and item with a set of features. Taking a cue from information theory KGFlex exploits the notion of information gain to measure the relevance of a feature for a user in deciding to consume or not an item. 2.1. From Knowledge Graphs to Decision-Making In KGFlex, items and users are characterized by sets of features extracted from a 𝒦𝒢, a knowledge base of semantically linked machine-understandable data [7]. During the years, 𝒦𝒢𝑠 have gained more and more success thanks to the Linked Data initiative [8]. Today we can benefit from 1,527 different 𝒦𝒢𝑠 connected in the so-called Linked Open Data Cloud3 . A knowledge graph 𝒦𝒢 can be represented as pairs of entities linked to each other by binary 𝜌 relations. Each connection in 𝒦𝒢 is denoted by 𝜎 − → 𝜔, where 𝜎 is a subject entity, 𝜌 is a directed relation (predicate), and 𝜔 is an object entity. Hereinafter, we generalize the previous notion to multi-hop predicates (i.e., considering chains of predicates that connect two entities at a higher 𝜌1 𝜌2 𝜌𝑛 depth). Let 𝑛-hop predicate be defined as 𝜌 = ⟨𝜌1 ,...,𝜌𝑛 ⟩ if 𝜎 −→ 𝜔1 −→ ... −→ 𝜔𝑛 ∈ 𝒦𝒢. For 𝜌 convenience, ℎ(𝜌) = 𝑛 for 𝜌 : 𝜎 −→ 𝜔𝑛 ∈ 𝒦𝒢 denotes the depth of the predicate chain, and when 𝜌 no confusion arises we will use 𝜎 −→ 𝜔 to denote a generic chain with ℎ(𝜌) ≥ 1. 2.1.1. Extraction of Item and User Features from Knowledge Graph Given a collection of items ℐ and a knowledge graph 𝒦𝒢, we assume each element in 𝑖 ∈ ℐ has a mapping to a corresponding entity in 𝒦𝒢. Under this assumption, an item 𝑖 can be explored, 1 http://dbpedia.org 2 https://github.com/sisinflab/LinkedDatasets 3 https://lod-cloud.net/datasets � (𝑛) at depth 𝑛, to identify the set ℱ𝑖 of the semantic features describing it: (𝑛) 𝜌 ℱ𝑖 = {⟨𝜌, 𝜔⟩ | 𝑖 − → 𝜔 ∈ 𝒦𝒢 ,ℎ(𝜌) ∈ {1,...,𝑛}}. (1) At this stage, feature filtering, graph pruning, and semantic feature selection techniques [9] can be applied to control the computational and memory load and improve system performance. We describe each user 𝑢 ∈ 𝒰 with the set ℱ𝑢 of the features representing the items ℐ𝑢 ⊆ ℐ enjoyed by 𝑢: ⋃︁ (𝑛) ℱ𝑢(𝑛) = ℱ𝑖 . (2) 𝑖∈ℐ𝑢 Finally, we define the overall set ℱ (𝑛) of features in the system: ⋃︁ (𝑛) ℱ (𝑛) = ℱ𝑖 . (3) 𝑖∈ℐ In the following, for convenience, the (𝑛) superscript is omitted whenever it is not relevant in the context. 2.1.2. Information Gain of User Features In information theory, entropy is used to measure the uncertainty of a random variable. In particular, the entropy 𝐻(𝑉 ) of a random variable 𝑉 with 𝑘 possible values in {𝑣1 ,...,𝑣𝑘 } is defined as: ∑︁𝑘 𝐻(𝑉 ) = − 𝑃 (𝑉 = 𝑣𝑖 )log2 𝑃 (𝑉 = 𝑣𝑖 ). (4) 𝑖=1 It is straightforward to check that a coin that always comes up heads has zero entropy, while a fair coin equally likely to come up heads or tails when flipped has entropy 1. Notably, if 𝐵𝑞 is a binary random variable that is true with probability 𝑞, we have 𝐻(𝐵𝑞 ) = −𝑞log2 𝑞−(1−𝑞)log2 (1−𝑞). For instance, given a dataset 𝒟 of training samples in the form (x,𝑦), with x ∈ R𝐹 and 𝑦 ∈ {0,1} is a binary variable that is true with probability 𝑞, its entropy is 𝐻(𝒟) = 𝐻(𝑦𝑞 ). As a consequence, a dataset with a balanced number of positive and negative samples has entropy 1. In this context, the information gain 𝐼𝐺(𝒟,𝑥𝑑 ) measures the expected reduction in information entropy obtained from the observation of one of the attributes 𝑥𝑑 in x: 𝐼𝐺(𝒟,𝑥𝑑 ) = 𝐻(𝒟)−𝐻(𝒟|𝑥𝑑 ), (5) where 𝐻(𝒟|𝑥𝑑 ) is the expected entropy of 𝒟 conditioned on 𝑥𝑑 . If 𝑥𝑑 can assume 𝑘 distinct values 𝑥𝑑,1 ,...𝑥𝑑,𝑘 with a categorical probability distribution,the dataset 𝒟 is partitioned into 𝑘 mutually exclusive subsets and the following conditioned entropy is determined: 𝑘 ∑︁ 𝐻(𝒟|𝑥𝑑 ) = 𝑃 (𝑥𝑑 = 𝑥𝑑,𝑖 )𝐻(𝒟|𝑥𝑑 = 𝑥𝑑,𝑖 ). (6) 𝑖=1 Since the information gain defined in Eq. (5) returns a measure of the importance of a single attribute in distinguishing positive from negative examples in a dataset, we build, for each user 𝑢, �a balanced dataset 𝒟𝑢 with ⋃︀ all the consumed items from ℐ𝑢 and the same amount of negative items randomly picked up from 𝑣∈𝒰 ,𝑣̸=𝑢 ℐ𝑣 ∖ℐ𝑢 . Each sample in 𝒟𝑢 is provided with a set of binary variables corresponding to the features in ℱ𝑢 . Each variable indicates, for each item 𝑖 in 𝒟𝑢 , the presence (𝑓 = 1) or the absence (𝑓 = 0) of the corresponding feature in the set ℱ𝑖 . According to the definition, 𝐻(𝒟𝑢 ) = 1. Therefore, the information gain for each feature 𝑓 ∈ ℱ𝑢 can be computed using the dataset 𝒟𝑢 . Let 𝑝𝑢𝑓 be the number of positive samples in 𝒟𝑢 for which 𝑓 = 1, 𝑛𝑢𝑓 the number of negative samples for which the same feature is present, and 𝑡𝑢𝑓 = 𝑝𝑢𝑓 +𝑛𝑢𝑓 . Analogously, 𝑝𝑢¬𝑓 = |ℐ𝑢 |−𝑝𝑢𝑓 is the number of positive samples with 𝑓 = 0, 𝑛𝑢¬𝑓 = |ℐ𝑢 |−𝑛𝑢𝑓 is the number of negative samples with 𝑓 = 0, and 𝑡𝑢¬𝑓 = 𝑝𝑢¬𝑓 +𝑛𝑢¬𝑓 . Following Eqq. (5) and (6): 𝐼𝐺(𝒟𝑢 ,𝑓 ) = 1−𝐻(𝒟𝑢 |𝑓 = 1)−𝐻(𝒟𝑢 |𝑓 = 0), (7) (︂ )︂ 𝑡𝑢𝑓 𝑝𝑢𝑓 𝑝𝑢𝑓 𝑛𝑢𝑓 𝑛𝑢𝑓 𝐻(𝒟𝑢 |𝑓 = 1) = − log2 − log2 , (8) |𝒟𝑢 | 𝑡𝑢𝑓 𝑡𝑢𝑓 𝑡𝑢𝑓 𝑡𝑢𝑓 (︂ )︂ 𝑡𝑢¬𝑓 𝑝𝑢¬𝑓 𝑝𝑢¬𝑓 𝑛𝑢¬𝑓 𝑛𝑢¬𝑓 𝐻(𝒟𝑢 |𝑓 = 0) = − log2 − log2 . (9) |𝒟𝑢 | 𝑡𝑢¬𝑓 𝑡𝑢¬𝑓 𝑡𝑢¬𝑓 𝑡𝑢¬𝑓 We finally associate a weight 𝑘𝑢𝑓 = 𝐼𝐺(𝒟𝑢 ,𝑓 ) to each pair of user 𝑢 and feature 𝑓 to represent the influence of a feature —in the view of the user— in the prediction of user-item interactions. 2.2. Sparse Interaction of Feature Embeddings for Prediction KGFlex models the features in ℱ as collaboratively learned embeddings in a latent space. Since KGFlex promotes the idea of having user fine-tuned versions of the same model, we have both a global representation of the features in ℱ and a personal view, for each user 𝑢, of the fea- tures in ℱ𝑢 ⊆ ℱ. Notably, the model is structured into two distinct parts. On the one hand, KGFlex keeps a set 𝒢 of global trainable embeddings and biases shared among all the users, with 𝒢 = {(g𝑓 ∈ R𝐸 ,𝑏𝑓 ∈ R), ∀𝑓 ∈ ℱ}. On the other hand, each user in KGFlex also has his/her personal representation of the features he/she interacted with, i.e., the features in ℱ𝑢 . These embeddings are collected within the set 𝒫 𝑢 , defined as 𝒫 𝑢 = {p𝑢𝑓 ∈ R𝐸 , ∀𝑓 ∈ ℱ𝑢 }. Then, the inner product between the personal representation p𝑓𝑢 and the global representation g𝑓 , plus a bias value 𝑏𝑓 , estimates the affinity of user 𝑢 to feature 𝑓 . The sum of such affinities for all the features in ℱ𝑢𝑖 = ℱ𝑢 ∩ℱ𝑖 , weighted according to the pre-computed entropy-based coefficients, estimates the interaction 𝑥^ 𝑢𝑖 between user 𝑢 and item 𝑖: ∑︁ 𝑥 ^ 𝑢𝑖 = 𝑘𝑢𝑓 (p𝑢𝑓 g𝑓 +𝑏𝑓 ). (10) 𝑓 ∈ℱ𝑢𝑖 Eq. (10) encodes the strategy KGFlex exploits to handle thousands of model features. In fact, it takes advantage of user profile to involve only a small subset of them in the estimate of the user-item affinity. To learn the model parameters, KGFlex adopts Bayesian Personalized Ranking (BPR), the most common pair-wise Learning to Rank strategy, based on a maximum posterior estimator. Given∑︀a training set 𝒯 = {(𝑢,𝑖 ,𝑖 ) | 𝑖 ∈ ℐ𝑢 ∧ 𝑖 ∈ ℐ∖ℐ𝑢 ,∀𝑢 ∈ 𝒰}, BPR optimizes the loss+ + − + − 𝐿 = (𝑢,𝑖+ ,𝑖− )∈𝒯 ln𝜎(𝑥 ^ 𝑢𝑖− ), with the assumption that a user 𝑢 prefers a consumed item 𝑖 ^ 𝑢𝑖+ −𝑥 �over a non-consumed item 𝑖− . The model parameters are learnt with an optimization algorithm like SGD, as described in Rendle et al. [10].⎧To that aim, we derive: ⎪ ⎪ ⎪ 𝑘𝑢𝑓 g𝑓 if 𝜃 = p𝑢𝑓 , 𝜕 ⎨𝑘 p𝑢 if 𝜃 = g , ⎪ 𝑢𝑓 𝑓 𝑓 𝑥 ^ 𝑢𝑖 = (11) 𝜕𝜃 ⎪ ⎪ ⎪ 𝑘𝑢𝑓 if 𝜃 = 𝑏𝑓 , else. ⎪ ⎩0 3. Experimental Setup In the following, we present a discussion about datasets, preprocessing, baselines, and evaluation protocol that guided the experimentation of KGFlex. 3.1. Datasets and Filtering The evaluation of the performance of KGFlex is conducted on two well-known datasets: Yahoo! Movies and Facebook Books. The datasets have been binarized, retaining ratings of 3 or higher. To ensure a fair comparison with the baselines, an iterative 10-core, and 5-core preprocessing procedure is performed on the first and second datasets. The items have been described with a set of semantic features retrieved through a KG exploration at depth 2 of the DBpedia 𝒦𝒢 using a public DBpedia URI mapping. Some features (based on their 1-hop predicate) have not been considered since they do not provide useful information [9]: dbo:wikiPageWikiLink, owl:sameAs, rdf:type, gold:hypernym, rdfs:seeAlso, dbp:wordnet_type, dbo:wikiPageExternalLink, dbo:thumbnail, prov:wasDerivedFrom, and dbp:wikiPageUsesTemplate. Therefore, we removed the features as- sociated with less than ten items. Finally, we kept the user’s 100 most informative features from the 1- and 2- hop exploration to reduce the computational costs. 3.2. Baselines, Evaluation Protocol and Metrics To assess the effectiveness of KGFlex, we compare it with BPR-MF [10], a latent factor model based on the same pair-wise optimization criterion used in KGFlex, a batch version of Rendle et al. [11] MF, NeuMF [12], and kaHFM [5], a factorization-based model making use of knowledge graphs. We have chosen the all unrated items protocol using the hold-out 80-20 splitting strategy, which considers as candidate items all the items not rated by the user. All the models have been tested in 10 different configurations of hyperparameters with the Bayesian hyperparameter optimization search. For the sake of reproducibility, we provide our code and a working configuration file for the Elliot framework [13, 14]. Our goal is to assess the accuracy and beyond-accuracy performance of KGFlex, along with its fairness properties with respect to the popularity of the items. In detail, we have measured the recommendation accuracy with nDCG [11], also used as the validation metric. Then, we have evaluated the diversity, adopting Item Coverage (IC) [15] and Gini Index (Gini) [16]. Finally, three bias metrics have been used to evaluate how the algorithms consider the items from the long-tail: ACLT [17], PopREO and PopRSP, specific applications of RSP, and REO [18]. PopREO estimates the equal opportunity of items, encouraging the True Positive Rate of popular and unpopular items to be the same. PopRSP measures statistical parity, assessing whether the ranking probability distributions for popular and unpopular items are the same in the recommendation. �Table 1 Comparison of KGFlex with baselines (in boldface) and other reference algorithms. Among the baselines, the best result is in boldface, the second-best result is underlined. For all the metrics, the cutoff is 10. a) Yahoo! Movies nDCG IC Gini ACLT PopREO PopRSP Random 0.00960 1050 0.84956 5.52026 0.09847 0.00980 Most Popular 0.15850 49 0.01263 0.00000 1.00000 1.00000 VSM 0.04777 370 0.05245 3.11768 0.49588 0.45751 Item-kNN 0.30739 745 0.15826 0.98842 0.70585 0.83466 MultiVAE 0.23696 399 0.09136 0.23473 0.85433 0.96127 BPR-MF 0.18571 151 0.02191 0.00064 0.99543 0.99989 MF 0.28971 455 0.09024 0.08232 0.87345 0.98645 NeuMF 0.09184 50 0.01134 0.00064 1.00000 0.99989 kaHFM 0.30055 757 0.16591 0.46238 0.76103 0.92339 KGFlex 0.24640 851 0.28015 2.14469 0.44768 0.63355 b) Facebook Books nDCG IC Gini ACLT PopREO PopRSP Random 0.00690 782 0.86167 5.26045 0.09794 0.00749 Most Popular 0.09393 16 0.01265 0.00000 1.00000 1.00000 VSM 0.03617 523 0.18874 3.80558 0.22616 0.29958 Item-kNN 0.12903 769 0.37520 2.22524 0.48852 0.59861 MultiVAE 0.11914 620 0.18279 0.46368 0.77695 0.91818 BPR-MF 0.09473 17 0.01318 0.00000 1.00000 1.00000 MF 0.09557 87 0.02376 0.00000 1.00000 1.00000 NeuMF 0.07142 17 0.01245 0.00000 1.00000 1.00000 kaHFM 0.12667 540 0.13866 0.32942 0.87663 0.94197 KGFlex 0.08526 606 0.30703 3.02641 0.15210 0.44852 3.3. Results 3.3.1. Evaluating the Overall Performance Table 1a depicts the evaluation outcome for the aforementioned metrics with a cutoff of 10 on the Yahoo! Movies dataset. KGFlex shows satisfactory accuracy results, being outperformed only by kaHFM and MF. It is noteworthy that KGFlex significantly outperforms BPR-MF, although both are learned with a pair-wise BPR optimization, thus underlining the beneficial role of the extracted knowledge. Moreover, examining the item coverage and Gini values, we note the high degree of personalization provided by KGFlex. We link this result to the personalized view of the knowledge granted by the framework. Moreover, in KGFlex the collaborative signal on explicit user interests ensures to recommend diverse items among the ones sharing characteristics of interest for the user. The aforementioned behavior is not confirmed in Facebook Books (see Table 1b). Indeed, the accuracy results seem to remain below the performance of other factorization-based approaches. However, the diversity results show how BPR-MF, MF, and NeuMF may have been flooded by popularity signal, which led them to perform poorly regarding the item coverage and Gini metrics. Instead, KGFlex does not suffer from this problem and approaches the superior performance of Item-kNN in terms of diversity. � a) Yahoo! Movies b) Facebook Books 0.4 0.6 0.3 HR@10 HR@10 0.4 0.2 0.2 0.1 0 0 4 6 8 10 4 6 8 10 SE@10 SE@10 Random Most Popular VSM Item-kNN MultiVAE BPR-MF MF NeuMF kaHFM KGFlex Figure 1: Accuracy vs. distributional diversity. The plots show the value of HR@10 against SE@10: the closer to the top-right corner the better. 3.3.2. Investigating the Accuracy/Personalization Trade-Off What we have analytically observed is substantially confirmed in Figure 1. These graphs show the joint behavior of KGFlex on accuracy and distributional diversity by analyzing the value of Hit Ratio (HR) with respect to the Shannon Entropy (SE) statistics, which measures how di- versely distributed are the recommended items. Among factorization-based approaches, KGFlex approaches the right-top margin to a greater extent, remarking its capability of providing highly personalized recommendations, probably due to the joint operation of the global and the personal views of the same features. The kaHFM model usually is the second-best model, while the other approaches seem to perform very poorly in at least one dimension or do not have a stable position when varying the dataset. 3.3.3. Analyzing the Algorithmic and Popularity Bias Oftentimes, unpopular items are not recommended and remain underrepresented [19], thus causing a fairness issue for items and an inappropriate recommendation for users who do not prefer very popular items. While there is a wide range of approaches in literature aiming to reduce the recommendation biases [20, 21, 22, 23, 24, 25, 26] , we study whether KGFlex is inherently resilient to algorithmic bias. From Tables 1a and 1b, it is noteworthy that KGFlex always outperforms all the other factorization-based approaches and generally outperforms the other approaches. The value of ACLT (the higher the better) is comparable with the value obtained by VSM. This result is further supported by the values of PopREO and PopRSP (the smaller the better). Concerning those metrics, KGFlex and VSM continue to grant the less biased recommendations. Interestingly, while both exploit the same optimization criterion, we notice how KGFlex consistently improves BPR-MF, which is known to be vulnerable to imbalanced data and to produce biased recommendations [18]. 4. Conclusion We introduced KGFlex, a promising knowledge-aware approach to generate recommendations from implicit feedback. KGFlex has demonstrated its ability to take the best from content-based and factorization-based recommendation approaches. Moreover, thanks to the high degree of expressiveness provided by the personalized representation of content information, KGFlex guarantees satisfactory and diverse recommendations and resilience to algorithmic bias. �References [1] Y. Koren, R. M. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, IEEE Computer 42 (2009) 30–37. [2] S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: A survey and new perspectives, ACM Comput. Surv. 52 (2019) 5:1–5:38. URL: https://doi.org/10.1145/3285029. doi:10.1145/3285029. [3] Y. Zhu, Z. Guan, S. Tan, H. Liu, D. Cai, X. He, Heterogeneous hypergraph embedding for document recommendation, Neurocomputing 216 (2016) 150–162. [4] W. X. Zhao, S. Li, Y. He, L. Wang, J. Wen, X. Li, Exploring demographic information in social media for product recommendation, Knowl. Inf. Syst. 49 (2016) 61–89. [5] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, J. Trotta, How to make latent factors interpretable by feeding factorization machines with knowledge graphs, in: ISWC (1), volume 11778 of Lecture Notes in Computer Science, Springer, 2019, pp. 38–56. [6] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ferrara, A. C. M. Mancino, Sparse feature factoriza- tion for recommender systems with knowledge graphs, in: RecSys, ACM, 2021, pp. 154–165. [7] T. Berners-Lee, J. Hendler, O. Lassila, The semantic web, Scientific Amer- ican 284 (2001) 34–43. URL: http://www.sciam.com/article.cfm?articleID= 00048144-10D2-1C70-84A9809EC588EF21. [8] T. Heath, C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Synthesis Lectures on the Semantic Web, Morgan & Claypool Publishers, 2011. URL: https://doi.org/ 10.2200/S00334ED1V01Y201102WBE001. doi:10.2200/S00334ED1V01Y201102WBE001. [9] T. D. Noia, C. Magarelli, A. Maurino, M. Palmonari, A. Rula, Using ontology-based data summarization to develop semantics-aware recommender systems, in: ESWC, volume 10843 of Lecture Notes in Computer Science, Springer, 2018, pp. 128–144. [10] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: bayesian personalized ranking from implicit feedback, in: UAI, AUAI Press, 2009, pp. 452–461. [11] S. Rendle, W. Krichene, L. Zhang, J. R. Anderson, Neural collaborative filtering vs. matrix factorization revisited, in: RecSys, ACM, 2020, pp. 240–248. [12] X. He, T. Chua, Neural factorization machines for sparse predictive analytics, in: SIGIR, ACM, 2017, pp. 355–364. [13] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D. Noia, Elliot: A comprehensive and rigorous framework for reproducible recommender systems evaluation, in: SIGIR, ACM, 2021, pp. 2405–2414. [14] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D. Noia, V-elliot: Design, evaluate and tune visual recommender systems, in: RecSys, ACM, 2021, pp. 768–771. [15] G. Adomavicius, Y. Kwon, Improving aggregate recommendation diversity using ranking-based techniques, IEEE Trans. Knowl. Data Eng. 24 (2012) 896–911. [16] P. Castells, N. J. Hurley, S. Vargas, Novelty and diversity in recommender systems, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer, 2015, pp. 881–918. URL: https://doi.org/10.1007/978-1-4899-7637-6_26. [17] H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender systems with personalized re-ranking, in: FLAIRS Conference, AAAI Press, 2019, pp. 413–418. �[18] Z. Zhu, J. Wang, J. Caverlee, Measuring and mitigating item under-recommendation bias in personalized ranking systems, in: SIGIR, ACM, 2020, pp. 449–458. [19] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity bias in recommendation, in: RMSE@RecSys, volume 2440 of CEUR Workshop Proceedings, CEUR-WS.org, 2019. [20] C. Zhou, J. Ma, J. Zhang, J. Zhou, H. Yang, Contrastive learning for debiased candidate generation in large-scale recommender systems, in: KDD, ACM, 2021, pp. 3985–3995. [21] R. Borges, K. Stefanidis, On mitigating popularity bias in recommendations via variational autoencoders, in: SAC, ACM, 2021, pp. 1383–1389. [22] T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, T. Joachims, Recommendations as treatments: Debiasing learning and evaluation, in: ICML, volume 48 of JMLR Workshop and Conference Proceedings, JMLR.org, 2016, pp. 1670–1679. [23] D. Jannach, L. Lerche, I. Kamehkhosh, M. Jugovac, What recommenders recommend: an analysis of recommendation biases and possible countermeasures, User Model. User Adapt. Interact. 25 (2015) 427–491. [24] M. D. Ekstrand, M. Tian, I. M. Azpiazu, J. D. Ekstrand, O. Anuyah, D. McNeill, M. S. Pera, All the cool kids, how do they fit in?: Popularity and demographic biases in recommender evaluation and effectiveness, in: FAT, volume 81 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 172–186. [25] L. Boratto, G. Fenu, M. Marras, Connecting user and item perspectives in popularity debiasing for collaborative recommendation, Inf. Process. Manag. 58 (2021) 102387. [26] Y. Zhang, F. Feng, X. He, T. Wei, C. Song, G. Ling, Y. Zhang, Causal intervention for leveraging popularity bias in recommendation, in: SIGIR, ACM, 2021, pp. 11–20. �