Difference between revisions of "Vol-3194/paper67"

From BITPlan ceur-ws Wiki
=Paper=
 
{{Paper
|id=Vol-3194/paper67
|storemode=property
|title=Accounting for Bossy Users in Context-Aware Group Recommendations
|pdfUrl=https://ceur-ws.org/Vol-3194/paper67.pdf
|volume=Vol-3194
|authors=Davide Azzalini,Elisa Quintarelli,Emanuele Rabosio,Letizia Tanca
|dblpUrl=https://dblp.org/rec/conf/sebd/AzzaliniQRT22
|wikidataid=Q117344900
}}
 +
==Accounting for Bossy Users in Context-Aware Group Recommendations==
 +
<pdf width="1500px">https://ceur-ws.org/Vol-3194/paper67.pdf</pdf>
 +
<pre>
Accounting for Bossy Users in Context-Aware Group Recommendations
(Discussion Paper)

Davide Azzalini (1), Elisa Quintarelli (2), Emanuele Rabosio (1) and Letizia Tanca (1)
(1) Politecnico di Milano, Milan, Italy
(2) University of Verona, Verona, Italy

Abstract
Lots of activities, like watching a movie or going to the restaurant, are intrinsically
group-based. To recommend such activities to groups, traditional single-user recommendation
techniques are not appropriate and, as a consequence, over the years a number of group
recommender systems have been developed. Recommending items to be enjoyed together by a group
of people poses many ethical challenges: in fact, a system whose unique objective is to achieve
the best recommendation accuracy might learn to disadvantage submissive users in favor of more
aggressive ones. In this work we investigate the ethical challenges of context-aware group
recommendations, in the general case of ephemeral groups (i.e., groups where the members might
be together for the first time), using a method that can recommend also items that are new to
the system. We show the goodness of our method on two real-world datasets. The first one is a
very large dataset containing the personal and group choices regarding TV programs of 7,921
users w.r.t. sixteen contexts of viewing, while the second one gathers the musical preferences
(both individual and in groups) of 280 real users w.r.t. two contexts of listening. Our
extensive experiments show that our method always manages to obtain the highest recall while
delivering ethical guarantees in line with the other fair group recommender systems tested.

Keywords
group recommender systems, context-aware recommender systems, computer ethics, fairness

1. Introduction
Recommender Systems are software tools and techniques that provide suggestions for items to
be of use to a user [1]. Several everyday activities are intrinsically group-based, so recent
research also concentrates on systems that suggest activities that can be performed together
with other people and are typically social. The group recommendation problem introduces
further challenges with respect to traditional single-user recommendation: (i) the group
members may have different preferences, and finding items that meet the tastes of everyone may
be impossible; (ii) a group may be formed by people who happen to be together for the first
time and, in this case, since no history of the group's preferences is available, the
recommendation can only be computed on the basis of the preferences known for the individual
group members, combined by means of some aggregation function; (iii) last but not least,
people in a group may behave differently from when they are alone, and therefore their
individual preferences might sometimes not be a reliable source of information. This last
observation introduces an unfairness problem: if the recommender system learns to consider the
preferences of some users as more relevant than those of the others, the overall satisfaction
of the users belonging to a group may not be optimal. This imbalance in the negotiation power
that the system learns to assign to different users, with the purpose of obtaining the best
possible recommendation accuracy, may be the result of unfair dynamics, e.g., some users being
more aggressive and others not feeling confident enough to stand up for themselves. In this
work we extend a state-of-the-art system for context-aware recommendations to ephemeral
groups, based on the concept of contextual influence [2, 3], to account also for fairness.
Experiments on two real-world datasets show that our approach outperforms seven other fair
group recommender systems by achieving a consistently better recall while providing similar
ethical guarantees.

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
Contact: davide.azzalini@polimi.it (D. Azzalini); elisa.quintarelli@univr.it (E. Quintarelli);
emanuele.rabosio@polimi.it (E. Rabosio); letizia.tanca@polimi.it (L. Tanca)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

2. Related Work
Context-aware Recommender Systems
The majority of the existing approaches to Recommender Systems do not take any contextual
information into consideration. However, in many applications, the context of use might be
fundamental in guiding the current preference of a user [4]. Recent studies have shown that
Context-Aware Recommender Systems can generate a very high increase in performance [5].

Group Recommender Systems
Group Recommender Systems are systems that produce a common recommendation for a group
of users [6]. Works on group recommendation usually address two kinds of groups: persistent
and ephemeral [7]. Persistent groups contain people who have a significant previous history of
activities together, while ephemeral groups are formed by people who happen to be together for
the first time. In the case of persistent groups, classical recommendation techniques can be
used, since the group can be considered as a single user, whereas, in the case of ephemeral
groups, recommendations must be computed on the basis of the preferences known for the members
of the group. A number of different aggregation strategies for the individual preferences have
been proposed over the years [6]; however, most of these aggregation strategies clearly violate
fairness principles. For instance, maximum satisfaction, used in [8, 9, 10, 11, 12], chooses
the item for which the individual preference score is the highest, effectively ignoring the
satisfaction of most of the users in the group. Other clear examples of unfair aggregation
strategies are found in works such as [13, 14, 15], which assign a different power to group
members based on their expertise.

Fairness in Recommender Systems
In single-user Recommender Systems, fairness is usually assessed with regard to sensitive
attributes which are generally prone to discrimination (e.g., gender, ethnicity or social
belonging) by verifying the presence of a discriminated class within the user set [16, 17].
When fairness is evaluated for Group Recommender Systems, it should be computed within groups.
Since the groups we consider in this work are composed of few users, evaluating fairness in
the way just described is not a suitable solution. Instead of detecting unfairness towards a
protected group of users, we aim to detect and prevent unfairness towards single users within
a group whose desires are not considered when forming a recommendation for the whole group.

Fairness in Group Recommender Systems
Some aggregation strategies exist that, despite not having been developed to explicitly address
ethical issues, aggregate individual preferences in a way that resembles fairness. Least
misery, used in [7, 8, 18, 19, 9, 20, 10, 11, 12, 21], chooses the items for which the lowest
value among the preferences of the group members is the greatest. The authors of [22]
introduce an aggregation function which maximizes the satisfaction of the group members while,
at the same time, minimizing the disagreement among them. Average, used in
[8, 23, 18, 19, 9, 10, 13, 11, 12, 21], computes the group preference for an item as the
arithmetic mean of the individual scores. Lastly, some recent works explicitly aim at
producing fair group recommendations. In [24] the preferences of individual users are combined
with a measure of fairness, to guarantee that all the users are somehow satisfied. In [25, 26]
two aggregation strategies are proposed: one is based on the idea of proportionality, while
the other is based on the idea of envy-freeness. In [27] a greedy algorithm to achieve
rank-sensitive balance is presented.
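
As a rough illustration of the simplest aggregation strategies mentioned in this section
(average, least misery and maximum satisfaction), the Python sketch below aggregates the
members' scores for each item and picks the best item. The function names, the dictionary
layout and the sample scores are hypothetical and are not taken from the cited works.

```python
from statistics import mean

# Hypothetical individual preference scores: item -> list of member scores.
individual_scores = {
    "jazz": [0.9, 0.3],
    "pop": [0.6, 0.6],
}

def aggregate(scores, strategy):
    """Combine the members' scores for one item into a single group score."""
    if strategy == "average":
        return mean(scores)
    if strategy == "least_misery":
        return min(scores)   # the least satisfied member decides
    if strategy == "max_satisfaction":
        return max(scores)   # the most satisfied member decides
    raise ValueError(f"unknown strategy: {strategy}")

def best_item(scores_by_item, strategy):
    """Return the item with the highest aggregated group score."""
    return max(scores_by_item,
               key=lambda item: aggregate(scores_by_item[item], strategy))
```

With these sample scores, least misery prefers the item nobody dislikes, while maximum
satisfaction prefers the item that one member likes most.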
 +
 +
 +
3. The proposed method
In this section we review CtxInfl, a previous approach of ours introduced in [2, 3]. We then
present our contribution, which makes CtxInfl fairer. The resulting method is named FARGO.

3.1. CtxInfl
We consider a set of items 𝐼 and a set of users 𝑈, from which any group 𝐺 ∈ ℘(𝑈) can
be extracted. 𝐶 is the set of possible contexts in the given scenario, where a context 𝑐 is the
conjunction of a set of dimension/value pairs: e.g., for the TV dataset, a context might be
𝑐 = ⟨𝑡𝑖𝑚𝑒_𝑠𝑙𝑜𝑡 = 𝑝𝑟𝑖𝑚𝑒𝑡𝑖𝑚𝑒 ∧ 𝑑𝑎𝑦 = 𝑤𝑒𝑒𝑘𝑒𝑛𝑑⟩. We assume the availability of a log ℒ
recording the history of the items previously chosen by groups formed in the past, where each
element of ℒ is a 4-tuple (𝑡_𝑗, 𝑐_𝑗, 𝐺_𝑗, 𝑖_𝑗), 𝑡_𝑗 being the time instant in which the item
𝑖_𝑗 ∈ 𝐼 was chosen by the group 𝐺_𝑗 ∈ ℘(𝑈) in the context 𝑐_𝑗 ∈ 𝐶. A contextual scoring
function 𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑖, 𝑐), with 𝑢 ∈ 𝑈, 𝑖 ∈ 𝐼, 𝑐 ∈ 𝐶, giving the score that each user assigns to
each item in each context, is computed offline on the basis of the log of the past individual
choices and of the item descriptions in terms of their attributes, using any context-aware
recommender system for single users from the literature. 𝑇𝑜𝑝𝐾(𝑢, 𝑐, 𝑡) is the function that
returns the list of the 𝐾 items preferred by user 𝑢 in context 𝑐, according to the values of
𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑖, 𝑐), among the items 𝑖 ∈ 𝐼 available at instant 𝑡. Given a target group 𝐺 ∈ ℘(𝑈),
a context 𝑐 ∈ 𝐶 and a time instant 𝑡, the group recommendation is obtained by recommending to
the users in 𝐺 a list (i.e., an ordered set) of 𝐾 items, considered interesting in context 𝑐,
from those items in 𝐼 that are available at time instant 𝑡, according to the following
procedure:

3.1.1. Influence computation

The group preference for an item is obtained by aggregating the individual preferences of the
group members on the basis of their influence. In each context 𝑐, the influence 𝑖𝑛𝑓𝑙(𝑢, 𝑐) of a
given user 𝑢 is derived offline by comparing the behavior of 𝑢 when alone (i.e., 𝑢’s individual
preferences) with 𝑢’s behavior in groups (i.e., the interactions contained in the log ℒ).
Basically, the influence of 𝑢 tells us how often the groups containing 𝑢 have selected one of
𝑢’s favorite items. Let 𝑇𝑜𝑝𝐾(𝑢, 𝑐, 𝑡) be the list of the 𝐾 items preferred by user 𝑢 in
context 𝑐, according to the values of 𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑖, 𝑐) for each 𝑖 ∈ 𝐼 available at instant 𝑡.
The contextual influence is defined as follows:

\[
infl(u, c) = \frac{\left|\{\, l_j \in \mathcal{L} : c = c_j \wedge u \in G_j \wedge i_j \in TopK(u, c, t_j) \,\}\right|}
                  {\left|\{\, l_j \in \mathcal{L} : c = c_j \wedge u \in G_j \,\}\right|} \tag{1}
\]

The value of 𝑖𝑛𝑓𝑙(𝑢, 𝑐) quantifies the ability of user 𝑢 to direct the group’s decision towards
𝑢’s own tastes while in context 𝑐.
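
The counting behind Eq. 1 can be sketched in Python as follows. This is only an illustrative
sketch: the function name, the tuple layout of the log entries, the `top_k` callable and the
sample data are assumptions made for the example, not the authors' implementation.

```python
from collections import defaultdict

def contextual_influence(log, top_k):
    """Estimate infl(u, c) for every (user, context) pair seen in the group log.

    log   : iterable of (t, c, group, item) tuples, one per past group choice
    top_k : hypothetical callable (u, c, t) -> collection of u's K favorite
            items in context c among those available at time t
    """
    hits = defaultdict(int)    # numerator of Eq. 1
    total = defaultdict(int)   # denominator of Eq. 1
    for t, c, group, item in log:
        for u in group:
            total[(u, c)] += 1
            if item in top_k(u, c, t):
                hits[(u, c)] += 1
    return {key: hits[key] / total[key] for key in total}

# Hypothetical toy data: fixed favorites, independent of context and time.
favorites = {"ann": {"jazz", "rock"}, "bob": {"jazz"}}
log = [
    (1, "dinner", {"ann", "bob"}, "jazz"),
    (2, "dinner", {"ann", "bob"}, "rock"),
    (3, "car", {"ann"}, "pop"),
]
infl = contextual_influence(log, lambda u, c, t: favorites[u])
```

In this toy log, both dinner choices fall among ann's favorites but only one falls among
bob's, so ann ends up with the higher influence in the dinner context.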
 +
 +
3.1.2. Top-K Group Recommendation Computation
Top-𝐾 recommendations are computed online, when a group of users requires that the system
suggest some interesting items to be enjoyed together. The system must compute the group
preferences for the items, and then determine the 𝐾 items with the highest scores. Given a
group 𝐺 ∈ ℘(𝑈), its preference 𝑠𝑐𝑜𝑟𝑒(𝐺, 𝑖, 𝑐) for 𝑖 ∈ 𝐼 in the context 𝑐 ∈ 𝐶 is computed as
the average of the preferences of its members, weighted on the basis of each member’s influence
(Eq. 1) in context 𝑐:

\[
score(G, i, c) = \frac{\sum_{u \in G} infl(u, c) \cdot score(u, i, c)}{\sum_{u \in G} infl(u, c)} \tag{2}
\]

Then, the top-𝐾 list of items preferred by a certain group 𝐺 in context 𝑐 at time instant 𝑡 is
determined by retrieving the 𝐾 items with the highest scores among those available at time 𝑡.
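
A minimal Python sketch of Eq. 2 and of the subsequent top-𝐾 selection is given below. The
function names, the dictionary-based inputs and the sample values are hypothetical. The
fallback to a plain average when all influences are zero is an assumption of this sketch,
since the text does not say how that case is handled.

```python
def group_score(group, item, context, infl, score):
    """Influence-weighted average of the members' scores (Eq. 2)."""
    members = list(group)
    weights = [infl.get((u, context), 0.0) for u in members]
    total = sum(weights)
    if total == 0:
        # Assumption of this sketch (not from the paper): with no influence
        # information, fall back to the unweighted average.
        return sum(score(u, item, context) for u in members) / len(members)
    return sum(w * score(u, item, context)
               for w, u in zip(weights, members)) / total

def top_k_group(group, context, available_items, infl, score, k):
    """Return the k available items with the highest group score."""
    ranked = sorted(available_items,
                    key=lambda i: group_score(group, i, context, infl, score),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy data.
infl_example = {("ann", "dinner"): 1.0, ("bob", "dinner"): 0.5}
prefs = {("ann", "jazz"): 0.9, ("bob", "jazz"): 0.3,
         ("ann", "pop"): 0.6, ("bob", "pop"): 0.6,
         ("ann", "rock"): 0.2, ("bob", "rock"): 0.8}
score_fn = lambda u, i, c: prefs[(u, i)]
top2 = top_k_group(["ann", "bob"], "dinner", ["jazz", "pop", "rock"],
                   infl_example, score_fn, k=2)
```

Because ann's influence is twice bob's, the item ann likes most ranks first even though bob
rates it poorly.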
 +
 +
3.2. FARGO
Since CtxInfl is based on the concept of influence, it inevitably privileges the preferences
of the most influential users. As a consequence, the results of the recommendation process are
biased towards the preferences of the one or few users of the group who can be considered the
leaders or, using a more contemporary word, "influencers". Our aim is to add an element of
fairness to CtxInfl while maintaining its general structure, which has already proved to be
very efficient and scalable [3]. Among the various phases of CtxInfl on which we could act
(i.e., individual preferences computation, influence computation, and Top-K group
recommendations computation), the last is the most suitable one, as it is the only one acting
on groups. Following this intuition, we propose to add a fairness factor to the computation of
the score for each item (Eq. 2), in order to modify the order of the items in the Top-𝐾 list
in such a way that items representing unfair recommendations will not appear on top. This is
further motivated by the fact that, when people make decisions in groups, they do not
necessarily follow the decision of a leader (as assumed by CtxInfl): in some cases people may
make decisions trying to satisfy every group member as much as possible. This means that
considering only influence may not be a complete strategy even if we put aside our ethical
concerns. In order not to increase the complexity of the computation of Eq. 2, we build our
fairness element using just the individual contextual scores, which are already used to
compute Eq. 2. We call consensus the metric that quantifies how much the individual
preferences of group members agree on the evaluation of an item. The consensus of a group 𝐺 on
an item 𝑖 in a context 𝑐 is therefore defined as:

\[
consensus(G, i, c) = 1 - \frac{\sum_{u \in G} \left( score(u, i, c) - \overline{score(u, i, c)} \right)^2}{|G|}, \tag{3}
\]

where \overline{score(u, i, c)} is the average score for item 𝑖 among 𝐺’s members in context 𝑐.
The consensus for an item for which users gave a similar evaluation will be close to 1, while
it will reach its minimum when very discordant scores are considered. According to the formula
of the maximum variance, 𝑐𝑜𝑛𝑠𝑒𝑛𝑠𝑢𝑠 ∈ [0.75, 1]. After having defined 𝑐𝑜𝑛𝑠𝑒𝑛𝑠𝑢𝑠, we propose
to integrate it in Eq. 2 in the following way:

\[
fair\_score(G, i, c) = \frac{\sum_{u \in G} infl(u, c) \cdot score(u, i, c)}{\sum_{u \in G} infl(u, c)} \cdot consensus(G, i, c)^{|G|} \tag{4}
\]

We raise consensus to the power of the group size (with the effect of further reducing the
overall score) according to the intuition that the magnitude of unfairness in group
recommendations is proportional to the group size. In fact, the bigger the group, the bigger
the potential harm produced by taking into consideration solely the will of the
leader/influencer.
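
The consensus factor (Eq. 3) and the resulting fair score (Eq. 4) can be sketched in Python as
below. The function names and sample values are hypothetical, and the sketch assumes that
individual scores are normalized to [0, 1]; under that assumption the variance is at most
0.25, which is consistent with the stated range [0.75, 1] for consensus.

```python
def consensus(scores):
    """1 minus the population variance of the members' scores (Eq. 3)."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return 1.0 - variance

def fair_score(scores, influences):
    """Influence-weighted score (Eq. 2) damped by consensus ** |G| (Eq. 4).

    scores and influences are parallel lists, one entry per group member.
    Assumes at least one non-zero influence.
    """
    weighted = (sum(w * s for w, s in zip(influences, scores))
                / sum(influences))
    return weighted * consensus(scores) ** len(scores)

# Hypothetical two-member group: the first member is twice as influential.
influences = [1.0, 0.5]
divisive = fair_score([0.9, 0.3], influences)   # members disagree
agreed = fair_score([0.6, 0.6], influences)     # members agree
```

With these values the divisive item has the higher influence-weighted score (0.7 against 0.6),
but after the consensus factor is applied the agreed item ranks first, which is the reordering
effect described above.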
 +
 +
 +
4. Experimental Results
In this section we present the results obtained by applying the proposed approach to two
different real-world datasets. To evaluate the recommendation performance we use recall,
considering for 𝐾 (the number of items to be recommended) the values 1, 2 and 3. To evaluate
the ethical properties of our method we use the two metrics proposed in [28] for estimating
user discrimination, called score disparity and recommendation disparity, adapted to our needs.
Score disparity is computed as the Gini coefficient of user satisfaction, i.e., the relative
gain achieved by the user due to the actual recommendation with respect to the optimal
recommendation strategy from the user perspective. Recommendation disparity is computed as the
Gini coefficient of user gains, i.e., how many of the recommended items match the user's Top-K
items.
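
Both disparity metrics rest on the Gini coefficient. The Python sketch below computes it with
the standard mean-absolute-difference formula; it does not reproduce the exact adaptation of
the metrics from [28], and the function name and the convention of returning 0 for an empty or
all-zero input are assumptions of this example.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values.

    0 means all users obtain the same value (no disparity); values closer
    to 1 mean the total is concentrated on few users.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        # Convention chosen for this sketch: no disparity if nobody gains.
        return 0.0
    # Sum of absolute differences over all ordered pairs of users.
    diff_sum = sum(abs(x - y) for x in values for y in values)
    return diff_sum / (2 * n * total)

# Hypothetical per-user gains: number of recommended items in each user's Top-K.
balanced = gini([1, 1, 1])
skewed = gini([0, 0, 1])
```

A lower value therefore indicates a fairer recommendation, which is why low DS and DR figures
in the result tables are the desirable outcome.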

We compare our approach to the following methods: average (AVG)
[8, 23, 18, 19, 9, 10, 13, 11, 12, 21], Fair Lin [24], Fair Prop [25, 26], Envy Free [25, 26],
minimum disagreement (Dis) [22], least misery (LM) [7, 8, 18, 19, 9, 20, 10, 11, 12, 21] and
GFAR [27].

 +
4.1. TV Dataset
This dataset contains TV viewing information related to 7,921 users and 119 channels,
broadcast both over the air and by satellite. The dataset is composed of an Electronic Program
Guide (EPG) containing the description of 21,194 distinct programs, and a log containing both
individual and group viewings performed by the users. The log spans from December 2012 to
February 2013 and contains 4,968,231 entries, among which we retained just the tune-ins
longer than three minutes. 3,519,167 viewings were performed by individual users, and are
used to compute the individual preferences of the group members. The remaining 1,449,064
viewings involved more than one person. The two context dimensions considered are the day of
the week (weekday vs. weekend) and the time slot. The available values for the time slot
are: graveyard slot, early morning, morning, daytime, early fringe, prime access, primetime,
and late fringe. Group viewings are split into a training set (1,210,316 entries) and a test
set (238,748 entries) with an 80%-20% ratio. Results are reported in Table 1. Note that the
superiority of our method, recall-wise, is very pronounced (e.g., 37.94% against 33.914% for
the best competitor at 𝐾 = 1). Regarding the ethical guarantees, FARGO delivers a very good
score disparity, while its recommendation disparity seems generally worse than that of the
other methods (except for 𝐾 = 1, for which its performance is on par with the other methods).
Note that for GFAR it is not possible to compute the score disparity, as it does not involve
the computation of group scores for the items.

          |          K=1            |          K=2            |          K=3
Method    | Recall   DS      DR     | Recall   DS      DR     | Recall   DS      DR
FARGO     | 37.94%   7.61%   17.85% | 54.08%   1.85%   12.69% | 64.20%   0.89%   10.08%
AVG       | 33.914%  7.07%   18.15% | 51.56%   2.93%   8.78%  | 62.91%   1.36%   7.53%
Fair Lin  | 33.22%   8.83%   18.25% | 50.80%   3.59%   7.46%  | 61.21%   1.61%   7.01%
Fair Prop | 32.99%   8.83%   13.45% | 50.55%   4.25%   8.90%  | 62.03%   1.79%   7.70%
Envy Free | 29.33%   10.43%  13.81% | 47.37%   4.23%   10.87% | 58.67%   1.89%   8.72%
Dis       | 33.57%   6.67%   17.45% | 51.95%   2.76%   8.97%  | 63.26%   1.30%   7.61%
LM        | 30.35%   5.69%   12.42% | 47.10%   2.58%   10.11% | 58.27%   1.25%   8.18%
GFAR      | 30.47%   -       18.28% | 44.48%   -       5.59%  | 55.19%   -       7.41%

Table 1
Comparison with other fair methods on TV dataset (DS: score disparity; DR: recommendation disparity)

 +
4.2. Music Dataset
This dataset (which can be downloaded at https://github.com/azzada/FARGO) has been created
within the scope of a user study by asking participants to fill in two different forms: an
individual form collecting demographic data (i.e., age and gender) and contextual individual
preferences about music artists, and a group form, to be filled in by groups, asking for a
collective choice of a music artist that was available at the time of the choice in a
particular context. The following two listening contexts have been selected, considering that
both are common situations users can relate to both when alone and when with other people,
and that users’ preferences would likely be different in each of them: during a car trip and
at dinner as background music. The dataset obtained contains data gathered from 280 users. For
each user, preferences regarding both the car trip and dinner contexts are gathered. From the
group forms, 498 context-aware collective preferences have been gathered. Of these, 272 groups
were composed of 2 users, 158 of 3 users, 32 of 4 users and 36 of 5 users. As for the previous
dataset, we used an 80%-20% split for training and test sets. Results are reported in Table 2.
Also in this case FARGO delivers the best recall. Contrary to the previous dataset, in this
case our method achieves a very good recommendation disparity. Regarding the score disparity,
all methods provide very low (i.e., good) values.

          |         K=1           |         K=2           |         K=3
Method    | Recall   DS     DR    | Recall   DS     DR    | Recall   DS     DR
FARGO     | 25.00%   2.19%  1.62% | 40.28%   0.87%  2.03% | 49.31%   0.53%  2.40%
AVG       | 12.50%   0.81%  3.24% | 25.00%   0.39%  2.91% | 34.72%   0.25%  2.71%
Fair Lin  | 11.11%   2.19%  4.17% | 23.61%   0.81%  2.14% | 31.94%   0.48%  1.81%
Fair Prop | 13.19%   0.66%  2.55% | 20.83%   0.38%  3.24% | 29.86%   0.37%  3.00%
Envy Free | 12.50%   0.81%  3.24% | 25.00%   0.39%  2.95% | 34.72%   0.25%  2.71%
Dis       | 22.92%   0.74%  3.41% | 34.72%   0.43%  2.83% | 41.67%   0.32%  2.49%
LM        | 13.89%   1.14%  3.76% | 25.00%   0.35%  3.13% | 34.72%   0.28%  1.99%
GFAR      | 6.06%    -      8.73% | 24.24%   -      1.88% | 33.33%   -      6.24%

Table 2
Comparison with other fair methods on Music dataset (DS: score disparity; DR: recommendation disparity)

 +
5. Conclusions
In this paper we have introduced FARGO, a new method for providing fair, context-aware
recommendations to ephemeral groups, which is also able to recommend items that are new to the
system. Considering both recall and fairness, it is not possible to identify a best overall
method across all datasets and values of 𝐾. Even if we ignored recall, a clear winner
fairness-wise is not evident (all methods tested, except for Dis, perform best fairness-wise
for at least one value of 𝐾 in at least one dataset). We argue that the relationship between
fairness and recommendation accuracy should be seen as a tradeoff. On both datasets of our
experiments, FARGO provides the best solution to such a tradeoff by achieving the best recall
across all values of 𝐾 while delivering ethical guarantees similar to those of the other fair
methods tested. Contrary to what one might think, LM is not the best method fairness-wise, and
this implies that the problem of maximizing both recall and fairness is not a simple one. This
is a complex problem that deserves further investigation, as recall and fairness seem not to
be inversely correlated in a trivial manner.

 +
References
[1] F. Ricci, L. Rokach, B. Shapira, P. B. Kantor, Recommender Systems Handbook, Springer, 2011.
[2] E. Quintarelli, E. Rabosio, L. Tanca, Recommending new items to ephemeral groups using contextual user influence, in: Proc. RecSys, 2016, pp. 285–292.
[3] E. Quintarelli, E. Rabosio, L. Tanca, Efficiently using contextual influence to recommend new items to ephemeral groups, Inf. Syst. 84 (2019) 197–213.
[4] G. Adomavicius, A. Tuzhilin, Context-Aware Recommender Systems, Springer, 2011, pp. 217–253.
[5] K. Verbert, N. Manouselis, X. Ochoa, M. Wolpers, H. Drachsler, I. Bosnic, E. Duval, Context-aware recommender systems for learning: A survey and future challenges, IEEE Transactions on Learning Technologies 5 (2012) 318–335.
[6] J. Masthoff, Group Recommender Systems: Combining Individual Models, Springer, 2011, pp. 677–702.
[7] M. O’Connor, D. Cosley, J. A. Konstan, J. Riedl, Polylens: A recommender system for groups of users, in: Proc. ECSCW, 2001, pp. 199–218.
[8] J. Masthoff, Group modeling: Selecting a sequence of television items to suit a group of viewers, in: Personalized Digital Television, Springer, 2004, pp. 93–141.
[9] E. Ntoutsi, K. Stefanidis, K. Nørvåg, H.-P. Kriegel, Fast group recommendations by applying user clustering, in: Proc. ER, 2012, pp. 126–140.
[10] A. J. Chaney, M. Gartrell, J. M. Hofman, J. Guiver, N. Koenigstein, P. Kohli, U. Paquet, A large-scale exploration of group viewing patterns, in: Proc. TVX, 2014, pp. 31–38.
[11] T. De Pessemier, S. Dooms, L. Martens, Comparison of group recommendation algorithms, Multimedia Tools Appl. 72 (2014) 2497–2541.
[12] N.-r. Kim, J.-H. Lee, Group recommendation system: Focusing on home group user in tv domain, in: Proc. SCIS, 2014, pp. 985–988.
[13] I. Ali, S.-W. Kim, Group recommendations: approaches and evaluation, in: Proc. IMCOM, 2015, pp. 1–6.
[14] M. Gartrell, X. Xing, Q. Lv, A. Beach, R. Han, S. Mishra, K. Seada, Enhancing group recommendation by incorporating social relationship interactions, in: Proc. GROUP, 2010, pp. 97–106.
[15] S. Berkovsky, J. Freyne, Group-based recipe recommendations: analysis of data aggregation strategies, in: Proc. RecSys, 2010, pp. 111–118.
[16] S. Yao, B. Huang, New fairness metrics for recommendation that embrace differences, CoRR abs/1706.09838 (2017).
[17] Y. Li, Y. Ge, Y. Zhang, Tutorial on fairness of machine learning in recommender systems, in: Proc. SIGIR, 2021, pp. 2654–2657.
[18] L. Baltrunas, T. Makcinskas, F. Ricci, Group recommendations with rank aggregation and collaborative filtering, in: Proc. RecSys, 2010, pp. 119–126.
[19] C. Senot, D. Kostadinov, M. Bouzid, J. Picault, A. Aghasaryan, C. Bernier, Analysis of strategies for building group profiles, in: Proc. UMAP, 2010, pp. 40–51.
[20] J. Gorla, N. Lathia, S. Robertson, J. Wang, Probabilistic group recommendation via information matching, in: Proc. WWW, 2013, pp. 495–504.
[21] S. Ghazarian, M. A. Nematbakhsh, Enhancing memory-based collaborative filtering for group recommender systems, Expert Syst. Appl. 42 (2015) 3801–3812.
[22] S. Amer-Yahia, S. B. Roy, A. Chawla, G. Das, C. Yu, Group recommendation: Semantics and efficiency, in: Proc. VLDB, 2009, pp. 754–765.
[23] Z. Yu, X. Zhou, Y. Hao, J. Gu, Tv program recommendation for multiple viewers based on user profile merging, User Model. User-Adapt. Int. 16 (2006) 63–82.
[24] L. Xiao, Z. Min, Z. Yongfeng, G. Zhaoquan, L. Yiqun, M. Shaoping, Fairness-aware group recommendation with pareto-efficiency, in: Proc. RecSys, 2017, pp. 107–115.
[25] S. Qi, N. Mamoulis, E. Pitoura, P. Tsaparas, Recommending packages to groups, in: Proc. ICDM, 2016, pp. 449–458.
[26] D. Serbos, S. Qi, N. Mamoulis, E. Pitoura, P. Tsaparas, Fairness in package-to-group recommendations, in: Proc. WWW, 2017, pp. 371–379.
[27] M. Kaya, D. Bridge, N. Tintarev, Ensuring fairness in group recommendations by rank-sensitive balancing of relevance, in: Proc. RecSys, 2020, pp. 101–110.
[28] J. Leonhardt, A. Anand, M. Khosla, User fairness in recommender systems, in: Proc. WWW, 2018, pp. 101–102.

</pre>

Latest revision as of 17:54, 30 March 2023


Accounting for Bossy Users in Context-Aware Group Recommendations

load PDF

Accounting for Bossy Users in Context-Aware Group
Recommendations
(Discussion Paper)

Davide Azzalini1 , Elisa Quintarelli2 , Emanuele Rabosio1 and Letizia Tanca1
1
    Politecnico di Milano, Milan, Italy
2
    University of Verona, Verona, Italy


                                         Abstract
                                         Lots of activities, like watching a movie or going to the restaurant, are intrinsically group-based. To rec-
                                         ommend such activities to groups, traditional single-user recommendation techniques are not appropriate
                                         and, as a consequence, over the years a number of group recommender systems have been developed.
                                         Recommending items to be enjoyed together by a group of people poses many ethical challenges: in
                                         fact, a system whose unique objective is to achieve the best recommendation accuracy might learn to
                                         disadvantage submissive users in favor of more aggressive ones. In this work we investigate the ethical
                                         challenges of context-aware group recommendations, in the general case of ephemeral groups (i.e.,
                                         groups where the members might be together for the first time), using a method that can recommend
                                         also items that are new to the system. We show the goodness of our method on two real-world datasets.
                                         The first one is a very large dataset containing the personal and group choices regarding TV programs of
                                         7,921 users w.r.t. sixteen contexts of viewing, while the second one gathers the musical preferences (both
                                         individual and in groups) of 280 real users w.r.t. two contexts of listening. Our extensive experiments
                                         show that our method always manages to obtain the highest recall while delivering ethical guarantees in
                                         line with the other fair group recommender systems tested.

                                         Keywords
                                         group recommender systems, context-aware recommender systems, computer ethics, fairness




1. Introduction
Recommender Systems are software tools and techniques that provide suggestions for items to
be of use to a user [1]. Several everyday activities are intrinsically group-based, thus recent
research concentrates also on systems that suggest activities that can be performed together
with other people and are typically social. The group recommendation problem introduces
further challenges with respect to the traditional single-user recommendations: (i) the group
members may have different preferences, and finding items that meet the tastes of everyone may
be impossible; (ii) a group may be formed by people who happen to be together for the first time,
and in this case, since no history of the group’s preferences is available, the recommendation
can only be computed from the known preferences of the group members, combined by means of
some aggregation function; (iii) last but not least, people, when in a group, may exhibit different

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
davide.azzalini@polimi.it (D. Azzalini); elisa.quintarelli@univr.it (E. Quintarelli); emanuele.rabosio@polimi.it
(E. Rabosio); letizia.tanca@polimi.it (L. Tanca)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
behaviors with respect to when they are alone, and therefore their individual preferences
might sometimes not be a reliable source of information. This last observation introduces an
unfairness problem: if the recommender system learns to consider the preferences of some
users as more relevant than those of the others, the overall satisfaction of the users belonging
to a group may not be optimal. This imbalance in the negotiation power that the system learns
to assign to different users, with the purpose of obtaining the best possible recommendation
accuracy, may be the result of unfair dynamics, e.g., some users being more aggressive and
others not feeling confident enough to stand up for themselves. In this work we extend
a state-of-the-art system for context-aware recommendations to ephemeral groups, based on
the concept of contextual influence [2, 3], to also account for fairness. Experiments on two
real-world datasets show that our approach outperforms seven other fair group recommender
systems by achieving a consistently better recall while providing similar ethical guarantees.


2. Related Work
Context-aware Recommender Systems
Most existing approaches to Recommender Systems do not take contextual information into
account; however, in many applications the context of use might be fundamental in guiding
the current preference of a user [4]. Recent studies have shown that Context-Aware
Recommender Systems can yield a very large increase in performance [5].

Group Recommender Systems
Group Recommender Systems are systems that produce a common recommendation for a group
of users [6]. Works on group recommendation usually address two kinds of groups: persistent
and ephemeral [7]. Persistent groups contain people who have a significant previous history of
activities together, while ephemeral groups are formed by people who happen to be together for
the first time. In the case of persistent groups, classical recommendation techniques can be used,
since the group can be considered as a single user, whereas, in the case of ephemeral groups,
recommendations must be computed on the basis of the preferences known for the members of the group.
A number of different aggregation strategies for the individual preferences have been proposed
over the years [6]; however, most of these aggregation strategies clearly violate fairness
principles. For instance, maximum satisfaction, used in [8, 9, 10, 11, 12], chooses the item for
which the individual preference score is the highest, effectively ignoring the satisfaction of most
of the users in the group. Other clear examples of unfair aggregation strategies are those proposed
in [13, 14, 15], which assign a different power to group members based on their expertise.

Fairness in Recommender Systems
In single-user Recommender Systems, fairness is usually assessed with regard to sensitive
attributes which are generally prone to discrimination (e.g., gender, ethnicity or social belonging)
by verifying the presence of a discriminated class within the user set [16, 17]. When fairness
is evaluated considering Group Recommender Systems, it should be computed within groups.
Since the groups we consider in this work are composed of few users, evaluating fairness in the
way just described is not a suitable solution. Instead of detecting unfairness towards a protected
group of users, we aim to detect and prevent unfairness towards single users within a group
whose desires are not considered when forming a recommendation for the whole group.

Fairness in Group Recommender Systems
Some aggregation strategies exist that, despite not having been developed to explicitly address
ethical issues, aggregate individual preferences in a way that resembles fairness. Least misery,
used in [7, 8, 18, 19, 9, 20, 10, 11, 12, 21], chooses the items for which the lowest value among
the preferences of the group members is the greatest one. The authors in [22] introduce an
aggregation function which maximizes the satisfaction of group members while, at the same
time, minimizing the disagreement among them. Average, used in [8, 23, 18, 19, 9, 10, 13, 11, 12, 21],
computes the group preference for an item as the arithmetic mean of the individual scores. Lastly,
some recent works explicitly aim at producing fair group recommendations. In
[24] the preferences of individual users are combined with a measure of fairness, to guarantee
that all the users are somehow satisfied. In [25, 26] two aggregation strategies are proposed: one
is based on the idea of proportionality, while the other one is based on the idea of envy-freeness.
In [27] a greedy algorithm to achieve rank-sensitive balance is presented.
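
To make the difference between the classical aggregation strategies concrete, the following is a minimal Python sketch of average, least misery and maximum satisfaction applied to one item at a time. The function names, the helper `best_item` and the toy scores are hypothetical and only serve as an illustration; they are not taken from any of the cited systems.

```python
from statistics import mean


def average(scores):
    """Average: arithmetic mean of the members' scores for one item."""
    return mean(scores)


def least_misery(scores):
    """Least misery: the group score is the score of the least satisfied member."""
    return min(scores)


def maximum_satisfaction(scores):
    """Maximum satisfaction: the group score is the score of the happiest member."""
    return max(scores)


def best_item(individual_scores, aggregate):
    """Return the item whose aggregated group score is the highest."""
    return max(individual_scores, key=lambda item: aggregate(individual_scores[item]))


# Hypothetical scores given by three group members to two items.
individual_scores = {
    "item_a": [0.9, 0.1, 0.2],  # loved by one member, disliked by the others
    "item_b": [0.5, 0.5, 0.4],  # moderately liked by everyone
}

for strategy in (average, least_misery, maximum_satisfaction):
    print(strategy.__name__, "->", best_item(individual_scores, strategy))
```

In this toy example maximum satisfaction selects the item loved by a single member, whereas average and least misery select the item that everyone moderately likes.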


3. The proposed method
In this section we first review CtxInfl, our previous approach introduced in [2, 3]. We then
present our contribution, which makes CtxInfl fairer. The resulting method is named FARGO.

3.1. CtxInfl
We consider a set of items 𝐼 and a set of users 𝑈, from which any group 𝐺 ∈ ℘(𝑈) can
be extracted. 𝐶 is the set of possible contexts in the given scenario, where a context 𝑐 is the
conjunction of a set of dimension/value pairs: e.g., for the TV dataset, a context might be
𝑐 = ⟨𝑡𝑖𝑚𝑒_𝑠𝑙𝑜𝑡 = 𝑝𝑟𝑖𝑚𝑒𝑡𝑖𝑚𝑒 ∧ 𝑑𝑎𝑦 = 𝑤𝑒𝑒𝑘𝑒𝑛𝑑⟩. We assume the availability of a log ℒ
recording the history of the items previously chosen by groups formed in the past, where each
element of ℒ is a 4-tuple (𝑡_𝑗, 𝑐_𝑗, 𝐺_𝑗, 𝑖_𝑗), 𝑡_𝑗 being the time instant at which the item 𝑖_𝑗 ∈ 𝐼 has
been chosen by the group 𝐺_𝑗 ∈ ℘(𝑈) in the context 𝑐_𝑗 ∈ 𝐶. A contextual scoring function
𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑖, 𝑐), with 𝑢 ∈ 𝑈 , 𝑖 ∈ 𝐼, 𝑐 ∈ 𝐶, assigning to each user the score given to the items in
the various contexts, is computed offline on the basis of the log of the past individual choices
and of the item descriptions in terms of their attributes, using any context-aware recommender
system for single users from the literature. 𝑇𝑜𝑝𝐾(𝑢, 𝑐, 𝑡) is the function that returns the list
of the 𝐾 items preferred by user 𝑢 in context 𝑐, according to the values of 𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑖, 𝑐), for
each 𝑖 ∈ 𝐼 available at instant 𝑡. Given a target group 𝐺 ∈ ℘(𝑈), a context 𝑐 ∈ 𝐶 and a time
instant 𝑡, the group recommendation is obtained by recommending to the users in 𝐺 a list (i.e.,
an ordered set) of 𝐾 items, considered interesting in context 𝑐, from those items in 𝐼 that are
available at time instant 𝑡, according to the following procedure:
3.1.1. Influence computation
The group preference for an item is obtained by aggregating the individual preferences of the
group members on the basis of their influence. In each context 𝑐, the influence 𝑖𝑛𝑓𝑙(𝑢, 𝑐) of a
given user 𝑢 is derived offline by comparing the behavior of 𝑢 when alone (i.e., 𝑢’s individual
preferences) with 𝑢’s behavior in groups (i.e., the interactions contained in the log ℒ). Basically,
the influence of 𝑢 tells us how often the groups containing 𝑢 have selected one of 𝑢’s
favorite items. Let 𝑇𝑜𝑝𝐾(𝑢, 𝑐, 𝑡) be the list of the 𝐾 items preferred by user 𝑢 in context 𝑐,
according to the values of 𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑖, 𝑐) for each 𝑖 ∈ 𝐼 available at instant 𝑡. The contextual
influence is defined as follows:

$$\mathit{infl}(u, c) = \frac{|\{ l_j \in \mathcal{L} : c = c_j \wedge u \in G_j \wedge i_j \in \mathit{TopK}(u, c, t_j) \}|}{|\{ l_j \in \mathcal{L} : c = c_j \wedge u \in G_j \}|} \tag{1}$$

The value of 𝑖𝑛𝑓𝑙(𝑢, 𝑐) quantifies the ability of user 𝑢 to direct the group’s decision towards
𝑢’s own tastes while in context 𝑐.
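
As an illustration of Eq. 1, the following minimal Python sketch counts, for every (user, context) pair, the fraction of logged group choices that fall in the user's individual top-𝐾 list. The data layout (log entries as plain tuples, `top_k` passed as a callable) and all names and toy values are assumptions made for this example only.

```python
from collections import defaultdict


def contextual_influence(log, top_k):
    """Compute infl(u, c) as in Eq. 1.

    log   -- iterable of (t, c, group, item) entries, one per past group choice
    top_k -- callable (u, c, t) -> collection of u's K favorite items
             in context c among those available at instant t
    """
    hits = defaultdict(int)   # numerator of Eq. 1
    total = defaultdict(int)  # denominator of Eq. 1
    for t, c, group, item in log:
        for u in group:
            total[(u, c)] += 1
            if item in top_k(u, c, t):
                hits[(u, c)] += 1
    return {key: hits[key] / total[key] for key in total}


# Hypothetical individual favorites (K = 2 for ann, K = 1 for bob) in one context.
favorites = {
    ("ann", "weekend"): {"news", "film"},
    ("bob", "weekend"): {"sport"},
}


def toy_top_k(u, c, t):
    # Toy stand-in: the favorites do not depend on the time instant here.
    return favorites.get((u, c), set())


toy_log = [
    (1, "weekend", ("ann", "bob"), "film"),
    (2, "weekend", ("ann", "bob"), "news"),
    (3, "weekend", ("ann", "bob"), "sport"),
    (4, "weekend", ("ann", "bob"), "quiz"),
]

infl = contextual_influence(toy_log, toy_top_k)
print(infl)
```

In this toy log the group picked one of ann's favorites in two of four choices and one of bob's favorites in one of four, so ann's influence is twice bob's.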

3.1.2. Top-K Group Recommendation Computation
Top-𝐾 recommendations are computed online, when a group of users requires that the system
suggests some interesting items to be enjoyed together. The system must compute the group
preferences for the items, and then determine the 𝐾 items with the highest scores. Given a
group 𝐺 ∈ ℘(𝑈), its preference 𝑠𝑐𝑜𝑟𝑒(𝐺, 𝑖, 𝑐) for 𝑖 ∈ 𝐼 in the context 𝑐 ∈ 𝐶 is computed as
the average of the preferences of its members, weighted on the basis of each member’s influence
(Eq. 1) in context 𝑐:

$$\mathit{score}(G, i, c) = \frac{\sum_{u \in G} \mathit{infl}(u, c) \cdot \mathit{score}(u, i, c)}{\sum_{u \in G} \mathit{infl}(u, c)} \tag{2}$$

Then, the top-𝐾 list of items preferred by a certain group 𝐺 in context 𝑐 at time instant 𝑡 is
determined by retrieving the 𝐾 items with the highest scores among those available at time 𝑡.
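
The online step can be sketched in a few lines of Python: the group score of Eq. 2 is an influence-weighted mean, and the top-𝐾 list is obtained by ranking the available items by that score. The names, the dictionary-based inputs and the fallback to a plain average when all influences are zero are assumptions of this sketch, not part of the description above.

```python
import heapq


def group_score(group, item, context, infl, score):
    """Influence-weighted average of the members' scores (Eq. 2)."""
    weights = {u: infl.get((u, context), 0.0) for u in group}
    total = sum(weights.values())
    if total == 0:
        # Assumption of this sketch: with no recorded influence,
        # fall back to the unweighted average.
        return sum(score(u, item, context) for u in group) / len(group)
    return sum(weights[u] * score(u, item, context) for u in group) / total


def top_k_for_group(group, context, available_items, infl, score, k):
    """Return the k available items with the highest group score."""
    return heapq.nlargest(
        k, available_items,
        key=lambda item: group_score(group, item, context, infl, score),
    )


# Hypothetical influences and individual contextual scores.
toy_infl = {("ann", "weekend"): 0.5, ("bob", "weekend"): 0.25}
toy_scores = {
    ("ann", "film"): 0.9, ("ann", "news"): 0.6, ("ann", "sport"): 0.2,
    ("bob", "film"): 0.1, ("bob", "news"): 0.5, ("bob", "sport"): 0.9,
}


def toy_score(u, item, context):
    return toy_scores[(u, item)]


ranking = top_k_for_group(
    ("ann", "bob"), "weekend", ["film", "news", "sport"], toy_infl, toy_score, k=2
)
print(ranking)
```

Because ann carries twice bob's weight, the item she strongly prefers is ranked first even though bob scores it very low.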

3.2. FARGO
Since CtxInfl is based on the concept of influence, it inevitably privileges the preferences of the
most influential users. As a consequence, the results of the recommendation process are biased
towards the preferences of the one or few users of the group who can be considered the
leaders or, using a more contemporary word, “influencers”. Our aim is to add an element of
fairness to CtxInfl while maintaining its general structure, which has already proved to be very
efficient and scalable [3]. Among the various phases of CtxInfl on which we could act (i.e.,
individual preference computation, influence computation, and Top-𝐾 group recommendation
computation), the last is the most suitable one, as it is the only one acting on groups. Following
this intuition, we propose to add a fairness factor to the computation of the score for each item
(Eq. 2), so as to modify the order of the items in the Top-𝐾 list in such a way that
items representing unfair recommendations do not appear on top. This is further motivated by
the fact that, when people make decisions in groups, they do not necessarily follow the decision
of a leader (as assumed by CtxInfl): in some cases people may make decisions trying to satisfy
every group member as much as possible. This means that considering only influence may not
be a complete strategy even if we put aside our ethical concerns. In order not to increase the
complexity of the computation of Eq. 2, we build our fairness element using just the individual
contextual scores, which are already used to compute Eq. 2. We call consensus the metric that
quantifies how much the individual preferences of group members agree on the evaluation of
an item. The consensus of a group 𝐺 on an item 𝑖 in a context 𝑐 is therefore defined as:

$$\mathit{consensus}(G, i, c) = 1 - \frac{\sum_{u \in G} \left( \mathit{score}(u, i, c) - \overline{\mathit{score}(u, i, c)} \right)^2}{|G|}, \tag{3}$$

where $\overline{\mathit{score}(u, i, c)}$ is the average score for item 𝑖 among 𝐺’s members in context 𝑐. The
consensus for an item to which users gave similar evaluations will be close to 1, while it will
reach its minimum when the scores are very discordant. The subtracted term is the variance of
the members’ scores; since the variance of values lying in [0, 1] is at most 0.25, for scores
in that range 𝑐𝑜𝑛𝑠𝑒𝑛𝑠𝑢𝑠 ∈ [0.75, 1]. After having defined 𝑐𝑜𝑛𝑠𝑒𝑛𝑠𝑢𝑠, we propose
to integrate it in Eq. 2 in the following way:

$$\mathit{fair\_score}(G, i, c) = \frac{\sum_{u \in G} \mathit{infl}(u, c) \cdot \mathit{score}(u, i, c)}{\sum_{u \in G} \mathit{infl}(u, c)} \cdot \mathit{consensus}(G, i, c)^{|G|} \tag{4}$$

We raise consensus to the power of the group size (with the effect of further reducing the overall
score) according to the intuition that the magnitude of unfairness in group recommendations
is proportional to the group size. In fact, the bigger the group, the bigger the potential harm
produced by taking into consideration solely the will of the leader/influencer.
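
The effect of the consensus factor can be seen on a small example. The Python sketch below implements Eq. 3 and Eq. 4 directly; the function names, the toy influences and the toy scores (assumed to lie in [0, 1]) are hypothetical and chosen only to show how a divisive item can lose its top position.

```python
def consensus(group, item, context, score):
    """1 minus the variance of the members' scores for the item (Eq. 3)."""
    values = [score(u, item, context) for u in group]
    avg = sum(values) / len(values)
    variance = sum((v - avg) ** 2 for v in values) / len(values)
    return 1.0 - variance


def fair_score(group, item, context, infl, score):
    """Influence-weighted score multiplied by consensus^|G| (Eq. 4)."""
    total_infl = sum(infl[(u, context)] for u in group)
    weighted = sum(
        infl[(u, context)] * score(u, item, context) for u in group
    ) / total_infl
    return weighted * consensus(group, item, context, score) ** len(group)


# Hypothetical influences and individual scores, assumed to lie in [0, 1].
toy_infl = {("ann", "weekend"): 0.5, ("bob", "weekend"): 0.25}
toy_scores = {
    ("ann", "film"): 0.9, ("bob", "film"): 0.1,  # divisive item
    ("ann", "news"): 0.6, ("bob", "news"): 0.5,  # item both members accept
}


def toy_score(u, item, context):
    return toy_scores[(u, item)]


group = ("ann", "bob")
for item in ("film", "news"):
    print(item,
          round(consensus(group, item, "weekend", toy_score), 4),
          round(fair_score(group, item, "weekend", toy_infl, toy_score), 4))
```

With these toy values the influence-weighted score alone would rank the divisive item first (about 0.63 against 0.57), while after multiplying by the consensus factor the item both members accept comes first.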


4. Experimental Results
In this section we present the results obtained by applying the proposed approach to two
different real-world datasets. To evaluate the recommendation performance we use recall,
considering for 𝐾 (number of items to be recommended) the values 1, 2 and 3. To evaluate the
ethical properties of our method we use the two metrics proposed in [28] for estimating user
discrimination, called score disparity and recommendation disparity, adapted to our needs. Score
disparity is computed as the Gini coefficient of user satisfaction, i.e., the relative gain achieved
by the user due to the actual recommendation with respect to the optimal recommendation
strategy from the user perspective. Recommendation disparity is computed as the Gini coefficient
of user gains, i.e., how many of the recommended items match the user Top-K items.
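
Both disparity metrics rest on the Gini coefficient. The following Python sketch shows the standard mean-absolute-difference form of the coefficient and a possible reading of the user gain as the overlap between the recommended list and the user's Top-K list. How the values are collected (per group or over all users) and how the metrics of [28] were adapted here is not specified above, so the helper names, the zero-total convention and the toy data are assumptions of this sketch.

```python
def gini(values):
    """Gini coefficient: 0 means perfectly equal values, values near 1 mean high disparity."""
    values = list(values)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        # Assumption of this sketch: no gain at all counts as no disparity.
        return 0.0
    abs_diffs = sum(abs(x - y) for x in values for y in values)
    # Equivalent to abs_diffs / (2 * n^2 * mean), since mean = total / n.
    return abs_diffs / (2 * n * total)


def user_gain(recommended, user_top_k):
    """Number of recommended items that also appear in the user's Top-K list."""
    return len(set(recommended) & set(user_top_k))


# Hypothetical recommendation and individual Top-K lists (K = 2).
recommended = ["film", "news"]
top_k_by_user = {
    "ann": ["film", "news"],
    "bob": ["sport", "news"],
    "cat": ["quiz", "sport"],
}

gains = [user_gain(recommended, top_k) for top_k in top_k_by_user.values()]
print(gains, gini(gains))
```

Score disparity would be obtained in the same way by passing the users' satisfaction values to `gini` instead of the gains.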
   We compare our approach to the following methods: average (AVG) [8, 23, 18, 19, 9, 10, 13,
11, 12, 21], Fair Lin [24], Fair Prop [25, 26], Envy Free [25, 26], minimum disagreement (Dis)
[22], least misery (LM) [7, 8, 18, 19, 9, 20, 10, 11, 12, 21] and GFAR [27].

4.1. TV Dataset
This dataset contains TV viewing information related to 7,921 users and 119 channels, broadcast
both over the air and by satellite. The dataset is composed of an Electronic Program
Guide (EPG) containing the description of 21,194 distinct programs, and a log containing both
individual and group viewings performed by the users. The log spans from December 2012 to
                      K=1                        K=2                        K=3
Method       Recall       DS       DR   Recall       DS       DR   Recall       DS       DR
FARGO        37.94%    7.61%   17.85%   54.08%    1.85%   12.69%   64.20%    0.89%   10.08%
AVG         33.914%    7.07%   18.15%   51.56%    2.93%    8.78%   62.91%    1.36%    7.53%
Fair Lin     33.22%    8.83%   18.25%   50.80%    3.59%    7.46%   61.21%    1.61%    7.01%
Fair Prop    32.99%    8.83%   13.45%   50.55%    4.25%    8.90%   62.03%    1.79%    7.70%
Envy Free    29.33%   10.43%   13.81%   47.37%    4.23%   10.87%   58.67%    1.89%    8.72%
Dis          33.57%    6.67%   17.45%   51.95%    2.76%    8.97%   63.26%    1.30%    7.61%
LM           30.35%    5.69%   12.42%   47.10%    2.58%   10.11%   58.27%    1.25%    8.18%
GFAR         30.47%        -   18.28%   44.48%        -    5.59%   55.19%        -    7.41%

Table 1
Comparison with other fair methods on the TV dataset (DS: score disparity; DR: recommendation disparity)


February 2013 and contains 4,968,231 entries, among which we retained just the channel tunings
longer than three minutes. 3,519,167 viewings were performed by individual users and are
used to compute the individual preferences of the group members. The remaining 1,449,064
viewings were performed by more than one person. The two context dimensions considered are
the day of the week (weekday vs. weekend) and the time slot. The available values for the time slot
are: graveyard slot, early morning, morning, daytime, early fringe, prime access, primetime,
and late fringe. Group viewings are split into a training set (1,210,316 entries) and a test set
(238,748 entries) with an 80%-20% ratio. Results are reported in Table 1. Note that the superiority
of our method, recall-wise, is very pronounced. As regards the ethical guarantees, FARGO
delivers a very good score disparity, while its recommendation disparity
seems to be generally worse than that of the other methods (except for 𝐾 = 1, for which its
performance is on par with the other methods). Note that for GFAR it is not possible to compute
the score disparity, as it does not involve the computation of group scores for the items.
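
For readers who wish to reproduce this kind of evaluation, the sketch below shows one plausible way to compute recall at 𝐾 on a test log. It assumes that each test entry records a single item chosen by the group and that recall is the fraction of test entries whose chosen item appears in the recommended top-𝐾 list; this reading, the function names and the toy data are assumptions of the sketch and are not stated in the text above.

```python
def recall_at_k(test_entries, recommend, k):
    """Fraction of test entries whose chosen item is among the k recommended items.

    test_entries -- list of (t, c, group, chosen_item) tuples
    recommend    -- callable (group, c, t, k) -> list of k recommended items
    """
    hits = sum(
        1 for t, c, group, chosen in test_entries
        if chosen in recommend(group, c, t, k)
    )
    return hits / len(test_entries)


def toy_recommend(group, c, t, k):
    # Toy stand-in for a group recommender: a fixed ranking for every request.
    return ["film", "news", "sport"][:k]


toy_test = [
    (1, "weekend", ("ann", "bob"), "film"),
    (2, "weekend", ("ann", "bob"), "sport"),
    (3, "weekend", ("ann", "bob"), "quiz"),
    (4, "weekend", ("ann", "bob"), "news"),
]

for k in (1, 2, 3):
    print(k, recall_at_k(toy_test, toy_recommend, k))
```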

4.2. Music Dataset
This dataset¹ has been created within the scope of a user study by asking participants to fill in
two different forms: an individual form collecting demographic data (i.e., age and gender) and
contextual individual preferences about music artists, and a group form, to be filled in by groups,
asking for a collective choice of a music artist that was available at the time of the choice in a
particular context. The following two listening contexts have been selected, considering that
both are common situations users can relate to both when alone and when with other people,
and that users’ preferences would likely be different in each of them: during a car trip and at
dinner as background music. The dataset obtained contains data gathered from 280 users. For
each user, preferences regarding both the car trip and dinner contexts are gathered. From the
group forms, 498 context-aware collective preferences have been gathered. Of these, 272 groups
were composed of 2 users, 158 of 3 users, 32 of 4 users and 36 of 5 users. As for the previous
dataset, we used an 80%-20% split for training and test sets. Results are reported in Table 2. Also
in this case FARGO delivers the best recall. Contrary to the previous dataset, in this case our
method achieves a very good recommendation disparity. As regards the score disparity, all
methods provide very low (i.e., good) values.
¹ The dataset can be downloaded at https://github.com/azzada/FARGO.
                      K=1                        K=2                        K=3
Method       Recall       DS       DR   Recall       DS       DR   Recall       DS       DR
FARGO        25.00%    2.19%    1.62%   40.28%    0.87%    2.03%   49.31%    0.53%    2.40%
AVG          12.50%    0.81%    3.24%   25.00%    0.39%    2.91%   34.72%    0.25%    2.71%
Fair Lin     11.11%    2.19%    4.17%   23.61%    0.81%    2.14%   31.94%    0.48%    1.81%
Fair Prop    13.19%    0.66%    2.55%   20.83%    0.38%    3.24%   29.86%    0.37%    3.00%
Envy Free    12.50%    0.81%    3.24%   25.00%    0.39%    2.95%   34.72%    0.25%    2.71%
Dis          22.92%    0.74%    3.41%   34.72%    0.43%    2.83%   41.67%    0.32%    2.49%
LM           13.89%    1.14%    3.76%   25.00%    0.35%    3.13%   34.72%    0.28%    1.99%
GFAR          6.06%        -    8.73%   24.24%        -    1.88%   33.33%        -    6.24%

Table 2
Comparison with other fair methods on the Music dataset (DS: score disparity; DR: recommendation disparity)


5. Conclusions
In this paper we have introduced FARGO, a new method for providing fair, context-aware
recommendations to ephemeral groups, which can also recommend items that are new to the system.
Considering both recall and fairness, it is not possible to identify a best overall method across all
datasets and values of 𝐾. Even if we ignored recall, no clear winner emerges fairness-wise
(all methods tested, except for Dis, perform best fairness-wise for at least one value of 𝐾 in at least
one dataset). We argue that the relationship between fairness and recommendation accuracy
should be seen as a tradeoff. On both datasets of our experiments, FARGO provides the best
solution to this tradeoff by achieving the best recall across all values of 𝐾 while delivering
ethical guarantees similar to those of the other fair methods tested. Contrary to what one might think,
LM is not the best method fairness-wise, which suggests that the problem of maximizing
both recall and fairness is not a simple one. This problem deserves further
investigation, as recall and fairness do not seem to be inversely correlated in a trivial manner.


References
 [1] F. Ricci, L. Rokach, B. Shapira, P. B. Kantor, Recommender Systems Handbook, Springer,
     2011.
 [2] E. Quintarelli, E. Rabosio, L. Tanca, Recommending new items to ephemeral groups using
     contextual user influence, in: Proc. RecSys, 2016, pp. 285–292.
 [3] E. Quintarelli, E. Rabosio, L. Tanca, Efficiently using contextual influence to recommend
     new items to ephemeral groups, Inf. Syst. 84 (2019) 197–213.
 [4] G. Adomavicius, A. Tuzhilin, Context-Aware Recommender Systems, Springer, 2011, pp.
     217–253.
 [5] K. Verbert, N. Manouselis, X. Ochoa, M. Wolpers, H. Drachsler, I. Bosnic, E. Duval, Context-
     aware recommender systems for learning: A survey and future challenges, IEEE Transac-
     tions on Learning Technologies 5 (2012) 318–335.
 [6] J. Masthoff, Group Recommender Systems: Combining Individual Models, Springer, 2011,
     pp. 677–702.
 [7] M. O’Connor, D. Cosley, J. A. Konstan, J. Riedl, Polylens: A recommender system for
     groups of users, in: Proc. ECSCW, 2001, pp. 199–218.
 [8] J. Masthoff, Group modeling: Selecting a sequence of television items to suit a group of
     viewers, in: Personalized Digital Television, Springer, 2004, pp. 93–141.
 [9] E. Ntoutsi, K. Stefanidis, K. Nørvåg, H.-P. Kriegel, Fast group recommendations by applying
     user clustering, in: Proc. ER, 2012, pp. 126–140.
[10] A. J. Chaney, M. Gartrell, J. M. Hofman, J. Guiver, N. Koenigstein, P. Kohli, U. Paquet, A
     large-scale exploration of group viewing patterns, in: Proc. TVX, 2014, pp. 31–38.
[11] T. De Pessemier, S. Dooms, L. Martens, Comparison of group recommendation algorithms,
     Multimedia Tools Appl. 72 (2014) 2497–2541.
[12] N.-r. Kim, J.-H. Lee, Group recommendation system: Focusing on home group user in TV
     domain, in: Proc. SCIS, 2014, pp. 985–988.
[13] I. Ali, S.-W. Kim, Group recommendations: approaches and evaluation, in: Proc. IMCOM,
     2015, pp. 1–6.
[14] M. Gartrell, X. Xing, Q. Lv, A. Beach, R. Han, S. Mishra, K. Seada, Enhancing group
     recommendation by incorporating social relationship interactions, in: Proc. GROUP, 2010,
     pp. 97–106.
[15] S. Berkovsky, J. Freyne, Group-based recipe recommendations: analysis of data aggregation
     strategies, in: Proc. RecSys, 2010, pp. 111–118.
[16] S. Yao, B. Huang, New fairness metrics for recommendation that embrace differences,
     CoRR abs/1706.09838 (2017).
[17] Y. Li, Y. Ge, Y. Zhang, Tutorial on fairness of machine learning in recommender systems,
     in: Proc. SIGIR, 2021, pp. 2654–2657.
[18] L. Baltrunas, T. Makcinskas, F. Ricci, Group recommendations with rank aggregation and
     collaborative filtering, in: Proc. RecSys, 2010, pp. 119–126.
[19] C. Senot, D. Kostadinov, M. Bouzid, J. Picault, A. Aghasaryan, C. Bernier, Analysis of
     strategies for building group profiles, in: Proc. UMAP, 2010, pp. 40–51.
[20] J. Gorla, N. Lathia, S. Robertson, J. Wang, Probabilistic group recommendation via infor-
     mation matching, in: Proc. WWW, 2013, pp. 495–504.
[21] S. Ghazarian, M. A. Nematbakhsh, Enhancing memory-based collaborative filtering for
     group recommender systems, Expert Syst. Appl. 42 (2015) 3801–3812.
[22] S. Amer-Yahia, S. B. Roy, A. Chawla, G. Das, C. Yu, Group recommendation: Semantics
     and efficiency, in: Proc. VLDB, 2009, pp. 754–765.
[23] Z. Yu, X. Zhou, Y. Hao, J. Gu, TV program recommendation for multiple viewers based on
     user profile merging, User Model. User-Adapt. Int. 16 (2006) 63–82.
[24] L. Xiao, Z. Min, Z. Yongfeng, G. Zhaoquan, L. Yiqun, M. Shaoping, Fairness-aware group
     recommendation with pareto-efficiency, in: Proc. RecSys, 2017, pp. 107–115.
[25] S. Qi, N. Mamoulis, E. Pitoura, P. Tsaparas, Recommending packages to groups, in: Proc.
     ICDM, 2016, pp. 449–458.
[26] D. Serbos, S. Qi, N. Mamoulis, E. Pitoura, P. Tsaparas, Fairness in package-to-group
     recommendations, in: Proc. WWW, 2017, pp. 371–379.
[27] M. Kaya, D. Bridge, N. Tintarev, Ensuring fairness in group recommendations by rank-
     sensitive balancing of relevance, in: Proc. RecSys, 2020, pp. 101–110.
[28] J. Leonhardt, A. Anand, M. Khosla, User fairness in recommender systems, in: Proc.
     WWW, 2018, pp. 101–102.