A Network-based Model and a Related Approach to Represent and Handle the Semantics of Comments in a Social Network
(Discussion Paper)

Gianluca Bonifazi^a, Francesco Cauteruccio^a, Enrico Corradini^a, Michele Marchetti^a, Giorgio Terracina^b, Domenico Ursino^a and Luca Virgili^a

^a DII, Polytechnic University of Marche
^b DEMACS, University of Calabria

Abstract
In this paper, we propose a network-based model and a related approach to represent and handle the semantics of a set of comments expressed by users of a social network. Our model and approach are multi-dimensional and holistic because they manage the semantics of comments from multiple perspectives. Our approach first selects the text patterns that best characterize the involved comments. Then, it uses these patterns and the proposed model to represent each set of comments by means of a suitable network. Finally, it adopts a suitable technique to measure the semantic similarity of each pair of comment sets.

Keywords
Comment analysis, Social Network Analysis, Text Pattern Mining, Semantic Similarity, Utility Functions




1. Introduction
In the last few years, the investigation of the content of comments expressed by people in
social media has increased enormously [1]. In fact, social media comments are one of the places
where people tend to express their ideas most spontaneously [2]. Consequently, they play a
privileged role in allowing the reconstruction of the real feelings and thoughts of a person, as
well as in building a more faithful profile of her [3, 4, 5]¹. Spontaneity is both the main strength
and one of the main weaknesses of comments. In fact, they are often written on the spur of
the moment, in a language style that is not very structured, apparently confused and, in
some cases, contradictory. In spite of these flaws, the set of comments written by a certain user

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
Email: g.bonifazi@univpm.it (G. Bonifazi); f.cauteruccio@univpm.it (F. Cauteruccio); e.corradini@pm.univpm.it (E. Corradini); m.marchetti@pm.univpm.it (M. Marchetti); terracina@mat.unical.it (G. Terracina); d.ursino@univpm.it (D. Ursino); luca.virgili@univpm.it (L. Virgili)
ORCID: 0000-0002-1947-8667 (G. Bonifazi); 0000-0001-8400-1083 (F. Cauteruccio); 0000-0002-1140-4209 (E. Corradini); 0000-0003-3692-3600 (M. Marchetti); 0000-0002-3090-7223 (G. Terracina); 0000-0003-1360-8499 (D. Ursino); 0000-0003-1509-783X (L. Virgili)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073).
¹ In this paper, we focus on comments expressed by people through their well-defined accounts. We do not consider anonymous comments because they are less reliable and, in any case, not useful for the objectives of our research.
provides an overview of her thoughts and profile. Reconstructing the latter from the apparent
“chaos” inherent in the comments is a challenging issue for researchers working in the context
of the extraction of content semantics.
   In this paper, we want to make a contribution in this context by proposing a model, and a
related approach, to detect and handle the content semantics from a set of comments posted on
a social network. We argue that our model and its related approach are able to extract from the
apparent “chaos” of comments the thoughts of their publisher and, ultimately, to reconstruct
the corresponding profile. However, the latter is only one of the possible uses of our model
and approach. In fact, if we widen our gaze to the comments written by all the users of a
certain community, we are able to understand the dominant thoughts in it. If we consider all
the comments on a certain topic (e.g., COVID-19), we can reconstruct the various viewpoints
on such a topic. Again, if we consider all the comments in a certain time period (e.g., the first three
months of the year 2022), we can determine the dominant thoughts in that period.
Furthermore, the reconstruction of thoughts is only one of the possible applications of our
model and approach. Others may include, for example, constructing recommender systems,
building new user communities, identifying outliers or constructing new topic forums. Some of
the most interesting applications are described in [6].
   This paper is organized as follows: In Section 2, we present an overview of our proposal. In
Section 3, we illustrate our model. In Section 4, we describe our approach. Finally, in Section 5,
we draw our conclusions and have a look at some possible future developments. Due to space
limitations, we cannot describe here the experiments carried out to test our model and approach.
However, the interested reader can find them in [6].


2. An overview of our proposal
Our approach consists of two phases, namely pre-processing and knowledge extraction.
   The pre-processing phase is aimed at cleaning and annotating the available comments and,
then, selecting the most meaningful ones. During the cleaning activity, bot-generated content,
errors, inconsistencies, etc., are removed, and comment tokenization and lemmatization tasks
are performed. The annotation activity aims to automatically enrich each lemmatized comment
with some important information, such as the value of the associated sentiment, the post to
which it refers, the author who wrote it, etc.
   The selection of the most significant comments is based on a text pattern mining technique.
While most of the approaches proposed in the past to perform such a task consider only the
frequency of patterns [7], our technique also, and primarily, considers their utility [8, 9], measured
with the support of a utility function. Regarding this function, we point out that our technique is
orthogonal to the utility function used. As a consequence, it is possible to choose different utility
functions to prioritize certain comment properties over others. A first utility function could
be the sentiment of comments; it could allow, for instance, the identification of only positive
comments or only negative ones. A second utility function might be the rate of comments; it
might allow, for instance, the selection of patterns involving only high rate comments or only
low rate ones. A third utility function could be the Pearson’s correlation [10] between sentiment
and rate; it could allow, for instance, the selection of patterns involving only comments with
discordant (resp., concordant) sentiment and rate. More details on utility functions can be found
in Section 3.
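To make this orthogonality concrete, here is a hedged sketch of the selection step. The filtering logic is parametric in the utility function, while the enumeration of candidate patterns (a utility-based pattern mining task in its own right [8, 9]) is abstracted away; all names and thresholds are illustrative.

    # Sketch: frequency and utility filtering, parametric in the utility function.
    from statistics import mean

    def select_patterns(candidate_patterns, comments, utility_fn,
                        min_freq, utility_range):
        """Keep the patterns occurring in at least min_freq comments and whose
        utility, computed by utility_fn over the covering comments, falls
        within utility_range."""
        low, high = utility_range
        selected = []
        for pattern in candidate_patterns:      # a pattern is a set of lemmas
            covering = [c for c in comments if pattern <= c["lemmas"]]
            if len(covering) >= min_freq and low <= utility_fn(covering) <= high:
                selected.append(pattern)
        return selected

    # First utility function above: average sentiment of the covering comments.
    f_s = lambda covering: mean(c["sentiment"] for c in covering)

Swapping f_s for a rating-based or correlation-based function changes which patterns survive without touching the selection machinery.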
   Once the comments and patterns of interest have been selected, it is necessary to have a
model for their representation and management. As mentioned in the Introduction, in this
paper we propose a new network-based model called CS-Net (Content Semantics Network).
The nodes of a CS-Net represent the comment lemmas. Its arcs can be of two types, which
reflect two different perspectives on the investigation of comment semantics. The first one,
based on the concept of co-occurrence, reflects the past results obtained by many Information
Retrieval researchers [11]. It assumes that two semantically related lemmas tend to appear together
very often in sentences. The second one, based on the concepts of semantic relationships and
semantically related terms, reflects the past results obtained by many researchers working in
Natural Language Processing [12]. Actually, the CS-Net model is extensible so that, if in the
future we wanted to add additional perspectives for investigating comment content, it would
be sufficient to add a new type of arc for each new perspective. The CS-Net model is described
in detail in Section 3.
   After selecting the comments and patterns of interest, and after representing them by means
of CS-Nets, a technique to evaluate the semantic similarity of two CS-Nets is necessary. This
technique operates by separately evaluating, and then appropriately combining, the semantic
similarity of each pair of subnets obtained by projecting the original CS-Nets in such a way
as to consider only one type of arc at a time. The combination of the single components is
done by weighting them differently, based on the extension of the CS-Net projections from
which they are derived. This extension is determined by the number of the corresponding
arcs. In particular, our technique favors the most extensive component, because it represents a
larger portion of the content semantics than the other. Analogously to the CS-Net model, our
technique for computing the similarity of two CS-Nets is extensible if one wants to add new
perspectives of semantic similarity evaluation. In fact, to obtain an overall semantic similarity
value, it is sufficient to compute the components related to each perspective separately, and
then combine them according to the procedure mentioned above. In evaluating the semantics
of two homogeneous subnets (i.e., subnets of only co-occurrences or subnets of only semantic
relationships), our technique considers two further aspects, namely the topological similarity
of the subnets and the similarity of the concepts expressed by the corresponding nodes. To
compute the former, we adopt an approach already proposed in the literature, i.e., NetSimile
[13]. To compute the latter, we use an enhanced version of the Jaccard coefficient, capable of
considering synonymies and homonymies as well. Adding these two further contributions to
co-occurrences and semantic relationships makes our approach even more holistic. A detailed
description of our technique for evaluating the semantic similarity of two CS-Nets can be found
in Section 4.


3. Proposed model
Let 𝒞 = {𝑐1, 𝑐2, · · · , 𝑐𝑛} be a set of lemmatized comments and let ℒ = {𝑙1, 𝑙2, · · · , 𝑙𝑞} be the set
of all lemmas that can be found in a comment of 𝒞. Each comment 𝑐𝑘 ∈ 𝒞 can be represented
as a set of lemmas 𝑐𝑘 = {𝑙1, 𝑙2, . . . , 𝑙𝑚}; therefore, 𝑐𝑘 ⊆ ℒ. A text pattern 𝑝ℎ is a set of lemmas;
therefore, 𝑝ℎ ⊆ ℒ.
  We are interested in patterns with frequency values and utility functions belonging to
appropriate intervals. In particular, as far as frequency is concerned, we are interested in
patterns whose frequency value is greater than a certain threshold. Instead, for what concerns
the utility function, the scenario is more complex, because it depends on the utility function
adopted and the context in which our model is used. For example:

    • We could employ as utility function the average sentiment value of the comments to
      which the pattern of interest refers. We call 𝑓𝑠(·) this utility function. It allows us to
      select patterns characterized by a very high compound score (and, therefore, a very high
      sentiment value) (e.g., positive patterns), a very low one (e.g., negative patterns), or one
      belonging to a given range (e.g., neutral patterns).
    • We could adopt as utility function the Pearson’s correlation [10] between the sentiment
      and the score of the comments in which the pattern of interest is present. We call 𝑓𝑝 (·)
      this utility function. It allows us to select: (i) patterns having a high sentiment value and
      stimulating positive comments; (ii) patterns having a low sentiment value and stimulating
      negative comments; (iii) patterns having a high sentiment value and stimulating negative
      comments; (iv) patterns having a low sentiment value and stimulating positive comments.
      Clearly, in the vast majority of investigations, the patterns of interest are those related
      to cases (i) and (ii). However, there may be rare cases where the patterns of interest are
      those related to cases (iii) and (iv). (A minimal sketch of 𝑓𝑝(·) is given below.)

   In the following, we denote by 𝒫 the set of the patterns of interest, whose values of frequency
and utility function belong to the intervals of interest for the application that is being considered.
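As a concrete illustration of 𝑓𝑝(·), the sketch below computes the Pearson correlation between the sentiment and the score of the comments covering a pattern; the per-comment "score" field is a hypothetical stand-in for whatever rating the platform provides.

    # Sketch of f_p: Pearson correlation between the sentiment and the score
    # of the comments in which a pattern is present (field names hypothetical).
    from scipy.stats import pearsonr

    def f_p(covering_comments):
        sentiments = [c["sentiment"] for c in covering_comments]
        scores = [c["score"] for c in covering_comments]
        r, _p_value = pearsonr(sentiments, scores)
        return r   # in [-1, 1]

Values of 𝑓𝑝(·) close to 1 point to the concordant cases (i) and (ii), while values close to −1 point to the rarer discordant cases (iii) and (iv).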
   We are now able to formalize our model. In particular, a Content Semantics Network (hereafter,
CS-Net) 𝒩 is defined as 𝒩 = ⟨𝑁, 𝐴𝑐 ∪ 𝐴𝑟 ⟩.
𝑁 is the set of nodes of 𝒩 . There is a node 𝑛𝑖 ∈ 𝑁 for each lemma 𝑙𝑖 ∈ ℒ. Since there exists
a one-to-one correspondence between 𝑛𝑖 and 𝑙𝑖 , in the following we will use these two symbols
interchangeably.
   𝐴𝑐 is the set of co-occurrence arcs. There is an arc (𝑛𝑖 , 𝑛𝑗 , 𝑤𝑖𝑗 ) ∈ 𝐴𝑐 if the lemmas 𝑙𝑖 and 𝑙𝑗
appear at least once together in a pattern of 𝒫. 𝑤𝑖𝑗 is a real number belonging to the interval
[0, 1] and denoting the strength of the co-occurrence. The higher 𝑤𝑖𝑗 , the higher this strength.
For example, 𝑤𝑖𝑗 could be obtained as a function of the number of patterns in which 𝑙𝑖 and 𝑙𝑗
co-occur.
   𝐴𝑟 is the set of semantic relationship arcs. There is an arc (𝑛𝑖 , 𝑛𝑗 , 𝑤𝑖𝑗 ) ∈ 𝐴𝑟 if there is a
semantic relationship between 𝑙𝑖 and 𝑙𝑗 . 𝑤𝑖𝑗 is a real number in the interval [0, 1] denoting the
strength of the relationship. The higher 𝑤𝑖𝑗 , the higher this strength. 𝑤𝑖𝑗 could be computed
using ConceptNet [14] and considering both the number of times 𝑙𝑗 is present in the set of
“related terms” of 𝑙𝑖 and the values of the corresponding weights.
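Putting the definitions together, a CS-Net could be materialized as in the following sketch. It assumes networkx, restricts 𝑁, for brevity, to the lemmas occurring in the patterns of 𝒫, and treats the two weighting schemes as pluggable functions, since the model leaves their exact form open.

    # Sketch: building a CS-Net <N, A_c ∪ A_r> with typed, weighted arcs.
    from itertools import combinations
    import networkx as nx

    def build_cs_net(patterns, cooc_weight, sem_weight):
        """patterns: the set P of patterns of interest (sets of lemmas).
        cooc_weight and sem_weight return a strength in [0, 1],
        with 0 meaning "no arc"."""
        net = nx.MultiGraph()
        lemmas = set().union(*patterns)
        net.add_nodes_from(lemmas)               # one node n_i per lemma l_i
        for l_i, l_j in combinations(sorted(lemmas), 2):
            # A_c: the two lemmas appear together in at least one pattern of P
            if any({l_i, l_j} <= p for p in patterns):
                net.add_edge(l_i, l_j, kind="c", weight=cooc_weight(l_i, l_j))
            # A_r: the two lemmas are semantically related (e.g., via ConceptNet)
            w = sem_weight(l_i, l_j)
            if w > 0:
                net.add_edge(l_i, l_j, kind="r", weight=w)
        return net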
   An observation on the structure of the CS-Net model is necessary. As specified above, our
goal is to model and manage the semantics of the content of a set of comments. CS-Net is
a model tailored exactly to that goal. For this reason, it considers two perspectives derived
from the past literature. The former is related to the concept of co-occurrence. It indicates
that two semantically related lemmas tend to appear very often together in sentences. This
perspective is probably the most immediate in the context of text mining. In fact, here, it is
well known that the frequency with which two or more lemmas appear together in a text
represents an index of their correlation. The potential weakness of this perspective lies in
the need to compute the frequency of each pair of lemmas. Moreover, this computation must
be continually updated whenever a new comment is taken into consideration. The latter is
related to the concepts of semantic relationships and semantically related terms. These refer
to several studies conducted in the past in the contexts of Information Retrieval [11] and
Natural Language Processing [12]. In this perspective, the meanings of the terms, and thus
their semantics, are taken into consideration. Indeed, semantic relationships between terms
(e.g., synonymies and homonymies) are a very common feature in natural languages. The main
weakness of this perspective lies in the need for a thesaurus that stores the semantic
relationships between terms. If such a tool exists, the computation of the
strength of the semantic relationship is straightforward. Clearly, additional perspectives could
be considered in the future. This is facilitated by the extensibility of our model. Indeed, if one
wanted to consider a new perspective, it would be sufficient to add to 𝐴𝑐 and 𝐴𝑟 a third set of
arcs representing the new perspective.
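For instance, the weight of a semantic relationship arc could be derived from ConceptNet's public API, as in the hedged sketch below; the endpoint shape follows the api.conceptnet.io service, while the clamping to [0, 1] and the single-lookup simplification (the model also counts multiple occurrences among the related terms) are our assumptions.

    # Sketch: weighting an A_r arc via ConceptNet's "related terms" endpoint.
    import requests

    def conceptnet_relatedness(l_i, l_j, lang="en"):
        """Return a strength in [0, 1] for the arc (l_i, l_j), or 0.0 when
        l_j is not among the terms related to l_i."""
        url = f"http://api.conceptnet.io/related/c/{lang}/{l_i}?filter=/c/{lang}"
        for entry in requests.get(url).json().get("related", []):
            if entry["@id"] == f"/c/{lang}/{l_j}":
                return max(0.0, min(1.0, entry["weight"]))
        return 0.0

A function like this can be plugged in as the sem_weight argument of build_cs_net above.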


4. Evaluation of the semantic similarity of two CS-Nets
In this section, we illustrate our approach for computing the semantic similarity of the content
of two sets of comments represented by means of two CS-Nets 𝒩₁ and 𝒩₂. It receives 𝒩₁ and
𝒩₂ as input and returns a coefficient 𝜎₁₂, whose value belongs to the real interval [0, 1]. This
coefficient measures the strength of the semantic similarity of the content represented by 𝒩₁ and
𝒩₂; the higher its value, the higher the semantic similarity. Our technique behaves as follows:

    • It constructs two pairs of subnets, (𝒩₁ᶜ, 𝒩₂ᶜ) and (𝒩₁ʳ, 𝒩₂ʳ). The former (resp., latter)
      pair is obtained by selecting only the co-occurrence (resp., semantic relationship) arcs of
      the networks 𝒩₁ and 𝒩₂. Specifically: 𝒩₁ᶜ = ⟨𝑁₁, 𝐴𝑐₁⟩, 𝒩₂ᶜ = ⟨𝑁₂, 𝐴𝑐₂⟩, 𝒩₁ʳ = ⟨𝑁₁, 𝐴𝑟₁⟩,
      and 𝒩₂ʳ = ⟨𝑁₂, 𝐴𝑟₂⟩. If, in the future, we want to add a new perspective, and therefore
      a new set of arcs besides 𝐴𝑐 and 𝐴𝑟, it will be sufficient to build another pair of subnets
      corresponding to the new perspective.
    • It computes the semantic similarity degrees 𝜎₁₂ᶜ and 𝜎₁₂ʳ for the pairs of networks
      (𝒩₁ᶜ, 𝒩₂ᶜ) and (𝒩₁ʳ, 𝒩₂ʳ), respectively. The approach for computing 𝜎₁₂ˣ, 𝑥 ∈ {𝑐, 𝑟},
      should be as holistic as possible. To this end, it is necessary to define a formula capable of
      considering as many factors as possible, among those that are believed to influence the
      semantic similarity degree of two networks 𝒩₁ˣ and 𝒩₂ˣ, 𝑥 ∈ {𝑐, 𝑟}. In particular, it is
      possible to consider at least two factors with these characteristics.
      The first factor concerns the topological similarity of the networks, i.e., the similarity of
      their structural characteristics. The structure of a network is ultimately determined by its
      nodes and arcs. In our networks, nodes are associated with lemmas, while arcs represent
      features (e.g., co-occurrences or semantic relationships) contributing significantly to
      define the semantics of the lemmas they connect. This reasoning is further reinforced by
      the fact that the semantics of a lemma is partly determined by the lemmas to which it is
      related in the network (in this observation, the principle of homophily, which characterizes
      social networks, is applied to the CS-Net). The second factor is much more immediate. In
      fact, it concerns the semantic meaning of the concepts expressed by the nodes of the CS-Net,
      each representing a lemma of the set of comments associated with it.
Regarding the first factor, many approaches for computing the similarity degree of the
structures of two networks have been proposed in the past literature. We decided to
      adopt one of these approaches, i.e., NetSimile [13]. This choice is motivated by the fact
      that NetSimile has a much shorter computation time than the other related approaches,
      while guaranteeing an accuracy level adequate for our reference context.
NetSimile extracts and evaluates the structural characteristics of each node by analyzing
the structural characteristics of its ego network. Therefore, in order to return the similarity
score of two networks, it computes the similarity degree of the corresponding vectors of
features.
Regarding the second factor, we decided to consider the portion of nodes with the same or
similar meaning present in the two subnets of the pair. A simple, but very effective, way to
do this is the computation of the Jaccard coefficient between the sets of lemmas associated
with the nodes of the two CS-Nets. Actually, the Jaccard coefficient only considers
equality between two lemmas, while we can also have lexicographic relationships (e.g.,
synonymies and homonymies) between them [15]. These can modify the semantic
relationships between two lemmas and, therefore, must be taken into consideration. To
do so, our technique uses an advanced thesaurus, i.e., ConceptNet [14], which includes
WordNet within it. Based on this thesaurus, we redefine the Jaccard coefficient and
      introduce an enhanced version of it, which we call 𝐽*. It behaves as the classic Jaccard
      coefficient but takes lexicographic relationships into account. (A combined sketch of 𝜈,
      𝐽*, and the two formulas below is given right after this list.)
      Given these premises, we can define the formula for the computation of 𝜎₁₂ˣ:

                        𝜎₁₂ˣ = 𝛽ˣ · 𝜈(𝒩₁ˣ, 𝒩₂ˣ) + (1 − 𝛽ˣ) · 𝐽*(𝑁₁ˣ, 𝑁₂ˣ)

      Here:
         – 𝜈(𝒩₁ˣ, 𝒩₂ˣ) is a function that applies NetSimile for computing the topological
           similarity of 𝒩₁ˣ and 𝒩₂ˣ.
         – 𝐽*(𝑁₁ˣ, 𝑁₂ˣ) is the enhanced Jaccard coefficient between the node sets 𝑁₁ˣ and 𝑁₂ˣ.
         – 𝛽ˣ represents the weight given to the topological similarity of the CS-Nets with
           respect to the lexical similarity of the lemmas associated with their nodes. A discussion
           of the possible formulas for 𝛽ˣ, based on the objectives one wants to pursue in a
           specific application, can be found in [6].
      Note that our approach for computing 𝜎₁₂ˣ can operate on any projections 𝒩₁ˣ and 𝒩₂ˣ of
      the networks 𝒩₁ and 𝒩₂. In fact, the only constraint is that the arcs of the two networks
      involved are of the same type 𝑥. This makes the approach extensible. Indeed, if we wish
      to add a new perspective on modeling content semantics in the future, the similarity
      degree of the corresponding projections of 𝒩₁ and 𝒩₂ can be computed using the same
      formula for 𝜎₁₂ˣ described above.
    • It computes the overall semantic similarity degree 𝜎₁₂ of 𝒩₁ and 𝒩₂ as a weighted mean
      of the two semantic similarity degrees 𝜎₁₂ᶜ and 𝜎₁₂ʳ:

                𝜎₁₂ = (𝜔₁₂ᶜ · 𝜎₁₂ᶜ + 𝜔₁₂ʳ · 𝜎₁₂ʳ) / (𝜔₁₂ᶜ + 𝜔₁₂ʳ) = 𝛼 · 𝜎₁₂ᶜ + (1 − 𝛼) · 𝜎₁₂ʳ

      In this formula, 𝛼 = 𝜔₁₂ᶜ / (𝜔₁₂ᶜ + 𝜔₁₂ʳ) weights the semantic similarity obtained through
      the analysis of co-occurrences against the one derived from the analysis of the semantic
      relationships between lemmas. The rationale behind it is that the greater the amount of
      information carried by one perspective, relative to the other, the greater its weight in
      defining the overall semantics. Now, since |𝑁₁ᶜ| = |𝑁₁ʳ| and |𝑁₂ᶜ| = |𝑁₂ʳ|, the amount of
      information carried by the two perspectives can be measured by considering the cardinality
      of the corresponding sets of arcs. On the basis of this reasoning, we have that:
      𝜔₁₂ᶜ = (𝜔₁ᶜ + 𝜔₂ᶜ)/2, 𝜔₁₂ʳ = (𝜔₁ʳ + 𝜔₂ʳ)/2, 𝜔₁ᶜ = |𝐴𝑐₁|/(|𝐴𝑐₁| + |𝐴𝑟₁|),
      𝜔₂ᶜ = |𝐴𝑐₂|/(|𝐴𝑐₂| + |𝐴𝑟₂|), 𝜔₁ʳ = 1 − 𝜔₁ᶜ, and 𝜔₂ʳ = 1 − 𝜔₂ᶜ. These formulas essentially
      tell us that the importance of a perspective in determining the overall content semantics
      is directly proportional to the number of pairs of lemmas it can involve.
      Finally, note that 𝜎₁₂ ranges in the real interval [0, 1]. The higher 𝜎₁₂, the greater the
      similarity of 𝒩₁ and 𝒩₂.
      Like the other components of our approach, the one for computing 𝜎₁₂ is extensible. In
      fact, if in the future we wanted to add further perspectives for modeling content semantics,
      we would simply add to 𝜎₁₂ᶜ and 𝜎₁₂ʳ an additional similarity coefficient for each added
      perspective and modify the weights in the formula for 𝜎₁₂ accordingly.
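To summarize the whole computation, the following end-to-end sketch evaluates 𝜎₁₂ for two CS-Nets built as in Section 3. It is a sketch under explicit assumptions: 𝜈 is simplified to a small ego-net feature signature compared via cosine similarity (standing in for the full NetSimile feature set of [13]), 𝐽* relies on a placeholder synonym/homonym oracle, and 𝛽ˣ is fixed to the same value for both projections, whereas [6] discusses richer choices.

    # Sketch: sigma_12 over two CS-Nets (simplified nu and J*; fixed beta).
    import numpy as np
    import networkx as nx

    def nu(g1, g2):
        """Simplified stand-in for NetSimile: aggregate per-node ego-net
        features into a signature, then compare signatures by cosine similarity."""
        def signature(g):
            if g.number_of_nodes() == 0:
                return np.zeros(4)
            feats = np.array([(g.degree(n),
                               nx.ego_graph(g, n).number_of_edges())
                              for n in g], dtype=float)
            return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])
        s1, s2 = signature(g1), signature(g2)
        denom = np.linalg.norm(s1) * np.linalg.norm(s2)
        return float(s1 @ s2 / denom) if denom else 0.0

    def j_star(n1, n2, are_related):
        """Enhanced Jaccard J*: a lemma of n1 matches a lemma of n2 when they
        are equal or lexicographically related (are_related is a placeholder
        oracle, e.g., backed by ConceptNet)."""
        matches = sum(1 for a in n1
                      if any(a == b or are_related(a, b) for b in n2))
        union = n1 | n2
        return matches / len(union) if union else 0.0

    def project(net, x):
        """Projection keeping all nodes and only the arcs of type x ('c' or 'r')."""
        sub = nx.Graph()
        sub.add_nodes_from(net.nodes)
        sub.add_edges_from((u, v) for u, v, d in net.edges(data=True)
                           if d["kind"] == x)
        return sub

    def arc_share(net, x):
        """omega_i^x: the share of the arcs of network i having type x."""
        total = net.number_of_edges()
        a_x = sum(1 for _u, _v, d in net.edges(data=True) if d["kind"] == x)
        return a_x / total if total else 0.5

    def sigma_12(net1, net2, are_related, beta=0.5):
        sims, omegas = {}, {}
        for x in ("c", "r"):
            p1, p2 = project(net1, x), project(net2, x)
            # sigma_12^x = beta^x * nu + (1 - beta^x) * J*
            sims[x] = (beta * nu(p1, p2)
                       + (1 - beta) * j_star(set(p1), set(p2), are_related))
            # omega_12^x = (omega_1^x + omega_2^x) / 2
            omegas[x] = (arc_share(net1, x) + arc_share(net2, x)) / 2
        alpha = omegas["c"] / (omegas["c"] + omegas["r"])
        return alpha * sims["c"] + (1 - alpha) * sims["r"]

For example, if co-occurrence arcs account for 70% of the arcs in both networks, then 𝛼 = 0.7 and the co-occurrence perspective dominates the overall score, as the formulas above intend.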


5. Conclusion
In this paper, we have proposed a model and a related approach to represent and handle content
semantics in a social platform. Our model is network-based and is capable of representing
content semantics from different perspectives. It is also extensible in that new perspectives can
be easily added when desired. Our approach first detects the text patterns of interest,
based not only on their frequency but also on their utility. Then, it uses these patterns and the
proposed model to represent each set of comments by means of a CS-Net. Finally, it adopts
a suitable technique to measure the semantic similarity of each pair of comment sets. The
latter information can be useful in a variety of applications, ranging from the construction of
recommender systems to the building of new topic forums [6].
   In the future, we plan to extend this research in various directions. First, we could use our
approach as the core of a system for the automatic identification of offensive content of a certain
type (cyberbullying, racism, etc.) in a set of comments. In addition, we could study the evolution
of CS-Nets over time. This could allow us to identify new trends and topics that characterize a
social platform. Finally, we plan to use our approach in a sentiment analysis context. Indeed, in
the past literature, there are several studies on how people with anxiety and/or psychological
disorders write their comments on social media. We could contribute to this research effort
by considering sets of comments written by users with these characteristics, constructing the
corresponding CS-Nets and analyzing them in detail. We could also compare these CS-Nets
with “template CS-Nets”, typical of a certain emotional state, to support classification activities.
References
 [1] X. Chen, Y. Yuan, M. Orgun, Using Bayesian networks with hidden variables for identifying
     trustworthy users in social networks, Journal of Information Science 46 (2020) 600–615.
     SAGE Publications Sage UK: London, England.
 [2] P. Boczkowski, M. Matassi, E. Mitchelstein, How young users deal with multiple platforms:
     The role of meaning-making in social media repertoires, Journal of Computer-Mediated
     Communication 23 (2018) 245–259. Oxford University Press.
 [3] F. Cauteruccio, E. Corradini, G. Terracina, D. Ursino, L. Virgili, Investigating Reddit to
     detect subreddit and author stereotypes and to evaluate author assortativity, Journal
     of Information Science (2021). doi:10.1177/01655515211047428. SAGE.
 [4] B. Abu-Salih, P. Wongthongtham, K. Chan, K. Yan, D. Zhu, CredSaT: Credibility ranking
     of users in big social data incorporating semantic analysis and temporal factor, Journal of
     Information Science 45 (2019) 259–280. SAGE Publications Sage UK: London, England.
 [5] S. Ahmadian, M. Afsharchi, M. Meghdadi, An effective social recommendation method
     based on user reputation model and rating profile enhancement, Journal of Information
     Science 45 (2019) 607–642. SAGE Publications Sage UK: London, England.
 [6] G. Bonifazi, F. Cauteruccio, E. Corradini, M. Marchetti, G. Terracina, D. Ursino, L. Virgili,
     Representation, detection and usage of the content semantics of comments in a social
     platform, Journal of Information Science (Forthcoming). SAGE.
 [7] P. Fournier-Viger, J. C.-W. Lin, R. U. Kiran, Y. Koh, R. Thomas, A survey of sequential
     pattern mining, Data Science and Pattern Recognition 1 (2017) 54–77.
 [8] P. Fournier-Viger, J. Lin, B. Vo, T. Chi, J. Zhang, H. Le, A survey of itemset mining,
     WIREs Data Mining and Knowledge Discovery 7 (2017) e1207. doi:10.1002/widm.1207.
     Wiley.
 [9] L. Gadár, J. Abonyi, Frequent pattern mining in multidimensional organizational networks,
     Scientific Reports 9 (2019) 1–12. Nature Publishing Group.
[10] K. Pearson, Note on Regression and Inheritance in the Case of Two Parents, Proceedings
     of the Royal Society of London 58 (1895) 240–242. The Royal Society.
[11] Y. Djenouri, A. Belhadi, P. Fournier-Viger, J. Lin, Fast and effective cluster-based
     information retrieval using frequent closed itemsets, Information Sciences 453 (2018)
     154–167. Elsevier.
[12] Z. Bouraoui, J. Camacho-Collados, S. Schockaert, Inducing relational knowledge from
     BERT, in: Proc. of the AAAI Conference on Artificial Intelligence (AAAI 2020),
     volume 34(05), New York, NY, USA, 2020, pp. 7456–7463. Association for the Advancement
     of Artificial Intelligence.
[13] M. Berlingerio, D. Koutra, T. Eliassi-Rad, C. Faloutsos, NetSimile: A scalable approach to
     size-independent network similarity, arXiv preprint arXiv:1209.2684 (2012).
[14] H. Liu, P. Singh, ConceptNet — a practical commonsense reasoning tool-kit, BT Technology
     Journal 22 (2004) 211–226. Springer.
[15] P. De Meo, G. Quattrone, G. Terracina, D. Ursino, Integration of XML Schemas at various
     “severity” levels, Information Systems 31(6) (2006) 397–434.