Vol-3194/paper21


Paper
id  Vol-3194/paper21
wikidataid  Q117345056
title  MAT-Builder: a System to Build Semantically Enriched Trajectories
pdfUrl  https://ceur-ws.org/Vol-3194/paper21.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/PuglieseLRP22
volume  Vol-3194


MAT-Builder: a System to Build Semantically
Enriched Trajectories
Chiara Pugliese¹,², Francesco Lettich¹, Chiara Renso¹ and Fabio Pinelli³
¹ ISTI-CNR, Pisa, Italy
² Department of Computer Science, University of Pisa
³ IMT School for Advanced Studies, Lucca, Italy


Abstract
The notion of multiple aspect trajectory (MAT) has been recently introduced in the literature to represent
movement data that is heavily semantically enriched with dimensions (aspects) representing various
types of semantic information (e.g., stops, moves, weather, traffic, events, and points of interest). Aspects
may be large in number, heterogeneous, or structurally complex. Although there is a growing volume of
literature addressing the modelling and analysis of multiple aspect trajectories, the community suffers
from a general lack of publicly available datasets. This is due to privacy concerns that make it difficult to
publish this type of data, and to the lack of tools that are capable of linking raw spatio-temporal data to
different types of semantic contextual data. In this work we aim to address the latter issue by presenting
MAT-Builder, a system that not only supports users during the whole semantic enrichment process, but
also allows the use of a variety of external data sources. Furthermore, MAT-Builder has been designed
with modularity and extensibility in mind, thus enabling practitioners to easily add new functionalities.
The running example provided towards the end of the paper highlights how MAT-Builder’s main
features allow users to easily generate multiple aspect trajectories, hence benefiting the mobility data
analysis community.

Keywords
Spatio-temporal data mining, Semantic trajectories




1. Introduction and Motivations
The notion of multiple aspect trajectory (MAT) has been recently introduced in the literature [1]
to represent movement data (i.e., a moving object trajectory) that is heavily semantically enriched.
These trajectories can be seen as positioning data augmented with different dimensions, or
aspects, representing various types of semantic information that are relevant or contextual to the
data they are associated with. Examples of aspects are stops, i.e., parts of a trajectory
where the associated object stops its movement for some reason (e.g., an individual visiting a
point of interest), moves, i.e., parts of a trajectory where the object changes position, weather,
POIs, transportation means, activities performed, social media posts, and so on. The aspects
associated with a trajectory may be large in number, heterogeneous, and structurally complex.
   One might wonder what benefits such heavily enriched trajectories can provide. First, enriched
trajectories may allow a better understanding of object movements; for instance, animal behaviour
analysis that uses enriched trajectories may reveal how environmental factors impact animal
movements. Second, the availability of contextual features can help in the prediction or
recommendation of itineraries; for instance, a tourist recommender system may take advantage of
contextual information to better tailor personalized suggestions.

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
chiara.pugliese@isti.cnr.it (C. Pugliese); francesco.lettich@isti.cnr.it (F. Lettich); chiara.renso@isti.cnr.it
(C. Renso); fabio.pinelli@imtlucca.it (F. Pinelli)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   While there are already contributions to the modelling and analysis of semantic trajectories
and multiple aspect trajectories [2], the number of datasets available to the community is still
small. This is mainly caused by privacy concerns and regulations that make the publication of
multiple aspect trajectory datasets difficult. Moreover, even when the data to be enriched is not
strictly privacy-sensitive, we observe a general lack of tools that can support users during the
complex process of creating these datasets. Generally speaking, this process requires identifying
(1) the parts of the trajectory to be enriched, (2) the various types and sources of semantic data to
be used for the enrichment, and (3) the most suitable approaches to properly associate
spatio-temporal data with semantic information. As for existing solutions, there are several
libraries (e.g., GeoPandas¹, [3, 4]), dashboards (e.g., [5, 6]), and ontology-based approaches (e.g.,
[7]) that are able to process and extract insights from trajectory data. Unfortunately, these
solutions either do not perform semantic enrichment or, when they do, are limited to a
fixed number of aspects, are not extensible, or do not support the use of external data
sources.
   In this work we aim to address the above challenges and issues by introducing MAT-Builder,
an interactive system that supports practitioners during the whole trajectory semantic enrichment
process and that enables users to generate datasets of multiple aspect trajectories.
   The main novel contributions of this work are as follows: (1) we provide a system that supports
the user in the complex process of building multiple aspect trajectories from heterogeneous
data sources and with different semantic aspects, (2) this system is able to extract, combine,
and build enriching information from a variety of external data sources (e.g., OpenStreetMap2 ),
and (3) it is designed with modularity and extensibility in mind, thus allowing developers to
add new aspects, external data sources, and functionalities. It is important to point out that we
designed MAT-Builder to reuse functionalities provided by existing mobility data management
libraries, including some of those already mentioned above, i.e., GeoPandas,
scikit-mobility [3], and PTrail [4].
   The rest of the paper is organized as follows. In Section 2 we present some fundamental
notions that are used in the subsequent sections. Section 3 presents the modular
architecture of MAT-Builder. Section 4 presents a running example highlighting the main
features and benefits that our system brings to the mobility data analysis community. Finally,
Section 5 draws some final conclusions. The full version of this paper has been published as a
demo paper in [8].




    ¹ https://geopandas.org/
    ² https://www.openstreetmap.org/


2. Preliminaries
In this section we introduce some fundamental notions that will be used throughout the rest
of the paper. We define a raw trajectory to be a sequence of time-stamped spatial coordinates
representing the movement of an object. We define an enriched trajectory to be a raw trajectory
enriched with semantic information. A multiple aspect trajectory is an enriched trajectory
with multiple types, or aspects, of semantic information (e.g., weather, transportation means,
POIs) [1]. From here on we use the terms enriched trajectory and multiple aspect trajectory
interchangeably. A segmented trajectory is a raw trajectory partitioned into sub-trajectories,
or segments, according to some criterion. A common example of a segmented trajectory is the one
obtained with the stop (i.e., the object is not moving) and move (the converse) segmentation
[9]. The segment enrichment task associates one or more semantic aspects with each segment. For
instance, a stop segment can be enriched with the information of a nearby geographical object
(e.g., the point of interest aspect). Stops that fall within the same area more than a given number
of times 𝜏 are considered to form a systematic stop. Common examples are a person staying at home or
at their workplace. Occasional stops are then defined as stops that are not systematic.
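
The notions above can be pictured as a small data model. The following is only an illustrative sketch: the class and field names (Sample, Segment, MultipleAspectTrajectory, aspects) are assumptions made for this example and are not taken from MAT-Builder's code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class Sample:
    """One time-stamped position of a raw trajectory (hypothetical name)."""
    t: datetime
    lat: float
    lon: float


@dataclass
class Segment:
    """A sub-trajectory together with the aspects attached to it."""
    kind: str                      # "stop" or "move"
    samples: list[Sample]
    aspects: dict[str, Any] = field(default_factory=dict)


@dataclass
class MultipleAspectTrajectory:
    """A segmented trajectory whose segments carry semantic aspects."""
    object_id: str
    segments: list[Segment]

    def aspect_names(self) -> set[str]:
        """Names of all aspects used by at least one segment."""
        return {name for seg in self.segments for name in seg.aspects}


# Segment enrichment: attach a point-of-interest aspect to a stop segment.
stop = Segment("stop", [Sample(datetime(2022, 6, 19, 9, 0), 43.72, 10.40)])
stop.aspects["poi"] = "Cafe"
mat = MultipleAspectTrajectory("user-1", [stop])
```

In this picture a raw trajectory is simply a list of Sample objects, and enrichment amounts to filling the aspects dictionary of each segment.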


3. The MAT-Builder system
The MAT-Builder system is written in Python and is made up of a user interface (UI) and a
modular backend³, the latter being the core component of the system. The UI exposes MAT-Builder’s
functionalities to the users, and translates the users’ needs into queries that are then processed
by the backend. We postpone the illustration of the UI to the running example (Section 4). The
backend represents MAT-Builder’s query processing
engine. Following the process described in [10], we designed the backend to include three main
modules: trajectory pre-processing, trajectory segmentation, and segment enrichment. Each
module provides a subset of the system functionalities. Figure 1 gives an overview of the
modules (and the underlying information flow) that the system currently provides.
   It is important to highlight that the backend is designed to be extensible with new modules, thus
allowing developers to provide additional functionalities (e.g., additional aspects or management
of further external sources) to the system. Indeed, every module in the system must extend a
common interface that specifies the methods a module has to implement in order to be
integrated and used within the system. More precisely, the interface requires every module to
implement the following methods: (1) one that registers the module’s input parameters in the UI,
(2) one that executes the module’s operations according to the input parameters, and (3) one that
specifies how the results should be presented via the UI. Note that any new module may also be built on top of
preexisting modules via subclassing. Finally, our system provides data workflow management
capabilities that let the user specify which of the available modules shall be used, and
the order in which they shall be executed.
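
One way to picture such a common interface is an abstract base class with three methods plus a small workflow runner. This is a sketch under assumptions: the names Module, parameters, run, present, and run_workflow are hypothetical, and the UI is reduced to plain dictionaries and strings.

```python
from abc import ABC, abstractmethod
from typing import Any


class Module(ABC):
    """Hypothetical common interface; all method names are assumptions."""

    @abstractmethod
    def parameters(self) -> dict[str, Any]:
        """(1) Declare the input parameters (with defaults) to expose in the UI."""

    @abstractmethod
    def run(self, data: Any, params: dict[str, Any]) -> Any:
        """(2) Execute the module on `data` according to `params`."""

    @abstractmethod
    def present(self, result: Any) -> str:
        """(3) Describe how the result should be shown in the UI."""


class MinPointsFilter(Module):
    """Toy module: keep trajectories with at least `min_points` samples."""

    def parameters(self):
        return {"min_points": 10}

    def run(self, data, params):
        return [traj for traj in data if len(traj) >= params["min_points"]]

    def present(self, result):
        return f"{len(result)} trajectories kept"


class StrictMinPointsFilter(MinPointsFilter):
    """A new module built on top of an existing one via subclassing."""

    def parameters(self):
        return {"min_points": 50}


def run_workflow(modules, data, overrides=None):
    """Run the chosen modules in the given order, chaining their outputs."""
    overrides = overrides or {}
    reports = []
    for module in modules:
        user_values = overrides.get(type(module).__name__, {})
        params = {**module.parameters(), **user_values}
        data = module.run(data, params)
        reports.append(module.present(data))
    return data, reports
```

The workflow runner takes the list of modules in the order chosen by the user; user-supplied values override the defaults that each module declares.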
   In the following we discuss the general goals of each backend module, and provide some
details concerning their current implementation. Furthermore, we explain how the modules are
connected to each other through intermediate saved results.
   ³ Upon acceptance of the paper we will provide a GitHub link to our software and the datasets we used.

[Figure 1 (diagram): raw trajectories pass through the three backend modules (Trajectories
Preprocessing, Trajectories Segmentation, Segments Enrichment) and come out as multiple-aspect
trajectories. Segments enrichment draws on enrichment data sources and comprises occasional stop
enrichment (nearest POIs, temporal criteria, rank), systematic stop enrichment (Home/Work/Other),
and moves enrichment (adding speed, acceleration, etc.; transport mode detection).]

Figure 1: MAT-Builder backend.


  The pre-processing module (blue block in Figure 1) takes as input a set of raw trajectories
and filters out noisy or unusable data to facilitate the activities of the other modules. One of
the functions of the pre-processing module is to discard trajectories that have an insufficient
sampling rate. The module also filters out outliers by analysing their spatio-temporal
characteristics. Another functionality is trajectory compression, adopted from the scikit-mobility
library [3]. Note that trajectory compression can be critical, since it may drastically
reduce the computation time of the other modules by feeding them trajectories with fewer samples.
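
To make the two filters concrete, here is a minimal standard-library sketch. It assumes that a sample is a (unix_seconds, lat, lon) tuple, that an outlier is a sample implying an excessive speed with respect to the last kept sample, and that the minimum-points check is applied after outlier removal; the function names and these choices are assumptions, not MAT-Builder's actual implementation.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0088  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def drop_speed_outliers(samples, max_kmh=300.0):
    """Drop samples that imply a speed above `max_kmh` from the last kept sample.

    `samples` is a time-sorted list of (unix_seconds, lat, lon) tuples.
    """
    kept = samples[:1]
    for t, lat, lon in samples[1:]:
        t0, lat0, lon0 = kept[-1]
        hours = (t - t0) / 3600.0
        if hours <= 0:
            continue  # duplicate or out-of-order timestamp
        if haversine_km(lat0, lon0, lat, lon) / hours <= max_kmh:
            kept.append((t, lat, lon))
    return kept


def preprocess(trajectories, min_points=10, max_kmh=300.0):
    """Remove outliers, then discard trajectories with too few samples."""
    cleaned = (drop_speed_outliers(traj, max_kmh) for traj in trajectories)
    return [traj for traj in cleaned if len(traj) >= min_points]
```

The default values mirror the ones shown in the pre-processing tab of the running example (10 points, 300 km/h).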
   The segmentation module (green block in Figure 1) takes as input a set of pre-processed
trajectories and partitions every trajectory into sub-trajectories (or segments). A well-known
and widely used segmentation criterion is stop and move [9]. Accordingly, this module
outputs a set of segmented trajectories, which can then be further processed by other modules.
In the present version, the module makes use of the stop-move detection algorithm provided by
the scikit-mobility library [3].
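
The idea behind stop and move segmentation can be sketched with a simple rule: a stop is a maximal run of consecutive samples that stay within a given radius of the run's first sample for at least a minimum duration, and everything else is a move. The code below is a simplified stand-in written for illustration; it is not the scikit-mobility algorithm, and the function names and the (unix_seconds, lat, lon) sample format are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0088
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def segment_stop_move(samples, min_minutes=10.0, radius_km=0.2):
    """Partition time-sorted (unix_seconds, lat, lon) samples into segments.

    Returns a list of ("stop" | "move", samples) pairs in temporal order.
    """
    segments = []
    n = len(samples)
    i = move_start = 0
    while i < n:
        t0, lat0, lon0 = samples[i]
        j = i + 1
        # Extend the candidate stop while samples stay close to its first sample.
        while j < n and haversine_km(lat0, lon0, samples[j][1], samples[j][2]) <= radius_km:
            j += 1
        if samples[j - 1][0] - t0 >= min_minutes * 60:
            if move_start < i:
                segments.append(("move", samples[move_start:i]))
            segments.append(("stop", samples[i:j]))
            i = move_start = j
        else:
            i += 1
    if move_start < n:
        segments.append(("move", samples[move_start:]))
    return segments
```

The two parameters correspond to the minimum duration and the spatial radius that the user can set in the segmentation tab of the running example.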
   The segment enrichment module (yellow block in Figure 1) takes the output of the previous
module and identifies the different segments to enrich, the aspects to consider, the datasets to
be used to enrich the segments with different aspects, and the enrichment criteria. In
MAT-Builder’s current implementation this module has been divided into two sub-modules, one
dealing with the enrichment of stops and the other with the enrichment of moves.
   The stop enrichment sub-module enriches the stop segments with the regularity aspect,
i.e., it distinguishes between systematic and occasional stops. The geographical area of interest
is discretized into a spatial grid by means of a Geohash function⁴, which is then used to assign
each stop segment to a cell. Stops that fall within the same cell more than a given number of
times 𝜏 are considered to belong to the same systematic stop. Conversely, stops that do not
satisfy this criterion fall within the occasional category.
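
The grid-based rule can be sketched as follows. The block writes out the standard Geohash encoding so that it is self-contained (MAT-Builder relies on a Geohash library instead); the function names, the default precision, and the representation of a stop as a (lat, lon) pair are assumptions made for this example.

```python
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=7):
    """Standard Geohash: interleave longitude/latitude bisection bits, 5 per char."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, value, bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = value * 2 + int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def label_regularity(stops, tau=2, precision=7):
    """Label each (lat, lon) stop as systematic or occasional.

    A stop is systematic when its grid cell contains more than `tau` stops.
    """
    cells = [geohash_encode(lat, lon, precision) for lat, lon in stops]
    counts = Counter(cells)
    return [(cell, "systematic" if counts[cell] > tau else "occasional")
            for cell in cells]
```

A longer Geohash (higher precision) yields smaller cells, so the precision controls how close stops must be to count towards the same systematic stop.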
   Systematic stops are then further enriched with the activity aspect by inferring the activity
performed there. The module is currently tailored to people’s movements, and therefore classifies
systematic stops as home, work, or other. The home activity enriches a systematic stop that
occurs at night. Work, on the other hand, is an activity that enriches a systematic stop that
occurs on weekdays during working hours (i.e., a user-defined time range). Systematic stops
that do not satisfy any of the above criteria are enriched as other.
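
The home/work/other rule can be illustrated with a small sketch. The text does not specify how "at night" is defined or how multiple visits to the same systematic stop are combined, so the night window (22:00 to 06:00), the default working hours (09:00 to 18:00), the use of the visit start time, and the majority vote below are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime


def classify_visit(start, night=(22, 6), work=(9, 18)):
    """Label one visit by its start time (assumed rule, see the lead-in)."""
    hour = start.hour
    if hour >= night[0] or hour < night[1]:
        return "home"
    if start.weekday() < 5 and work[0] <= hour < work[1]:
        return "work"
    return "other"


def classify_systematic_stop(visit_starts, night=(22, 6), work=(9, 18)):
    """Label a systematic stop with the most frequent label of its visits."""
    if not visit_starts:
        return "other"
    counts = Counter(classify_visit(s, night, work) for s in visit_starts)
    return counts.most_common(1)[0][0]


office = [datetime(2022, 6, 20, 10, 0), datetime(2022, 6, 21, 9, 30),
          datetime(2022, 6, 19, 10, 0)]
label = classify_systematic_stop(office)
```

Here two of the three visits start on weekdays within working hours, so the stop is labelled as work.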
   Occasional stops are harder to characterize than systematic stops, as they do not exhibit any
substantial spatio-temporal regularity. Thus, the module associates with each stop the points of
interest (POIs) that are located within a given range. Once the set of POIs has been identified,
they are ranked in order of preference to select the best one to enrich the segment. In the
module’s current implementation POIs are ranked according to two criteria, i.e., their distance
from the stop and the overlap between their opening hours and the stop’s starting and ending
times (the latter criterion filters out POIs that are closed when the stop occurs). Note that POIs
can be gathered either from OpenStreetMap or from other external data sources, assuming
that a minimum of information is provided (e.g., identifier, latitude, and longitude) and that it is
compliant with the required formatting criteria.
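
A minimal sketch of this ranking is given below. It assumes that stops and POIs are plain dictionaries, that opening hours are given as an (open_hour, close_hour) pair within a single day, and that POIs without opening hours are kept; the field names and these simplifications are assumptions for illustration only.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371008.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def hours_overlap(start_h, end_h, open_h, close_h):
    """Length (in hours) of the overlap between a stop and the opening hours."""
    return max(0.0, min(end_h, close_h) - max(start_h, open_h))


def rank_pois(stop, pois, max_dist_m=15.0):
    """Return the ids of the candidate POIs for a stop, nearest first.

    POIs farther than `max_dist_m`, or closed for the whole stop, are dropped.
    """
    candidates = []
    for poi in pois:
        d = distance_m(stop["lat"], stop["lon"], poi["lat"], poi["lon"])
        if d > max_dist_m:
            continue
        hours = poi.get("hours")  # hypothetical field: (open_hour, close_hour)
        if hours is not None and hours_overlap(stop["start_h"], stop["end_h"], *hours) <= 0:
            continue
        candidates.append((d, poi["id"]))
    return [poi_id for _, poi_id in sorted(candidates)]
```

The first element of the returned list is the POI that would be used to enrich the stop under these assumptions.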
   The move enrichment sub-module focuses on enriching the move segments. In the present
version, the module enriches moves with two aspects: (1) quantitative numerical measures and (2)
transportation means. The first aspect associates with each move quantitative numerical information
extracted from the underlying sub-trajectory, i.e., maximum and average speed, acceleration,
bearing rate, and total length. This information is computed via the PTrail library [4]. The
second aspect is obtained by inferring the transportation means used during each move. To this
end, the module leverages a random-forest classifier (using the scikit-learn library [11]) trained
on the GeoLife dataset [12] to recognize the following transportation means: walk, car, bike,
bus, subway, and train.
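
Some of the quantitative measures can be derived directly from the samples of a move. The sketch below computes total length, average speed, and maximum speed with the standard library only; it does not use PTrail, it leaves out acceleration, bearing rate, and the transportation means classifier, and the function name and sample format (unix_seconds, lat, lon) are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0088
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def move_features(samples):
    """Length and speed statistics of a move.

    `samples` is a time-sorted list of (unix_seconds, lat, lon) tuples
    with at least two elements.
    """
    length_km = 0.0
    max_kmh = 0.0
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(samples, samples[1:]):
        step_km = haversine_km(lat0, lon0, lat1, lon1)
        length_km += step_km
        if t1 > t0:
            max_kmh = max(max_kmh, step_km / ((t1 - t0) / 3600.0))
    hours = (samples[-1][0] - samples[0][0]) / 3600.0
    avg_kmh = length_km / hours if hours > 0 else 0.0
    return {"length_km": length_km, "avg_kmh": avg_kmh, "max_kmh": max_kmh}
```

Features of this kind are typical inputs for a transportation means classifier, although the exact feature set used by MAT-Builder's classifier is not stated here.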


4. Running example
The MAT-Builder system targets users who want to semantically enrich movement data,
possibly with many different aspects, and who would like to have control over how the various
enrichment tasks are conducted. Indeed, thanks to MAT-Builder’s highly flexible modular
design, users can decide which and how many aspects they want to consider to enrich their
trajectories. Furthermore, the operations conducted by MAT-Builder’s various modules are
highly customizable, as a large set of modifiable parameters enables users to precisely control
and fine-tune the whole enrichment process. To show our system’s capabilities we introduce a
running example of the use of the system through the user interface.
   Users interact with the system through MAT-Builder’s UI at the
different steps of the enrichment process. The UI presents three tabs corresponding to the three
backend modules. In the pre-processing tab (Figure 2) the user selects and uploads
the raw trajectory dataset they intend to enrich.

   ⁴ https://github.com/vinsci/geohash

[Figure 2 (screenshot): the pre-processing tab of the MAT-builder UI. Left panel: dataset upload
and the pre-processing parameters “Min. points per trajectory” (10) and “Max speed from the
previous point threshold (km/h)” (300.0). Right panel: dataset statistics (181 users, 301
trajectories, a bar chart of the number of trajectories per year from 2007 to 2010) and per-user
statistics (user 1: 5 trajectories, average duration 61 min, average length 719 m).]

Figure 2: MAT-Builder UI: pre-processing step

   Here the pre-processing tab lets the user customize some of the pre-processing operations,
i.e., they will be able to indicate the minimum number of points a trajectory should have and a
speed threshold (in km/h) between two consecutive points that the module uses to filter out
outliers. Once the raw trajectories have been pre-processed, the UI will show the results of this step.
   The user will then proceed to the trajectory segmentation tab (Figure 3). Here the interface
lets the user specify the minimum duration and the spatial radius the system will use to identify
the stop segments (and, indirectly, the move ones). Once the trajectories have been segmented,
the UI will again show a few statistics about this step.

[Figure 3 (screenshot): the segmentation tab of the MAT-builder UI. Left panel: the segmentation
parameters “Min. duration of a stop (minutes)” (10) and “Max. spatial search radius (km)” (300.0).
Right panel: per-user statistics (user 1: 5 trajectories, 4 stops, average stop duration 23 min).]

Figure 3: MAT-Builder UI: segmentation step

[Figure 4 (screenshot): the segment enrichment tab of the MAT-builder UI. Left panel: POI download
from OSM via a bounding box (North, South, East, West), semantic “granularity” (from 0 to 100;
80.0), file selection for other sources (e.g., WEATHER), stops enrichment with “Maximum distance
from POIs (meters)” (15.0), and moves enrichment with the option to predict the transport mode.
Right panel: per-user results (user 1: 13 occasional stops, 5 systematic stops) and a
per-trajectory view (trajectory 166) with the shares of POI categories and activities associated
with its stops.]

Figure 4: MAT-Builder UI: segment enrichment step

   The user will finally proceed to the segment enrichment tab (Figure 4), where they will be able
to enrich the stop and move segments with different aspects. The first aspect added to the stop
segments concerns their regularity, i.e., whether they belong to some systematic stop or
are occasional. Next, the user will be able to select the data sources to be used to
enrich the occasional stops. Here, the user can provide a file for each aspect of interest; the
system currently supports POIs, weather, and social media posts. Moreover, MAT-Builder is able
to gather POIs from OpenStreetMap in case a POI file cannot be provided: in this
case, the user must first specify a bounding box of geographical coordinates from which the
POIs should be retrieved. POIs might have an extremely large number of attributes, and many
of these may have lots of missing values. To deal with this issue the UI lets the user specify a
threshold value used to discard attributes with too many missing values. With the POIs in place, the module
decides which POIs should be used to enrich the occasional stops by ranking them according to
the criteria described in Section 3 (i.e., distance and temporal overlap). Finally, the user will
be able to choose whether to enrich the moves with the transportation means. This aspect is
estimated with a random-forest classifier as described in Section 3.
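
The attribute filter mentioned above can be sketched as follows. How the user-supplied value is interpreted is not detailed in the text, so this example assumes it is the maximum percentage of missing values an attribute may have; the function name and the representation of POIs as dictionaries are also assumptions.

```python
def drop_sparse_attributes(pois, max_missing_pct=80.0):
    """Remove attributes that are missing in more than `max_missing_pct` % of POIs.

    `pois` is a list of dictionaries; an attribute counts as missing for a POI
    when the key is absent or its value is None.
    """
    if not pois:
        return []
    attributes = {key for poi in pois for key in poi}
    total = len(pois)
    keep = set()
    for key in attributes:
        missing = sum(poi.get(key) is None for poi in pois)
        if 100.0 * missing / total <= max_missing_pct:
            keep.add(key)
    return [{k: v for k, v in poi.items() if k in keep} for poi in pois]
```

With a stricter threshold fewer attributes survive, which keeps the POI aspect compact at the cost of discarding rarely filled fields.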


5. Conclusions and future work
In this paper we proposed MAT-Builder, a new system that supports users in creating multiple
aspect trajectory datasets starting from raw trajectories and external data sources. The
semantic enrichment process offered by MAT-Builder includes trajectory pre-processing, trajectory
segmentation, and segment enrichment. The backend, which is the core component
of MAT-Builder, implements this process and is currently instantiated with stop and move
segmentation, enrichment with systematic and occasional stops, activity inference, and
transportation means estimation. A distinctive characteristic of MAT-Builder is that it is designed to
be modular and easily extensible which, in conjunction with MAT-Builder’s data workflow
management capabilities, allows users to easily set up their own semantic enrichment process.


Acknowledgments
This work has been partially supported by the European Union’s Horizon 2020 Research and
Innovation program under the project MobiDataLab (GA 101006879) and MSCA project MASTER
(GA 777695).


References
 [1] R. Mello, V. Bogorny, L. O. Alvares, L. H. Z. Santana, C. A. Ferrero, A. A. Frozza, G. A.
     Schreiner, C. Renso, MASTER: A multiple aspect view on trajectories, Trans. GIS 23 (2019)
     805–822. URL: https://doi.org/10.1111/tgis.12526. doi:10.1111/tgis.12526.
 [2] C. Renso, V. Bogorny, K. Tserpes, S. Matwin, J. A. F. de Macêdo, Multiple-aspect analysis
     of semantic trajectories (MASTER), Int. J. Geogr. Inf. Sci. 35 (2021) 763–766. URL:
     https://doi.org/10.1080/13658816.2020.1870982. doi:10.1080/13658816.2020.1870982.
 [3] L. Pappalardo, F. Simini, G. Barlacchi, R. Pellungrini, scikit-mobility: A python library
     for the analysis, generation and risk assessment of mobility data, arXiv preprint
     arXiv:1907.07062 (2019).
 [4] S. Haidri, Y. J. Haranwala, V. Bogorny, C. Renso, V. P. da Fonseca, A. Soares, Ptrail –
     a python package for parallel trajectory data preprocessing, CoRR (2021). URL:
     https://arxiv.org/abs/2108.13202. arXiv:2108.13202.
 [5] M. Berlingerio, F. Calabrese, G. Di Lorenzo, R. Nair, F. Pinelli, M. L. Sbodio, Allaboard: A
     system for exploring urban mobility and optimizing public transport using cellphone data,
     in: Machine Learning and Knowledge Discovery in Databases, 2013, pp. 663–666.
 [6] F. Calabrese, E. Cobelli, V. Ferraiuolo, G. Misseri, F. Pinelli, D. Rodriguez, Using vodafone
     mobile phone network data to provide insights into citizens mobility in italy during the
     coronavirus outbreak, Data & Policy 3 (2021) e22.
 [7] T. P. Nogueira, R. B. Braga, C. T. de Oliveira, H. Martin, Framestep: A framework for
     annotating semantic trajectories based on episodes, Expert Systems with Applications 92
     (2018) 533–545.
 [8] C. Pugliese, F. Lettich, C. Renso, F. Pinelli, Mat-builder: a system to build semantically
     enriched trajectories, in: The 23rd IEEE International Conference on Mobile Data Management,
     Cyprus, 2022.
 [9] S. Spaccapietra, C. Parent, M. L. Damiani, J. A. de Macedo, F. Porto, C. Vangenot, A
     conceptual view on trajectories, Data & knowledge engineering 65 (2008) 126–146.
[10] A. Ibrahim, H. Zhang, S. Clinch, S. Harper, From GPS to semantic data: how and why - a
     framework for enriching smartphone trajectories, Computing 103 (2021) 2763–2787. URL:
     https://doi.org/10.1007/s00607-021-00993-z. doi:10.1007/s00607-021-00993-z.
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
     P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,
     M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine
     Learning Research 12 (2011) 2825–2830.
[12] Y. Zheng, X. Xie, W. Ma, et al., Geolife: A collaborative social networking service among
     user, location and trajectory, IEEE Data Eng. Bull. 33 (2010) 32–39.