Vol-3194/paper21
Paper
Paper | |
---|---|
description | |
id | Vol-3194/paper21 |
wikidataid | →Q117345056 |
title | MAT-Builder: a System to Build Semantically Enriched Trajectories |
pdfUrl | https://ceur-ws.org/Vol-3194/paper21.pdf |
dblpUrl | https://dblp.org/rec/conf/sebd/PuglieseLRP22 |
volume | Vol-3194→Vol-3194 |
session | → |
MAT-Builder: a System to Build Semantically Enriched Trajectories
Chiara Pugliese¹,², Francesco Lettich¹, Chiara Renso¹ and Fabio Pinelli³
¹ ISTI-CNR, Pisa, Italy
² Department of Computer Science, University of Pisa
³ IMT School for Advanced Studies, Lucca, Italy

Abstract
The notion of multiple aspect trajectory (MAT) has been recently introduced in the literature to represent movement data that is heavily semantically enriched with dimensions (aspects) representing various types of semantic information (e.g., stops, moves, weather, traffic, events, and points of interest). Aspects may be large in number, heterogeneous, or structurally complex. Although there is a growing volume of literature addressing the modelling and analysis of multiple aspect trajectories, the community suffers from a general lack of publicly available datasets. This is due to privacy concerns that make it difficult to publish this type of data, and to the lack of tools that are capable of linking raw spatio-temporal data to different types of semantic contextual data. In this work we aim to address this last issue by presenting MAT-Builder, a system that not only supports users during the whole semantic enrichment process, but also allows the use of a variety of external data sources. Furthermore, MAT-Builder has been designed with modularity and extensibility in mind, thus enabling practitioners to easily add new functionalities. The running example provided towards the end of the paper highlights how MAT-Builder's main features allow users to easily generate multiple aspect trajectories, hence benefiting the mobility data analysis community.

Keywords
Spatio-temporal data mining, Semantic trajectories

1. Introduction and Motivations
The notion of multiple aspect trajectory (MAT) has been recently introduced in the literature [1] to represent movement data (i.e., a moving object trajectory) that is heavily semantically enriched.
These trajectories can be seen as positioning data augmented with different dimensions, or aspects, representing various types of semantic information that are relevant or contextual to the data they are associated with. A few examples of aspects are stops, i.e., parts of a trajectory where the associated object stops its movement for some reason (e.g., an individual visiting a point of interest), moves, i.e., parts of a trajectory where the object changes position, weather, POIs, transportation means, activities performed, social media posts, and so on. The aspects associated with a trajectory may be large in number, heterogeneous, and structurally complex.

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
chiara.pugliese@isti.cnr.it (C. Pugliese); francesco.lettich@isti.cnr.it (F. Lettich); chiara.renso@isti.cnr.it (C. Renso); fabio.pinelli@imtlucca.it (F. Pinelli)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

One might wonder which benefits such extremely enriched trajectories can provide. First, we observe that enriched trajectories may allow a better understanding of object movements: for instance, animal behaviour analysis that uses enriched trajectories may help clarify how environmental factors impact animal movements. Secondly, the availability of contextual features can help in the prediction or recommendation of itineraries: for instance, a tourist recommender system may take advantage of contextual information to better tailor personalized suggestions. While there are already contributions to the modelling and analysis of semantic trajectories and multiple aspect trajectories [2], the number of datasets available to the community is still small.
This is mainly caused by privacy concerns and regulations that make the publication of multiple aspect trajectory datasets difficult. Moreover, even when the data to be enriched is not strictly privacy-sensitive, we observe a general lack of tools that can support users during the complex process of creating these datasets. Generally speaking, this process requires identifying (1) the parts of the trajectory to be enriched, (2) the various types and sources of semantic data to be used for the enrichment, and (3) the most suitable approaches to properly associate spatio-temporal data with semantic information.

As for existing solutions, there are several libraries (e.g., GeoPandas¹, [3, 4]), dashboards (e.g., [5, 6]), and ontology-based approaches (e.g., [7]) that are able to process and extract insights from trajectory data. Unfortunately, these solutions either do not perform semantic enrichment or, when they do, they are limited to a fixed number of aspects, are not extensible, or do not support the use of external data sources.

In this work we aim to address the above challenges and issues by introducing MAT-Builder, an interactive system that supports practitioners during the whole trajectory semantic enrichment process and that enables users to generate datasets of multiple aspect trajectories. The main novel contributions of this work are as follows: (1) we provide a system that supports the user in the complex process of building multiple aspect trajectories from heterogeneous data sources and with different semantic aspects, (2) this system is able to extract, combine, and build enriching information from a variety of external data sources (e.g., OpenStreetMap²), and (3) it is designed with modularity and extensibility in mind, thus allowing developers to add new aspects, external data sources, and functionalities.
It is important to point out that we designed MAT-Builder to reuse functionalities provided by existing mobility data management libraries, including some of those mentioned above, i.e., GeoPandas, scikit-mobility [3], and PTrail [4].

The rest of the paper is organized as follows. In Section 2 we present some fundamental notions which will then be used in the subsequent sections. Section 3 presents the modular architecture of MAT-Builder. Section 4 presents a running example highlighting the main features and benefits that our system brings to the mobility data analysis community. Finally, Section 5 draws some final conclusions. The full version of this paper has been published as a demo paper in [8].

¹ https://geopandas.org/
² https://www.openstreetmap.org/

2. Preliminaries
In this section we introduce some fundamental notions that will be used throughout the rest of the paper. We define a raw trajectory to be a sequence of time-stamped spatial coordinates representing the movement of an object. We define an enriched trajectory to be a raw trajectory enriched with semantic information. A multiple aspect trajectory is an enriched trajectory with multiple types, or aspects, of semantic information (e.g., weather, transportation means, POIs) [1]. From here on we will use the two terms above interchangeably to refer to multiple aspect trajectories. A segmented trajectory is a raw trajectory partitioned into sub-trajectories, or segments, according to some criteria. A common example of a segmented trajectory is the one obtained with the stop (i.e., the object is not moving) and move (the converse) segmentation [9]. The segment enrichment task associates each segment with one or more semantic aspects. For instance, a stop segment can be enriched with the information of a nearby geographical object (e.g., the point of interest aspect). Stops falling within the same area more than a given number of times 𝜏 are considered to form a systematic stop.
Common examples are a person staying at home or at their workplace. Occasional stops can then be defined as stops that are not systematic.

3. The MAT-Builder system
The MAT-Builder system is written in Python and is made up of a user interface (UI) and a modular backend, which is the core component of the system³. The UI exposes MAT-Builder's functionalities to the users, and translates the users' needs into queries that are then processed by the backend. We postpone the UI illustration to the running example (Section 4).

The backend is the core component of our system as it represents MAT-Builder's query processing engine. Following the process described in [10], we designed the backend to include three main modules: trajectory pre-processing, trajectory segmentation, and segment enrichment. Each module provides a subset of the system functionalities. Figure 1 reports an overview of the modules (and the underlying information flow) that the system currently provides.

It is important to highlight that the backend is designed to be extensible with new modules, thus allowing developers to provide additional functionalities (e.g., additional aspects or management of further external sources) to the system. Indeed, every module in the system must extend a common interface that specifies the methods that every module should implement in order to be integrated and used within the system. More precisely, the interface requires every module to implement the following methods: (1) one that sets its input parameters in the UI, (2) one that executes the module operations according to the input parameters, and (3) one that specifies how the results should be provided via the UI. Note that any new module may also be built on top of preexisting modules via subclassing. Finally, our system provides data workflow management capabilities that let the user specify which modules shall be used among those available, and the order in which these shall be executed.
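The module contract and the workflow ordering described above can be illustrated with a minimal sketch. The paper does not publish the actual interface, so every name here (`BackendModule`, `input_parameters`, `execute`, `render`, `run_workflow`, and the two toy filter modules) is hypothetical and only mirrors the three required methods, subclassing, and user-defined execution order.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BackendModule(ABC):
    """Hypothetical common interface; not MAT-Builder's actual API."""

    @abstractmethod
    def input_parameters(self) -> Dict[str, Any]:
        """(1) Declare the input parameters (with defaults) the UI should expose."""

    @abstractmethod
    def execute(self, data: Any, params: Dict[str, Any]) -> Any:
        """(2) Run the module's operations according to the input parameters."""

    @abstractmethod
    def render(self, results: Any) -> str:
        """(3) Describe how the results should be presented via the UI."""


class MinPointsFilter(BackendModule):
    """Toy pre-processing module: drops trajectories with too few samples."""

    def input_parameters(self) -> Dict[str, Any]:
        return {"min_points": 10}

    def execute(self, data, params):
        return [traj for traj in data if len(traj) >= params["min_points"]]

    def render(self, results) -> str:
        return f"{len(results)} trajectories kept"


class StrictMinPointsFilter(MinPointsFilter):
    """A new module built on top of a pre-existing one via subclassing."""

    def input_parameters(self) -> Dict[str, Any]:
        return {"min_points": 50}


def run_workflow(modules: List[BackendModule], data: Any,
                 overrides: Optional[Dict[str, Dict[str, Any]]] = None):
    """Run the selected modules in the order chosen by the user.

    `overrides` maps a module class name to user-supplied parameter values.
    """
    overrides = overrides or {}
    reports = []
    for module in modules:
        params = {**module.input_parameters(),
                  **overrides.get(type(module).__name__, {})}
        data = module.execute(data, params)
        reports.append(module.render(data))
    return data, reports
```

In this sketch the output of each module is passed to the next one, which is one simple way to realise the "intermediate results" between modules; the real system may persist them differently.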
In the following we discuss the general goals of each backend module, and provide some details concerning their current implementation. Furthermore, we explain how the modules are connected to each other through intermediate saved results.

³ Upon acceptance of the paper we will provide a GitHub link to our software and the datasets we used.

[Figure 1: MAT-Builder backend. Raw trajectories and enrichment data sources flow through the backend modules (Trajectories Preprocessing, Trajectories Segmentation, Segments Enrichment) to produce multiple-aspect trajectories. Segments Enrichment comprises occasional stop enrichment (nearest POIs, temporal criteria, POI ranking), systematic stop enrichment (Home/Work/Other), and moves enrichment (adding speed, acceleration, etc.; transport mode detection).]

The pre-processing module (blue block in Figure 1) takes as input a set of raw trajectories and filters out noisy or unusable data to facilitate the activities of the other modules. One of the functions of the pre-processing module is to discard trajectories that have an insufficient sampling rate. The module also filters out outliers by analysing their spatio-temporal characteristics. Another functionality is trajectory compression, adopted from the scikit-mobility library [3]. Note that trajectory compression can be critical, since it may drastically reduce the computation time of the other modules by feeding them trajectories with fewer samples.

The segmentation module (green block in Figure 1) takes as input a set of pre-processed trajectories and partitions every trajectory into sub-trajectories (or segments). A well known and widely used segmentation criterion is stop and move segmentation [9]. Accordingly, this module outputs a set of segmented trajectories which can then be further processed by other modules. In the present version, the module makes use of the stop-move detection algorithm provided by the scikit-mobility library [3].
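To make the stop and move segmentation concrete, the following is a minimal, self-contained sketch of a classic stay-point style detector driven by a minimum duration and a spatial radius. It is not scikit-mobility's implementation; the function names, the point layout `(latitude, longitude, unix_seconds)`, and the choice of anchoring the radius at the first sample of each candidate stop are assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt
from typing import List, Tuple

Point = Tuple[float, float, float]  # (latitude, longitude, unix time in seconds)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two WGS84 coordinates."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def detect_stops(points: List[Point], min_minutes: float,
                 radius_km: float) -> List[Tuple[int, int]]:
    """Return inclusive (start_index, end_index) pairs of the detected stops."""
    stops = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the window while samples stay within radius_km of anchor sample i.
        while j < n and haversine_km(points[i][0], points[i][1],
                                     points[j][0], points[j][1]) <= radius_km:
            j += 1
        duration_min = (points[j - 1][2] - points[i][2]) / 60.0
        if j - 1 > i and duration_min >= min_minutes:
            stops.append((i, j - 1))
            i = j          # continue after the stop
        else:
            i += 1         # no stop anchored here, try the next sample
    return stops


def moves_from_stops(n_points: int,
                     stops: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Moves are the maximal index ranges not covered by any stop."""
    moves, cursor = [], 0
    for start, end in stops:
        if start > cursor:
            moves.append((cursor, start - 1))
        cursor = end + 1
    if cursor < n_points:
        moves.append((cursor, n_points - 1))
    return moves
```

The move segments are obtained indirectly, as the complement of the stops, which matches the way the paper describes stops being identified first.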
The segment enrichment module (yellow block in Figure 1) takes the output of the previous module and identifies the different segments to enrich, the aspects to consider, the datasets to be used to enrich the segments with different aspects, and the enrichment criteria. In MAT-Builder's current implementation this module has been divided into two sub-modules, one dealing with the enrichment of stops and the other with the enrichment of moves.

The stop enrichment sub-module enriches the stop segments with the regularity aspect, i.e., it distinguishes between systematic and occasional stops. The geographical area of interest is discretized into a spatial grid by means of a Geohash function⁴, which is then used to assign each stop segment to a cell. Stops that fall within the same cell more than a given number of times 𝜏 are considered to belong to the same systematic stop. Conversely, stops that do not satisfy this criterion fall within the occasional category.

Systematic stops are then further enriched with the activity aspect by inferring the activity performed during them. The module is currently tailored to people's movements, and therefore classifies systematic stops into home, work, and other. The home activity enriches a systematic stop that occurs at night. Work, on the other hand, is an activity that enriches a systematic stop that occurs on weekdays during working hours (i.e., a user-defined time range). Systematic stops that do not satisfy any of the above criteria are enriched as other.

Occasional stops are harder to characterize than systematic stops, as they do not exhibit any substantial spatio-temporal regularity. Thus, the module associates each stop with the points of interest (POIs) that are located within a given range. Once the set of POIs has been identified, they are ranked in some order of preference to select the best one to enrich the segment.
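The systematic/occasional distinction and the home/work/other labelling described above can be sketched as follows. This is an illustrative reading of the text, not the system's code: the standard Geohash encoding is implemented inline to keep the example self-contained, while the stop layout (dicts with `lat` and `lon`), the function names, the night window (22:00 to 06:00), the default working hours (09:00 to 17:00), and the decision to label each stop from its start time are all assumptions.

```python
from collections import Counter
from datetime import datetime

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Standard Geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def split_stops(stops, tau: int, precision: int = 7):
    """Stops whose grid cell is visited more than tau times are systematic."""
    cells = [geohash_encode(s["lat"], s["lon"], precision) for s in stops]
    counts = Counter(cells)
    systematic, occasional = [], []
    for stop, cell in zip(stops, cells):
        target = systematic if counts[cell] > tau else occasional
        target.append({**stop, "cell": cell})
    return systematic, occasional


def label_activity(start: datetime, night=(22, 6), work=(9, 17)) -> str:
    """Label a systematic stop from its start time (assumed rule, see lead-in)."""
    hour = start.hour
    if hour >= night[0] or hour < night[1]:
        return "home"
    if start.weekday() < 5 and work[0] <= hour < work[1]:
        return "work"
    return "other"
```

The Geohash precision controls the cell size and therefore how close two stops must be to count as the same place; together with 𝜏 it is the natural tuning knob in such a scheme.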
In the module's current implementation POIs are ranked according to two criteria, i.e., their distance from the stop and the overlap between their opening hours and the stop's starting and ending times (the latter criterion filters out POIs that are closed when the stop occurs). Notice that POIs can be gathered either from OpenStreetMap or from other external data sources, assuming that minimum information is provided (e.g., identifier, latitude, and longitude) and that it is compliant with the required formatting criteria.

The move enrichment sub-module focuses on enriching the move segments. In the present version, the module enriches moves with two aspects: (1) quantitative numerical measures and (2) transportation means. The first aspect associates each move with quantitative numerical information extracted from the underlying sub-trajectory, i.e., maximum and average speed, acceleration, bearing rate, and total length. This information is computed via the PTrail library [4]. The second aspect is enriched by inferring the transportation means used during each move. To this end, the module leverages a random-forest classifier (using the scikit-learn library [11]) trained on the GeoLife dataset [12] to recognize the following transportation means: walk, car, bike, bus, subway, and train.

4. Running example
The MAT-Builder system targets users who want to semantically enrich movement data, possibly with many different aspects, and who would like to have control over how the various enrichment tasks are conducted. Indeed, thanks to MAT-Builder's highly flexible modular design, users can decide which and how many aspects they want to consider to enrich their trajectories. Furthermore, the operations conducted by MAT-Builder's various modules are highly customizable, as a large set of modifiable parameters enables users to precisely control and fine-tune the whole enrichment process.
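As an illustration of one such tunable step, the two POI ranking criteria from Section 3 (distance from the stop and overlap with opening hours) could be sketched as below. The data layout, the function names, and the tie-breaking rule (closest first, then longest overlap) are assumptions; the sketch also assumes that a stop and a POI's opening hours fall within a single day.

```python
from datetime import datetime, time
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(a))


def overlap_minutes(stop_start: datetime, stop_end: datetime,
                    opens: time, closes: time) -> float:
    """Minutes during which the stop interval and the opening hours overlap."""
    day = stop_start.date()
    open_dt = datetime.combine(day, opens)
    close_dt = datetime.combine(day, closes)
    latest_start = max(stop_start, open_dt)
    earliest_end = min(stop_end, close_dt)
    return max(0.0, (earliest_end - latest_start).total_seconds() / 60.0)


def rank_pois(stop: dict, pois: list, max_distance_m: float) -> list:
    """Return POI ids ordered by preference for enriching an occasional stop."""
    candidates = []
    for poi in pois:
        dist = haversine_m(stop["lat"], stop["lon"], poi["lat"], poi["lon"])
        if dist > max_distance_m:
            continue  # outside the search range
        overlap = overlap_minutes(stop["start"], stop["end"],
                                  poi["opens"], poi["closes"])
        if overlap <= 0:
            continue  # POI is closed for the whole stop
        candidates.append((dist, -overlap, poi["id"]))
    # Closest first; among equally distant POIs, the longer overlap wins.
    return [poi_id for _, _, poi_id in sorted(candidates)]
```

The first element of the returned list would be the POI chosen to enrich the stop; an empty list means the stop stays without a POI aspect.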
To show our system's capabilities we introduce a running example of the use of the system through the user interface. The users will be able to interact with the system through MAT-Builder's UI at the different steps of the enrichment process. The UI presents three tabs corresponding to the three backend modules.

⁴ https://github.com/vinsci/geohash

[Figure 2: MAT-Builder UI: pre-processing step. The tab contains a dataset upload field, the pre-processing parameters (minimum points per trajectory; maximum speed from the previous point, in km/h), and dataset statistics (total users and trajectories, number of trajectories per year, per-user summaries).]

In the pre-processing tab (Figure 2) the user will be able to select and input the raw trajectory dataset they intend to enrich. Here the pre-processing tab lets the user customize some of the pre-processing operations, i.e., they will be able to indicate the minimum number of points a trajectory should have and a km/h threshold between two consecutive points that the module uses to filter out outliers. Once the raw trajectories have been pre-processed, the UI will show the results of this step.

The user will then proceed to the trajectory segmentation tab (Figure 3). Here the interface lets the user specify the minimum duration and the spatial radius the system will use to identify the stop segments (and, indirectly, the move ones). Once the trajectories have been segmented, the UI will again show a few statistics about this step.

[Figure 3: MAT-Builder UI: segmentation step. The tab exposes the segmentation parameters (minimum duration of a stop, in minutes; maximum spatial search radius, in km) and per-user statistics (number of trajectories, number of stops, average duration of stops).]

[Figure 4: MAT-Builder UI: segment enrichment step. The tab lets the user download POIs from OSM by bounding box (north, south, east, west) or select local files, set the semantic "granularity" (from 0 to 100) and the maximum distance from POIs (in meters), and choose whether to predict the transport mode; it reports the number of occasional and systematic stops and the aspects attached to a selected trajectory.]

The user will finally proceed to the segment enrichment tab (Figure 4), where they will be able to enrich the stop and move segments with different aspects. The first aspect added to the stop segments concerns their regularity, i.e., whether they belong to some type of systematic stop or are occasional. Next, the user will be able to select the data sources to be used to enrich the occasional stops. Here, the user can provide a file for each aspect of interest; the system currently supports POIs, weather, and social media posts. Moreover, MAT-Builder is able to gather POIs from OpenStreetMap if a POI file cannot be provided: in this case, the user must first specify a bounding box of geographical coordinates from which the POIs should be retrieved. POIs might have an extremely large number of attributes, and many of these may have lots of missing values. To deal with this issue the UI lets the user specify a threshold value to discard attributes with too many missing values. With the POIs in place, the module decides which POIs should be used to enrich the occasional stops by ranking them according to the criteria described in Section 3 (i.e., distance and temporal overlap). Finally, the user will be able to choose whether to enrich the moves with the transportation means.
This aspect is estimated with a random-forest classifier as described in Section 3.

5. Conclusions and future work
In this paper we proposed MAT-Builder, a new system that supports users in creating multiple aspect trajectory datasets starting from raw trajectories and external data sources. The semantic enrichment process offered by MAT-Builder includes trajectory pre-processing, trajectory segmentation, and segment enrichment. The backend, which is the core component of MAT-Builder, implements said process and is currently instantiated with stop and move segmentation, enrichment with systematic and occasional stops, activity inference, and transportation means estimation. A distinctive characteristic of MAT-Builder is that it is designed to be modular and easily extensible which, in conjunction with MAT-Builder's data workflow management capabilities, allows users to easily set up their own semantic enrichment process.

Acknowledgments
This work has been partially supported by the European Union's Horizon 2020 Research and Innovation program under the project MobiDataLab (GA 101006879) and MSCA project MASTER (GA 777695).

References
[1] R. Mello, V. Bogorny, L. O. Alvares, L. H. Z. Santana, C. A. Ferrero, A. A. Frozza, G. A. Schreiner, C. Renso, MASTER: A multiple aspect view on trajectories, Trans. GIS 23 (2019) 805–822. URL: https://doi.org/10.1111/tgis.12526. doi:10.1111/tgis.12526.
[2] C. Renso, V. Bogorny, K. Tserpes, S. Matwin, J. A. F. de Macêdo, Multiple-aspect analysis of semantic trajectories (MASTER), Int. J. Geogr. Inf. Sci. 35 (2021) 763–766. URL: https://doi.org/10.1080/13658816.2020.1870982. doi:10.1080/13658816.2020.1870982.
[3] L. Pappalardo, F. Simini, G. Barlacchi, R. Pellungrini, scikit-mobility: A python library for the analysis, generation and risk assessment of mobility data, arXiv preprint arXiv:1907.07062 (2019).
[4] S. Haidri, Y. J. Haranwala, V. Bogorny, C. Renso, V. P. da Fonseca, A. Soares, Ptrail – a python package for parallel trajectory data preprocessing, CoRR (2021). URL: https://arxiv.org/abs/2108.13202. arXiv:2108.13202.
[5] M. Berlingerio, F. Calabrese, G. Di Lorenzo, R. Nair, F. Pinelli, M. L. Sbodio, Allaboard: A system for exploring urban mobility and optimizing public transport using cellphone data, in: Machine Learning and Knowledge Discovery in Databases, 2013, pp. 663–666.
[6] F. Calabrese, E. Cobelli, V. Ferraiuolo, G. Misseri, F. Pinelli, D. Rodriguez, Using vodafone mobile phone network data to provide insights into citizens mobility in italy during the coronavirus outbreak, Data & Policy 3 (2021) e22.
[7] T. P. Nogueira, R. B. Braga, C. T. de Oliveira, H. Martin, Framestep: A framework for annotating semantic trajectories based on episodes, Expert Systems with Applications 92 (2018) 533–545.
[8] C. Pugliese, F. Lettich, C. Renso, F. Pinelli, Mat-builder: a system to build semantically enriched trajectories, in: The 23rd IEEE International Conference on Mobile Data Management, Cyprus, 2022.
[9] S. Spaccapietra, C. Parent, M. L. Damiani, J. A. de Macedo, F. Porto, C. Vangenot, A conceptual view on trajectories, Data & knowledge engineering 65 (2008) 126–146.
[10] A. Ibrahim, H. Zhang, S. Clinch, S. Harper, From GPS to semantic data: how and why - a framework for enriching smartphone trajectories, Computing 103 (2021) 2763–2787. URL: https://doi.org/10.1007/s00607-021-00993-z. doi:10.1007/s00607-021-00993-z.
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[12] Y. Zheng, X. Xie, W. Ma, et al., Geolife: A collaborative social networking service among user, location and trajectory., IEEE Data Eng. Bull. 33 (2010) 32–39.