Vol-3194/paper32
Paper
Paper | |
---|---|
description | |
id | Vol-3194/paper32 |
wikidataid | Q117344923→Q117344923 |
title | A Topological Perspective of Port Networks From Three Years (2017-2019) of AIS Data |
pdfUrl | https://ceur-ws.org/Vol-3194/paper32.pdf |
dblpUrl | https://dblp.org/rec/conf/sebd/CarliniLSEBM22 |
volume | Vol-3194→Vol-3194 |
session | → |
A Topological Perspective of Port Networks From Three Years (2017-2019) of AIS Data
Emanuele Carlini1, Vinicius Monteiro de Lira1, Amilcar Soares2, Mohammad Etemad3, Bruno Brandoli3 and Stan Matwin3

1 Institute of Information Science and Technologies (ISTI), National Research Council (CNR), Pisa, Italy
2 Department of Computer Science, Memorial University of Newfoundland, St. John’s, Canada
3 Institute for Big Data Analytics, Dalhousie University, Halifax, Canada

Abstract
Complex network analysis is a fundamental tool for understanding non-trivial aspects of graphs and networks and is widely used in many fields. In this paper, we apply complex network techniques to study port networks, in which nodes are ports and edges are maritime lines between ports. In particular, we study the temporal evolution of several topological features of a network of ports, including connected components, shortest paths, and clustering coefficients. We built the network with three years of Automatic Identification System data from 2017 to 2019. We highlight several interesting trends and behaviors that differentiate long-range vessels from short-range vessels.

Keywords
Automatic Identification System, Graph Analysis, Time series

1. Introduction
The analysis of maritime data is a well-established source of information to understand the role of maritime routes in economic, social, and environmental contexts. Recent works propose to use data analytics and complex network tools to find routes [1], extract high-level representations and evaluate local maritime traffic [2], and integrate maritime data with other environmental data [3]. Such studies often model the relationship between vessels and ports as a network. Such a network, which we call a Port Network, represents sea ports as nodes and the maritime lines connecting two ports as edges.
Since the introduction of the Automatic Identification System (AIS) for vessels, there has been a surge of studies on Port Networks [4, 5]. Port Networks are usually analyzed with tools typical of complex networks, which make it possible to compute non-trivial topological features of the network. However, only a few works study the network in terms of its evolution [6, 7]. Moreover, the works that did study the network evolution used private data and performed interesting but high-level, coarse-grained analyses, such as in [8].

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
Email: emanuele.carlini@isti.cnr.it (E. Carlini); vinicius.monteirodelira@isti.cnr.it (V. M. d. Lira); amilcarsj@mun.ca (A. Soares); etemad@dal.ca (M. Etemad); brunobrandoli@dal.ca (B. Brandoli); stan@cs.dal.ca (S. Matwin)
ORCID: 0000-0003-3643-5404 (E. Carlini); 0000-0002-7580-1756 (V. M. d. Lira); 0000-0001-5957-3805 (A. Soares); 0000-0002-3770-180X (M. Etemad); 0000-0001-6629-8434 (S. Matwin)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Reference | Data source | Analysis over time | Data scope |
---|---|---|---|
Kaluza et al., 2010 [11] | Lloyd’s | ✗ | Global |
González Laxe et al., 2012 [6] | Lloyd’s | ✗ | Local |
Kosowska-Stamirowska et al., 2016 [12] | Lloyd’s | ✓ | Global |
Ducruet, 2017 [8] | Lloyd’s | ✓ | Global |
Coscia et al., 2018 [13] | AIS | ✗ | Local |
Wang et al., 2019 [4] | AIS | ✗ | Global |

Table 1: A summary of the related works regarding the four evaluated aspects.

The main goal of our analysis is to provide an overview of the evolution and trends of several Port Network topological features, by considering the two necessary dimensions of time and layers (i.e., the evolution of the network can be observed for multiple types of vessels, such as cargo and passenger vessels).
Networks can be analysed by looking at the local features of their main components (i.e., nodes and edges), or by looking at features of the network as a whole. In this paper we are interested in the latter. We analyzed three years of worldwide AIS data to investigate Port Networks in terms of topological features such as connected components, shortest paths, and clustering coefficients. From our analysis we observed that long-range vessels, such as cargo and tanker vessels, tend to form well-connected large networks that are relatively stable over time, while short-range vessels (i.e., passenger and fishing vessels) form small well-connected networks with a lot of variability over time. The content of this discussion paper is based on other manuscripts [9, 10] already published by the same authors, which contain an extensive review of related work, a detailed methodology, and a correlation and stationarity analysis of the network measures. With respect to [9], however, this paper provides the analysis on a cleaner and improved AIS dataset.

2. Related Work
Table 1 summarizes the state of the art in graph analysis with vessel data, in terms of source data (AIS or Lloyd’s database), whether the works evaluated the graph evolution over time, and the scope (local or global). The work in [11] is one of the first to study Port Networks as a complex network. The authors use AIS information about the itineraries of 16363 ships of three types (bulk dry carriers, container ships, and oil tankers) during 2007 to build a network of links between ports. They show that the three categories of ships differ in their mobility patterns and networks. The work of [6] uses a sample of the Lloyd’s database with the world container ship fleet movements from Chinese ports from 2008 to 2010.
Their work aims to look at changes in the maritime network before and after the financial crisis (2008-2010) and to analyze the extent to which large ports have seen their position within the network change. The authors show how the global and local importance of a port can be measured using graph theory concepts.

[Figure 1: Methodology for the creation of Port Networks and Time-series from AIS data. Stages shown: Datasets (AIS messages, ports database), Preprocessing (record cleaning, spatial filtering), Visits & Voyages (port visits, port voyages, voyage cleaning), and Timeseries creation (vessel type filtering, port networks G1..Gn, timeseries fj(G)).]

A study of topological changes in the maritime trade network is presented in [12]. The authors propose two new measures of network navigability called random walk discovery and escape difficulty. Their results show that the maritime network evolves by increasing its navigability while doubling the number of active ports; the network does not densify over time, and its effective diameter remains constant. In [8], the author investigates the degree of overlap among the different layers of circulation composing global maritime flows. The results show a strong and path-dependent influence of multiplexity on traffic volume, range of interaction, and centrality from various perspectives (e.g., matrices correlations, homophily, assortativity, and single linkage analysis). The work of [4] builds a Port Network using the 2015 AIS data of the world with multiple spatial levels. Their work evaluates features such as the average degree and betweenness centrality of each node, the average shortest path length between any two nodes, and the community clusters of the global shipping networks (GSNs).
In a similar way, [13] presented an approach to automatically learn and compactly represent commercial maritime traffic in the form of a graph whose nodes represent clusters of waypoints, connected by a network of navigational edges.

3. Creation of Port Networks and Timeseries
This section describes our methodology to generate Port Networks and the corresponding Time-series. The various steps are depicted in Figure 1.

Datasets. To build the Port Networks we used three years (2017-2019) of worldwide AIS data provided by ExactEarth [14]. The full dataset contains around 2.5 Terabytes of AIS messages and around 20 billion records stored in a relational database. In all our analyses we consider the Maritime Mobile Service Identity (MMSI) as each vessel's unique identifier. We used Python to develop several in-memory scripts to compute the topological features of the graphs. Graphs are stored in memory as edge lists. Since our focus is not on processing performance, we have not performed a formal scalability analysis. To model the ports we used the World Port Index dataset [15], which contains spatial information, including latitude and longitude, of all known seaports in the world. The radius of the port area has been set to 3 nautical miles (about 5.6 km). This value has been used to define the limit of a country’s territorial waters [16].

AIS data pre-processing. The pre-processing step has two aims. The first is to extract the AIS messages that were transmitted inside the area of a port. Depending on the radius, port areas can overlap, so that the same AIS record appears to be transmitted inside multiple ports. In these cases, we cluster the ports whose regions overlap and assign each cluster a unique port identifier. Messages are then spatially filtered against the clustered port regions, in order to create a new set that contains only those messages transmitted inside the port areas.
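The port clustering and spatial filtering described above could be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: port areas are treated as circles of 3 nautical miles, two areas are taken to overlap when their centres are closer than twice the radius, and the function names (`haversine_km`, `cluster_ports`, `port_of_message`) and the toy coordinates are hypothetical.

```python
import math

NM_TO_KM = 1.852
PORT_RADIUS_KM = 3 * NM_TO_KM  # 3 nautical miles, the radius given in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (orthodromic) distance in km on a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cluster_ports(ports, radius_km=PORT_RADIUS_KM):
    """ports: dict port_id -> (lat, lon). Returns dict port_id -> cluster id.

    Assumption: two circular port areas overlap when their centres are
    closer than 2 * radius. Overlapping ports are merged with union-find.
    """
    parent = {p: p for p in ports}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = sorted(ports)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:  # quadratic scan, fine for a sketch
            if haversine_km(*ports[a], *ports[b]) < 2 * radius_km:
                ra, rb = find(a), find(b)
                if ra != rb:
                    parent[max(ra, rb)] = min(ra, rb)
    return {p: find(p) for p in ports}


def port_of_message(lat, lon, ports, clusters, radius_km=PORT_RADIUS_KM):
    """Return the clustered port id for a position, or None if outside all ports."""
    for pid, (plat, plon) in ports.items():
        if haversine_km(lat, lon, plat, plon) <= radius_km:
            return clusters[pid]
    return None


# Toy data: A and B are about 5.6 km apart (overlapping areas), C is far away.
ports = {"A": (0.0, 0.0), "B": (0.0, 0.05), "C": (10.0, 10.0)}
clusters = cluster_ports(ports)
```

With the toy data, a message sent near A or near B is assigned the same clustered identifier, while a position far from every port is discarded. On the full World Port Index, a spatial index would be a more practical choice than the pairwise scan used here.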
The second aim is to remove incorrect, duplicated, and noisy messages. Incorrect messages are those that are syntactically valid but have invalid semantics in relevant fields (typically position or vessel type).

Visits and Voyages. We define a visit as the continuous presence of a particular vessel in a port area. Multiple consecutive AIS messages in the same port area are considered part of the same visit. By ordering the visits by time, we obtain a sequence of visits for each vessel. From the sequence of visits we extract the set of voyages: given the time-sorted visits of a vessel, we record a voyage from an origin port to a destination port by observing consecutive visits in the sequence. The duration of a voyage is the time between the last visit of a vessel in the origin port and its first visit in the destination port. We then removed those voyages whose speed exceeds the capabilities of the vessels.

Port Network and Time-series. From the sequences of visits, we create multiple Port Networks, each covering one of a series of consecutive, non-overlapping time intervals. A Port Network is built by considering ports as nodes and voyages as edges. The resulting network is a directed graph, obtained by collapsing the multi-graph of voyages so that all voyages between the same ordered pair of ports become a single edge. By extracting several topological features from each Port Network, we create a set of Time-series that lets us study the evolution of the graphs using complex network concepts. The Time-series have been built with a time interval of one calendar month, resulting in 36 data points for each considered vessel type and each topological feature.

4. Port Networks Analysis
Diverse types of vessels transmit AIS data, and it is natural to assume that the networks of distinct types (layers) of vessels differ. To identify the vessel type, we used the type field of the AIS data; the associated descriptions were taken from the marinetraffic.com website (footnote 1), with minor modifications.
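The visit, voyage, and monthly network construction described in Section 3 could be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that messages are already filtered to port areas and given as `(mmsi, timestamp, port_id)` tuples, that a voyage is assigned to the month of its arrival, and that the speed threshold `max_speed_kmh=60.0` is a placeholder value. All function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def extract_visits(messages):
    """messages: iterable of (mmsi, timestamp, port_id), in-port messages only.

    Consecutive messages of a vessel in the same port are merged into one
    visit, stored as [port_id, first_timestamp, last_timestamp].
    """
    visits = defaultdict(list)
    for mmsi, ts, port in sorted(messages):
        seq = visits[mmsi]
        if seq and seq[-1][0] == port:
            seq[-1][2] = ts  # extend the current visit
        else:
            seq.append([port, ts, ts])  # start a new visit
    return visits


def extract_voyages(visits, distance_km, max_speed_kmh=60.0):
    """Turn consecutive visits into voyages and drop implausibly fast ones.

    distance_km(origin, destination) is a caller-supplied function;
    max_speed_kmh is an assumed threshold, not a value from the paper.
    """
    voyages = []
    for mmsi, seq in visits.items():
        for (orig, _, left), (dest, arrived, _) in zip(seq, seq[1:]):
            hours = (arrived - left).total_seconds() / 3600
            if hours <= 0 or distance_km(orig, dest) / hours > max_speed_kmh:
                continue  # impossible voyage, discard
            voyages.append((mmsi, orig, dest, left, arrived))
    return voyages


def monthly_networks(voyages):
    """Collapse voyages into one directed edge set per (year, month).

    Assumption: a voyage belongs to the month of its arrival time.
    """
    nets = defaultdict(set)
    for _, orig, dest, _, arrived in voyages:
        nets[(arrived.year, arrived.month)].add((orig, dest))
    return nets


# Toy example: vessel 1 sails A -> B -> A; vessel 2 "jumps" 1000 km in 30 minutes.
msgs = [
    (1, datetime(2018, 1, 1, 0), "A"),
    (1, datetime(2018, 1, 1, 1), "A"),
    (1, datetime(2018, 1, 2, 1), "B"),
    (1, datetime(2018, 2, 3, 0), "A"),
    (2, datetime(2018, 1, 5, 0, 0), "A"),
    (2, datetime(2018, 1, 5, 0, 30), "B"),
]
visits = extract_visits(msgs)
voyages = extract_voyages(visits, distance_km=lambda o, d: 1000.0)
networks = monthly_networks(voyages)
```

In the toy example the voyage of vessel 2 is discarded by the speed filter, so each of the two monthly networks contains a single directed edge. Storing each network as a set of `(origin, destination)` pairs matches the edge-list representation mentioned in the text.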
We have considered only those vessel types (layers) with a relevant share of unique vessels and voyages, namely: (i) Cargo (37% of unique vessels, 6.6% of unique voyages); (ii) Tanker (15.7%, 3.3%); (iii) Passenger (3.84%, 3.13%); and (iv) Fishing (6%, 47.9%). We did not consider the special or the other types as they contain many different types of vessels; similarly, we did not consider tug tows as they usually perform very short voyages between nearby ports, and are therefore not interesting in a global, worldwide analysis. The average orthodromic distance over all the edges of the graph (Figure 2), as similarly observed in [8], confirms that cargo and tanker vessels perform longer voyages than passenger and fishing vessels. Following these considerations we refer to cargo and tanker vessels as long-range vessels (LRV) due to their high average distances that vary little over time. In contrast, we refer to fishing and passenger vessels as short-range vessels (SRV) due to their low average distances that also show some variability.

Footnote 1: https://help.marinetraffic.com/

[Figure 2: Average orthodromic distance in kilometers between nodes connected by an edge. Monthly series from 2017 to 2019 for cargo, fishing, tanker, and passenger vessels; vertical axis from 400 to 2000 km.]

A relevant aspect in identifying cohesive subgroups of ports is the identification of those ports that share a strong tie in the traffic for a particular vessel type. The number of Strongly Connected Components (SCCs) is the number of maximal subgraphs in which every node is reachable from every other node, i.e., subgraphs that cannot be extended with further nodes without losing this property [17].
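As an illustration of the SCC measure, the sketch below computes the strongly connected components of a directed edge list with Kosaraju's two-pass algorithm, written iteratively in plain Python. The paper does not say which algorithm or library the authors used, so this is only one possible way to obtain the number of SCCs and the share of nodes in the largest one. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Return the SCCs of a directed graph as a list of node sets.

    nodes: iterable with all node ids; edges: iterable of (u, v) pairs.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: depth-first search on the graph, recording finishing order.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:  # all successors explored: node is finished
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph in decreasing finishing order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        comp, todo = set(), [start]
        while todo:
            node = todo.pop()
            comp.add(node)
            for prv in pred[node]:
                if prv not in assigned:
                    assigned.add(prv)
                    todo.append(prv)
        components.append(comp)
    return components


# Toy network: a circular route A -> B -> C -> A plus a one-way tail C -> D -> E.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
sccs = strongly_connected_components(nodes, edges)
largest_fraction = max(len(c) for c in sccs) / len(nodes)
```

In the toy network the three ports on the circular route form one component, while D and E are singleton components because no voyage leads back from them.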
Ideally, the number of SCCs indicates how much the graph represents a global-scale activity (low number of SCCs) rather than a set of disconnected local activities (high number of SCCs). The average number of SCCs for the SRV and LRV networks over the three-year period confirms this trend, with LRV having a lower number of SCCs on average (cargo: 153; fishing: 203; tanker: 141; passenger: 194). However, LRV networks are composed of a giant SCC that accounts for most of the nodes (>80%) on average over time, accompanied by many small components often composed of just two nodes (see Figure 4). As expected, nodes are more evenly distributed among the SCCs for SRV networks, in which the largest connected component accounts for just around 40% of the nodes on average for the passenger networks and around 20% for the fishing networks. From a geographic perspective, the LRV giant component spans the whole world. Those ports that remain out of the giant component show a seasonal trend with a clear difference between winter and summer periods (see Figure 3).

The number of bidirectional edges (i.e., given the nodes 𝑢 and 𝑣, there exist both the edges [𝑢, 𝑣] and [𝑣, 𝑢]) can be used as an indication of network connectivity. A large fraction of bidirectional edges in a vessel network means tight interactions between ports, indicating that vessels travel in both directions between most pairs of ports. In LRV networks we notice a lower fraction of bidirectional edges, with around 70% of the connections between ports going in only one direction. By comparison, the SRV networks have a larger fraction, which is also more variable (around 40% on average, see Figure 5). It is also interesting to notice how the values for the passenger networks form valleys during spring and autumn and peak during summer and winter, indicating a seasonal change in the traffic patterns.
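The fraction of bidirectional edges can be computed directly from an edge list, as in the short sketch below. The normalisation is an assumption: here the fraction is taken over directed edges, counting an edge as bidirectional when its reverse also exists, and self-loops are ignored. The paper does not spell out its exact definition, and the function name is hypothetical.

```python
def bidirectional_fraction(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists.

    Assumptions: duplicate edges are collapsed and self-loops are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    both = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return both / len(edge_set)


# A back-and-forth route A <-> B plus two one-way legs.
mixed = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")]
# A purely circular route: every port is reachable, but no edge is reciprocated.
circular = [("A", "B"), ("B", "C"), ("C", "A")]

mixed_fraction = bidirectional_fraction(mixed)        # 2 of 4 edges are reciprocated
circular_fraction = bidirectional_fraction(circular)  # no edge is reciprocated
```

The circular example shows that a network can have no bidirectional edges at all and still form a single strongly connected component.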
LRV networks show a low fraction of bidirectional edges but a giant connected component: this suggests that LRVs are likely returning to the same set of ports but not directly, i.e., only after visiting other ports. In other words, LRV traffic appears to be mostly composed of unidirectional routes organised in 'circular' patterns. These findings are consistent with the results obtained by similar research works [11]. By comparison, in SRV networks we observe many SCCs with an even distribution of vessels and a higher symmetry, suggesting clusters of small local networks of predefined routes that are not connected to each other.

[Figure 3: Ports in the giant connected component (blue circles) vs ports outside it (red crosses), for (a) January 2018 and (b) July 2018. During winter periods (left) several northernmost areas are cut out from the giant component, such as Greenland or the Great Lakes of North America, whereas they are present during summer (right).]

[Figure 4: Fraction of nodes in the largest strongly connected component. Monthly series from 2017 to 2019 for cargo, fishing, tanker, and passenger vessels.]

[Figure 5: Fraction of bidirectional edges. Monthly series from 2017 to 2019 for cargo, fishing, tanker, and passenger vessels.]

The average shortest path in a graph is the minimum number of edges to traverse from an origin node to a destination node, averaged over all pairs of nodes. In the vessel network, a lower average shortest path reveals denser port connections.
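A minimal sketch of the average shortest path computation is given below. It runs a breadth-first search from every node and averages the hop counts over all ordered pairs that are reachable. It assumes the node set passed in is a single component (for example the giant strongly connected component) and that edges leaving that set are ignored. The function name is hypothetical and the authors' scripts may differ.

```python
from collections import defaultdict, deque


def average_shortest_path(nodes, edges):
    """Mean number of hops over all ordered, reachable node pairs.

    nodes: the nodes of the component under study; edges with an endpoint
    outside this set are ignored (an assumption of this sketch).
    """
    node_set = set(nodes)
    succ = defaultdict(list)
    for u, v in edges:
        if u in node_set and v in node_set:
            succ[u].append(v)

    total, pairs = 0, 0
    for src in node_set:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            node = queue.popleft()
            for nxt in succ[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# One-way circular route: from each port the others are 1 and 2 hops away.
ring = [("A", "B"), ("B", "C"), ("C", "A")]
# Same ports with every link reciprocated: all pairs are 1 hop apart.
dense = ring + [(v, u) for u, v in ring]

ring_asp = average_shortest_path("ABC", ring)
dense_asp = average_shortest_path("ABC", dense)
```

The denser toy graph yields the shorter average path, in line with the interpretation given in the text. The all-pairs search costs one traversal per node, which stays manageable for networks with a few thousand ports.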
The average shortest path (computed on the giant connected component) of LRV networks is around 4 for tankers and cargo and is stable over time (see Figure 6). For SRV networks, the average shortest path is 4 for fishing vessels, but with a much higher variability with respect to LRV networks. The average shortest path is relatively high and variable for passenger vessels, indicating a low-density graph affected by seasonal trends. However, the largest component in fishing networks is generally small compared to the number of nodes, so the low values observed for fishing vessels can be a direct consequence of the small component size.

[Figure 6: Average shortest path. Monthly series from 2017 to 2019 for cargo, fishing, tanker, and passenger vessels. As a matter of comparison, for similar size random networks, we measured the following average shortest path: 2.7 for cargo, 3.5 for fishing, 2.9 for tanker, and 3.9 for passenger.]

[Figure 7: Average clustering coefficient. Monthly series from 2017 to 2019 for cargo, fishing, tanker, and passenger vessels. As a matter of comparison, for similar size random networks, we measured the following average clustering coefficient: 0.01 for cargo, 0.02 for fishing, 0.01 for tanker, and 0 for passenger.]

The average clustering coefficient is the average of the local clustering coefficients of all nodes. The local clustering of a node is the fraction of triangles (sets of 3 vertices such that any two of them are connected by an edge) that exist over all possible triangles in its neighborhood [17]. In other words, it can serve to evaluate how many voyages happen around the same set of ports.
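The clustering coefficient defined above could be computed as in the following sketch. It makes a simplifying assumption that the paper does not confirm: edge direction is ignored, so the coefficient is computed on the undirected projection of the Port Network. Nodes with fewer than two neighbours are given a coefficient of 0. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def average_clustering(nodes, edges):
    """Average local clustering coefficient on the undirected projection.

    For a node with k neighbours, the local coefficient is the number of
    links among those neighbours divided by k * (k - 1) / 2.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)

    coeffs = []
    for n in nodes:
        k = len(nbrs[n])
        if k < 2:
            coeffs.append(0.0)  # no triangle is possible
            continue
        links = sum(1 for a, b in combinations(nbrs[n], 2) if b in nbrs[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs) if coeffs else 0.0


# Triangle A-B-C with an extra port D that is only linked to A.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
avg_cc = average_clustering(nodes, edges)  # (1/3 + 1 + 1 + 0) / 4 = 7/12
```

In the toy graph, B and C each have a coefficient of 1 because their two neighbours are linked to each other, A has 1/3 because only one of its three neighbour pairs is linked, and D has 0.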
The results (Figure 7) show that cargo and tanker vessels form networks of higher density than passenger and fishing vessels. The variability of the average clustering coefficient is high for all types of vessels, but larger for SRV networks, and there is no noticeable pattern. Such variability indicates that most of the connections are indeed volatile and that their existence can depend on specific local factors.

5. Conclusion
This paper presented an analysis of the evolution of networks of voyages of vessels between ports, based on several Time-series of topological features of the Port Network. The networks were built in a bottom-up and data-driven fashion, considering three years of worldwide AIS data. The empirical evaluation of the Time-series showed that LRVs, such as cargo and tanker vessels, tend to form well-connected giant strongly connected components that are relatively stable over time; by comparison, the behaviour of SRVs is more variable over time and the resulting networks are more fragmented, with each component well-connected even if small.

Acknowledgment
The authors acknowledge the support of the H2020 EU Project MASTER (Multiple ASpects TrajEctoRy management and analysis) funded under the Marie Skłodowska-Curie grant agreement No 777695.

References
[1] D. Zissis, K. Chatzikokolakis, G. Spiliopoulos, M. Vodas, A Distributed Spatial Method for Modeling Maritime Routes, IEEE Access 8 (2020) 47556–47568. URL: https://ieeexplore.ieee.org/document/9028151/. doi:10.1109/ACCESS.2020.2979612.
[2] D. Filipiak, K. Węcel, W. Abramowicz, Extracting Maritime Traffic Networks from AIS Data Using Evolutionary Algorithm, Bus Inf Syst Eng (2020) 17.
[3] A. Soares, R. Dividino, F. Abreu, M. Brousseau, A. W. Isenor, S. Webb, S. Matwin, Crisis: Integrating AIS and ocean data streams using semantic web standards for event detection, in: 2019 International Conference on Military Communications and Information Systems (ICMCIS), IEEE, 2019, pp. 1–7.
[4] Z. Wang, C. Claramunt, Y. Wang, Extracting global shipping networks from massive historical automatic identification system sensor data: a bottom-up approach, Sensors 19 (2019) 3363.
[5] I. Varlamis, I. Kontopoulos, K. Tserpes, M. Etemad, A. Soares, S. Matwin, Building navigation networks from multi-vessel trajectory data, GeoInformatica (2020) 1–29.
[6] F. G. Laxe, M. J. F. Seoane, C. P. Montes, Maritime degree, centrality and vulnerability: port hierarchies and emerging areas in containerized transport (2008–2010), Journal of Transport Geography 24 (2012) 33–44.
[7] C. P. Montes, M. J. F. Seoane, F. G. Laxe, General cargo and containership emergent routes: A complex networks description, Transport Policy 24 (2012) 126–140.
[8] C. Ducruet, Multilayer dynamics of complex spatial networks: The case of global maritime flows (1977–2008), Journal of Transport Geography 60 (2017) 47–58.
[9] E. Carlini, V. M. de Lira, A. Soares, M. Etemad, B. Brandoli, S. Matwin, Understanding evolution of maritime networks from automatic identification system data, GeoInformatica (2021) 1–25.
[10] E. Carlini, V. M. de Lira, A. Soares, M. Etemad, B. B. Machado, S. Matwin, Uncovering vessel movement patterns from AIS data with graph evolution analysis, in: EDBT/ICDT Workshops, 2020.
[11] P. Kaluza, A. Kölzsch, M. T. Gastner, B. Blasius, The complex network of global cargo ship movements, Journal of the Royal Society Interface 7 (2010) 1093–1103.
[12] Z. Kosowska-Stamirowska, C. Ducruet, N. Rai, Evolving structure of the maritime trade network: evidence from the Lloyd’s shipping index (1890–2000), Journal of Shipping and Trade 1 (2016) 10.
[13] P. Coscia, P. Braca, L. M. Millefiori, F. A. N. Palmieri, P. Willett, Multiple Ornstein–Uhlenbeck Processes for Maritime Traffic Graph Representation, IEEE Transactions on Aerospace and Electronic Systems 54 (2018) 2158–2170. doi:10.1109/TAES.2018.2808098.
[14] exactearth.com, ExactEarth, last accessed July 2020. URL: https://www.exactearth.com/.
[15] msi.nga.mil, World Port Index, last accessed July 2020. URL: https://msi.nga.mil/Publications/WPI.
[16] G. H. Blake, Maritime boundaries, in: The Oceans: Key Issues in Marine Affairs, Springer, 2004, pp. 63–76.
[17] J. M. Hernández, P. Van Mieghem, Classification of graph metrics, Delft University of Technology: Mekelweg, The Netherlands (2011) 1–20.