Vol-3170/paper11
Paper | |
---|---|
id | Vol-3170/paper11 |
wikidataid | Q117351471 |
title | Towards Automatic Extraction of Events for SON Modelling |
pdfUrl | https://ceur-ws.org/Vol-3170/paper11.pdf |
dblpUrl | https://dblp.org/rec/conf/apn/Alshammari22 |
volume | Vol-3170 |
Towards Automatic Extraction of Events for SON Modelling
Tuwailaa Alshammari, School of Computing, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, United Kingdom

Abstract

Data visualization is the process of transforming data into a visual representation in order to make it easier for humans to comprehend and derive knowledge from. By offering a detailed overview of crime events, data visualization technologies have the potential to assist investigators in analysing crimes. This paper proposes a new model that takes advantage of statistical natural language processing technologies to extract people's names and relevant events from crime documents for SON modelling and visualisation. The proposed extractor is examined and evaluated, and it is argued that it achieves reasonable results for SON modelling when compared with human extraction.

Keywords: structured occurrence net, structured acyclic nets, communication structured acyclic net, natural language processing, event extraction, model building

1. Introduction

Structured occurrence nets (SONs) [1, 2] are a Petri-net-based formalism for representing the behaviour of complex systems consisting of interdependent subsystems which proceed concurrently and interact with each other. The formalism extends the concept of an occurrence net, which represents a single 'causal history' and provides a full and unambiguous record of all causal dependencies between the events it involves. An example of a complex system is (cyber)crime, whose computer-based representation and analysis has gained considerable research attention in recent years, in particular using the SON model [3]. An extension of SONs are the communication structured acyclic nets (CSA-nets) [4], which are based on acyclic nets (ANs) rather than occurrence nets (ONs). A CSA-net joins together two or more ANs by employing buffer places to connect pairs of events from different ANs.
The nature of such connections can be synchronous or asynchronous. In a synchronous communication, events are executed concurrently, whereas in an asynchronous communication, events may be executed concurrently or sequentially.

One of the main challenges in conducting effective criminal investigations is the overwhelming amount of data, which makes it difficult for investigators to comprehend the crime and, therefore, make decisions. In particular, investigators rely on a variety of sources of information during criminal investigations, including written police reports and witness statements, which may contain information that needs to be extracted and analysed. This is performed by connecting and analysing the different aspects of the crime in order to comprehend the causality behind crime events.

PNSE'22, International Workshop on Petri Nets and Software Engineering, Bergen, Norway, 2022. t.t.t.alshammari2@ncl.ac.uk (T. Alshammari). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, pp. 188–201.

Natural Language Processing (NLP) can help in the analysis of such unstructured data sources and in the extraction of crime events. NLP began in the 1950s, according to [5]: with the invention of the computer, it became necessary to build human-machine interaction in order to teach computers how to interpret real human language through the manipulation and analysis of human text and speech. NLP is defined as a sub-field of Artificial Intelligence and Linguistics that focuses on teaching computers to recognise and interpret text, statements, and words written in human languages. NLP is now employed in a number of applications, including machine translation, sentiment analysis, and chatbots.
As a result, several natural language processing tools and libraries have emerged in recent years, including CoreNLP, NLTK, and spaCy (used in the work presented here). spaCy [6] is an open-source natural language processing toolkit created to assist developers in implementing natural language processing annotations and tasks. It is a statistical model that excels at data extraction and text preparation for deep learning. As with other NLP libraries, spaCy provides a variety of valuable linguistic features, including Part-of-Speech (POS) tagging, a dependency parser, and a Named Entity Recognizer (NER).

This paper proposes the integration of SONs with NLP for the extraction and modelling of crime events. The idea is to extract useful information from unstructured written sources in order to analyse and visualise it in SONs.

This paper is organised as follows. Section 2 provides an overview of the research background on SONs and NLP. Section 3 presents basic definitions concerning SONs. Section 4 presents the extraction and modelling of events using SONs done by humans; we then introduce Text2SON, our proposed automatic extraction approach. Section 5 discusses and compares the results and shortcomings of both manual and automatic extraction. Section 6 concludes the paper and provides an overview of future work.

2. Background

In this section, we look at related work that has attempted to solve the mentioned challenges through the use of graph analysis and visualisation, NLP, and data extraction for crime data; in particular, work on crime modelling in SONs as well as crime data extraction using various NLP techniques. SONs have demonstrated promising results for accident, criminal and (cyber)crime investigations. [7] demonstrates an explicit capture of the accident behaviours of multiple sub-systems by modelling them in SONs.
It showed the ability of SONs to aid an investigator in comprehending how the accident occurred and in tracing the sequence of events leading up to the accident's cause. Moreover, [3] suggests the use of SON features to detect DNS tunneling during an actual attack. A unique method for detecting DNS tunneling based on SONs has been created and implemented; additionally, pre-processing of the data and a set of experiments were discussed.

The paper [8] introduced new WoPeD capabilities for integrating NLP and business processing. WoPeD, a Petri net editor and simulator, is an open-source Java application for building business processes using workflow nets. Algorithms have been presented for converting graphical process models into textual process descriptions and vice versa. However, the tool suffers from the common issue of semantic ambiguity in natural language processing (in both directions).

News and social media have attracted considerable focus in information extraction and classification. [9] describes the development of a crime investigation tool that leverages Twitter data to aid criminal investigations by providing contextual information about crime occurring in a certain location. A prototype has been implemented in the San Francisco region. This system provides users with a spatial view of criminal incidents and associated tweets in the area, allowing them to investigate the various tweets and crimes that occurred prior to and following a crime incident, as well as to obtain information about the spatial and temporal characteristics of a crime via the web. [10] presents data mining techniques, such as clustering, which have been shown to be useful in extracting insights from publicly available structured data (National Crime Records Bureau).
Additionally, an approach for retrieving data via web scraping from news media has been presented, as well as the essential NLP techniques for extracting significant information that is not available through typical structured data sources. A continuing work on the Simple Event Model ontology [11] discusses populating instances extracted from crime-related documents, aided by an SVO (Subject, Verb, Object) algorithm that extracts events using hand-crafted rules. The study employs the SVO algorithm to generate SVO triples by parsing crime-related sentences using the MaltParser dependency parser and then extracting SVO triples from the parsed sentences.

3. Preliminaries

In this section, we recall basic definitions of the SON model needed in the rest of the paper.

Acyclic nets and occurrence nets. An acyclic net is a 'database' of empirical facts (both actual and hypothetical, expressed using places, transitions, and arcs linking them) accumulated during an investigation. Acyclic nets can represent alternative ways of interpreting what has happened, and so may exhibit (backward and forward) non-determinism. An example of an acyclic net is an occurrence net, which provides a full and unambiguous record of all causal dependencies between the events it involves; an occurrence net represents a single 'causal history'.

Formally, an acyclic net is a triple acnet = (P, T, F) = (P_acnet, T_acnet, F_acnet), where P and T are disjoint sets of places and transitions, respectively, and F ⊆ (P × T) ∪ (T × P) is a flow relation such that F is acyclic and, for every t ∈ T, there are p, q ∈ P such that pFt and tFq. Moreover, acnet is an occurrence net if, for each place p ∈ P, there is at most one t ∈ T such that tFp, and at most one u ∈ T such that pFu.
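The two definitions above can be checked mechanically. The following is a minimal sketch (not from the paper) that encodes a net as place/transition sets with a flow relation of (x, y) arcs and tests the acyclic-net and occurrence-net conditions, using the 'at most one' pre/post-transition condition on places:

```python
def is_acyclic(nodes, arcs):
    """True iff the directed graph (nodes, arcs) has no cycle (Kahn's algorithm)."""
    indeg = {n: 0 for n in nodes}
    for _, y in arcs:
        indeg[y] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for x, y in arcs:
            if x == n:
                indeg[y] -= 1
                if indeg[y] == 0:
                    ready.append(y)
    return visited == len(nodes)

def is_acyclic_net(P, T, F):
    """P, T disjoint; F acyclic; every transition has a pre- and a post-place."""
    return (not (P & T)
            and is_acyclic(P | T, F)
            and all(any((p, t) in F for p in P) and any((t, q) in F for q in P)
                    for t in T))

def is_occurrence_net(P, T, F):
    """Acyclic net in which each place has at most one pre- and one post-transition."""
    return (is_acyclic_net(P, T, F)
            and all(sum((t, p) in F for t in T) <= 1
                    and sum((p, t) in F for t in T) <= 1 for p in P))

# One entity's time-line as a chain: p1 -> played -> p2 -> lost -> p3 -> shot -> p4
P = {"p1", "p2", "p3", "p4"}
T = {"played", "lost", "shot"}
F = {("p1", "played"), ("played", "p2"), ("p2", "lost"),
     ("lost", "p3"), ("p3", "shot"), ("shot", "p4")}
```

Here a simple chain of events satisfies both conditions, while any arc closing a loop would fail the acyclicity check.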
An acyclic net is well-formed if, for every step sequence starting from the default initial marking (i.e., the set of places without incoming arcs), no transition occurs more than once, and the sets of post-places of the transitions which have occurred are disjoint. Note that all occurrence nets are well-formed.

Communication structured acyclic nets. A communication structured acyclic net consists of a number of disjoint acyclic nets which can communicate through special (buffer) places. CSA-nets may exhibit backward and forward non-determinism, and they can contain cycles involving buffer places.

Formally, a communication structured acyclic net (or CSA-net) is a tuple csan = (acnet_1, ..., acnet_n, Q, W) (n ≥ 1) such that:

1. acnet_1, ..., acnet_n are well-formed acyclic nets with disjoint sets of nodes (i.e., places and transitions). We also denote: P_csan = P_acnet_1 ∪ ... ∪ P_acnet_n, T_csan = T_acnet_1 ∪ ... ∪ T_acnet_n, and F_csan = F_acnet_1 ∪ ... ∪ F_acnet_n.
2. Q is a set of buffer places and W ⊆ (Q × T_csan) ∪ (T_csan × Q) is a set of arcs adjacent to the buffer places satisfying the following:
   a) Q ∩ (P_csan ∪ T_csan) = ∅.
   b) For every buffer place q:
      i. There is at least one transition t such that tWq.
      ii. If tWq and qWu, then transitions t and u belong to different component acyclic nets.

That is, in addition to requiring the disjointness of the component acyclic nets and the buffer places, it is required that buffer places pass tokens between different component acyclic nets. In the step semantics of CSA-nets, the role of the buffer places is special, as they can 'instantaneously' pass tokens from the transitions producing them to the transitions needing them. In this way, cycles involving only buffer places and transitions do not stop steps from being executable.

A CSA-net csan = (acnet_1, ..., acnet_n, Q, W) is a communication structured occurrence net (or CSO-net) if the following hold:

1. The component acyclic nets are occurrence nets.
2. For every q ∈ Q, there is exactly one t ∈ T_csan such that tWq, and exactly one u ∈ T_csan such that qWu.
3. No place in P_csan belongs to a cycle in the graph of F_csan ∪ W. That is, only cycles involving buffer places are allowed.

All CSO-nets are well-formed in a sense similar to that of well-formed acyclic nets. As a result, they support clear notions of, in particular, causality and concurrency between transitions. In this paper, we use occurrence nets and CSO-nets rather than the more general acyclic nets and CSA-nets. However, this will change when we move to the next stages of our work, where alternative statements in textual documents are taken into account.

4. Extraction and Modelling

Crime can be conceptualised as a complex evolving system characterised by the occurrence of numerous relevant and linked variables. Such systems require the examination and comprehension of behaviour to assist investigators in the decision-making process. Investigators typically rely on a variety of sources, including written police reports and/or witness statements. CSA-nets provide a distinctive method for analysing such crimes by representing events and chains of events in order to uncover the causal relationships between them; they can also assist in the better comprehension and visualisation of events. Our work relies on integrating NLP techniques with CSA-nets in order to extract useful information from written sources and to represent crime events through CSA-nets. This integration aims at the development of an automatic extraction tool (Text2SON) for criminal cases leveraging statistical NLP models.
4.1. Human extraction: an experiment

Extracting information from unstructured data, such as written investigation reports, aims to obtain valuable information that could aid investigators in analysing and comprehending the dependencies between crime events. CSA-nets are one of the potential techniques for visually representing such data in order to assist investigators in analysing and identifying causality among these occurrences. The existing CSA-net approach, however, lacks the ability to automatically extract information from (unstructured) written sources and reports.

Figure 2 illustrates the outcome of extracting information and representing it by three expert SON users. The experiment focused on a short fragment of a crime story, displayed in Figure 1. The users were asked to extract and represent the crime events as a SON model. In addition, we were interested in observing the style of the human extraction and modelling processes in order to determine the consistency of the models and the amount of time spent.

Figure 1: A short fragment of a crime story

The users extracted the following verbs from the sentences: play, lost, leaves, wearing, goes/returned, and shoot. Nevertheless, not all users agreed on the exact model design and wording. For example, Modeller1 extracted only three verbs (play, lost, and shot), whereas Modeller3 extracted five verbs, including verbs that were not explicitly mentioned in the sentences: Modeller3 added the words leaves and goes, which may be explained by the human capacity to comprehend and express events differently. Despite minor representational discrepancies (for example, the extent of information provided by different modellers), the experiment revealed semantically similar models. In comparison to the other modellers, Modeller1 extracted just enough data, presenting the offence in a very straightforward manner by extracting two entities and three verbs.
Modeller2, on the other hand, added an additional entity, ON: DICE, that the other two modellers did not; they instead inserted DICE into the play event (PLAYED_DICE and PLAY_DICE by Modeller1 and Modeller3, respectively).

Figure 2: Human modelling done by SON expert users

4.2. Automatic extraction: Text2SON

With the above in mind, our goal is to develop a tool to extract crime events and the relationships between them, and to build a SON model for behaviour analysis and visualisation. We applied three methods of extraction to identify and evaluate the most accurate method compared to the human extraction presented in the previous section. Initially, we only considered extracting the main verb that spaCy's parser nominates as the main verb of a sentence, tagged as ROOT. The ROOT tag appears once in every sentence, marking the main word carrying the meaning of the sentence (usually a verb). We then considered extracting more information by evaluating the most frequently occurring verbs in the reserved data set (from the evaluation of around 570 crime stories we compiled a list of the most frequently occurring verbs); in this second method we extracted both ROOT verbs and verbs matching the common-verbs list. Finally, we considered extracting all verbs present in the text, including verbs tagged as ROOT.

4.2.1. Terminology

Tokenization: the process of splitting a sentence into a series of tokens.
Part-of-Speech (POS): assigns part-of-speech tags to tokens.
Dependency Parsing: links tokens (words) as they grammatically appear in the sentence and assigns parsing tags that show their relationships, i.e., subject, object, conjunction, etc.
Named Entity Recognizer (NER): a model where entities are identified within a text and tagged with name types.
Coreferencing: resolves pronouns and mentions to the original names they refer to.
ROOT verbs: the main verbs appearing in sentences, predicted by the parser as the primary word from which the sentence is parsed.
Occurrence Net (ON): an acyclic net which provides a full and unambiguous record of all causal dependencies between the events it involves.

4.2.2. Main assumptions and preliminary rules

In order to extract data automatically, we propose to extract PEOPLE entities and verbs. In our methodology, we consider entities as representations of ONs, and verbs as representations of EVENTs within the ONs. We also consider that EVENTs shared between different ONs represent potential synchronised communications, and we connect them formally using channel places. A number of assumptions have been put in place in order to carry out the extraction. We first assume that verbs tagged as ROOT represent events (and each occurs exactly once). Then, since a ROOT verb appears only once in a sentence, we assume that events are represented either by verbs tagged as ROOT or by verbs in the most-frequently-occurring-verbs list. This is due to the possibility of more than one event occurring in the same sentence.

4.2.3. The Extractor

The proposed Text2SON extractor utilises the statistical models provided by spaCy. The algorithm in Figure 3 extracts the people's names present in a text by searching for names in every sentence. Following this phase, we analyse every sentence, searching for the presence of people's names in each sentence and the occurring verbs labelled as ROOT. Once found, they are grouped together in a list. This process is repeated until the text reaches its end, resulting in lists of people's names and their associated events (verbs).

Figure 3: Extractor design
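The per-sentence grouping step can be sketched as follows. This is an illustrative toy version, not the paper's code: spaCy's NER and parser are replaced by hand-made lookups, with KNOWN_PEOPLE standing in for the PEOPLE tag and ROOT_VERBS for the parser's ROOT predictions on this particular story.

```python
# Toy stand-ins for spaCy's NER (PEOPLE tag) and parser (ROOT verbs).
KNOWN_PEOPLE = {"Ross", "Spicer"}
ROOT_VERBS = {"played", "lost", "returned", "shot"}

def extract(text):
    """Return one (entities, event) pair per sentence, grouping each
    sentence's people names with its ROOT verb."""
    extracted = []
    for sentence in text.split("."):
        words = sentence.replace(",", " ").split()
        entities = {w for w in words if w in KNOWN_PEOPLE}
        roots = [w for w in words if w in ROOT_VERBS]
        if entities and roots:
            extracted.append((entities, roots[0]))  # one ROOT verb per sentence
    return extracted

pairs = extract("Ross and Spicer played dice. Ross lost the game. "
                "Ross returned with a gun. Ross shot Spicer.")
# each pair groups a sentence's people with its (toy) ROOT verb
```

The resulting list of (entities, event) pairs is exactly the shape assumed by the formal construction in Section 4.2.4.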
This grouping is required because we regard names as the representations of occurrence nets (i.e., each name is associated with exactly one ON) and verbs as the representations of events.

Figure 4(a) lists the tools used to create the Text2SON extractor and their respective versions. We utilised spaCy, a Python library that makes use of pipeline packages with key linguistic features such as a tagger, parser, and NER.

Figure 4: Tools for extractor design (a); and extractor design (b)

Figure 4(b) illustrates the steps involved in the extraction process. To begin with, text data is fed into spaCy's pipeline, where it is tokenised into individual tokens. The tokens are then passed to spaCy's tagger, which assigns anticipated POS tags based on the pre-trained model's predictions. The parser next assigns tags indicating the relationships between the tokens. The NER assigns labels to identified entities, which may be individuals, organizations, or dates, to mention a few. However, for this part of the research, we are interested only in the PEOPLE tag; additional tags will be included in future versions of the tool.

Among the difficulties is resolving pronouns to the correct previously named individual. To address this issue, we integrated NeuralCoref [12], a spaCy-compatible neural network model capable of annotating and resolving coreferences. Figure 5 shows how the crime example sentences from Figure 1 are modified after applying NeuralCoref; more precisely, we can observe the replacement of the pronoun HE with the person's name ROSS.

Figure 5: The example text after applying coreference resolution using NeuralCoref
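The effect NeuralCoref has on the running example can be mimicked with a naive rule. This is only an illustrative sketch: real coreference resolution is statistical, whereas here "he" is simply rewritten to the most recently mentioned known name.

```python
# Assumption: a toy rule-based stand-in for NeuralCoref's behaviour on the
# Figure 1 fragment; not the actual library.
KNOWN_PEOPLE = ["Ross", "Spicer"]

def resolve(text):
    """Rewrite 'he'/'He' to the most recently mentioned known person name."""
    out, last = [], None
    for word in text.split():
        stripped = word.strip(".,")
        if stripped in KNOWN_PEOPLE:
            last = stripped
        if stripped.lower() == "he" and last:
            word = word.replace(stripped, last)
        out.append(word)
    return " ".join(out)

resolve("Ross lost the game. He returned with a gun.")
# "He" becomes "Ross", mirroring the substitution shown in Figure 5
```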
4.2.4. Formalisation of the construction

In this section we provide a formal description of the steps undertaken by the proposed extraction and construction procedure, for the first of the proposed event extraction methods (the remaining two are similar). We assume that the NLP stage generated, from a given text of k sentences (written in a natural language), an extracted sequence:

ExtractedText = ExtractedS_1 ExtractedS_2 ... ExtractedS_k

such that each ExtractedS_i is a pair (Entities_i, event_i), where Entities_i are the entities associated with the i-th sentence, and event_i is the ROOT verb of the i-th sentence. Moreover, let

Entities = Entities_1 ∪ ... ∪ Entities_k = {ent_1, ..., ent_n}

be the set of all the entities. Then, for every entity ent, let events(ent) be the sequence of events events(ent) = x_1 ... x_k, where x_i = event_i if ent belongs to Entities_i, and otherwise x_i is the empty string. Intuitively, events(ent) is the ordered sequence of events in which entity ent 'participated', and such a sequence can be used to provide a time-line for this entity. Following this observation, for each entity ent with events(ent) = ev_1 ... ev_l, we construct an occurrence net ON_ent = (P, T, F), where:

P = {p_(ent,ev_i) | i = 1, ..., l} ∪ {p_ent}
T = {t_(ent,ev_1), ..., t_(ent,ev_l)}
F = {(p_(ent,ev_i), t_(ent,ev_i)) | i = 1, ..., l} ∪ {(t_(ent,ev_i), p_(ent,ev_(i+1))) | i = 1, ..., l−1} ∪ {(t_(ent,ev_l), p_ent)}.

Finally, for each pair (t, t′) = (t_(ent,ev), t_(ent′,ev′)) of constructed transitions, where ent is different from ent′ and ev = ev′ (i.e., the two transitions correspond to the same extracted event), we add channel places q = q_(t,t′) and q′ = q_(t′,t) together with the arcs (t, q), (q, t′), (t′, q′), (q′, t) to enforce synchronisation between t and t′. One can then show that the result is a CSO-net which can be used for analysis and visualisation.
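The construction above can be sketched directly in code. The following is an illustrative implementation under the stated assumptions (one ROOT verb per sentence, shared events synchronised across entities); it is not the paper's code, and node names are ad hoc tuples:

```python
# Sketch of the Section 4.2.4 construction: per-entity occurrence nets plus
# channel places synchronising transitions that share an extracted event.
def build_cso_net(extracted):
    """extracted: list of (entities, event) pairs, one per sentence."""
    places, transitions, flow = set(), set(), set()
    channels, W = set(), set()
    # events(ent): the ordered events each entity participated in,
    # tagged with the sentence index
    timeline = {}
    for i, (entities, event) in enumerate(extracted):
        for ent in entities:
            timeline.setdefault(ent, []).append((i, event))
    for ent, evs in timeline.items():
        # chain: p_(ent,ev_1) -> t_(ent,ev_1) -> ... -> t_(ent,ev_l) -> p_ent
        for j, (i, ev) in enumerate(evs):
            p, t = ("p", ent, i, ev), ("t", ent, i, ev)
            places.add(p); transitions.add(t); flow.add((p, t))
            if j + 1 < len(evs):
                i2, ev2 = evs[j + 1]
                flow.add((t, ("p", ent, i2, ev2)))
            else:
                final = ("p", ent); places.add(final); flow.add((t, final))
    # channel places between transitions of different entities, same event
    for t in transitions:
        for t2 in transitions:
            if t[1] != t2[1] and t[2:] == t2[2:]:
                q = ("q", t, t2); channels.add(q)
                W |= {(t, q), (q, t2)}
    return places, transitions, flow, channels, W
```

On the two-sentence input [({Ross, Spicer}, played), ({Ross}, lost)], the sketch yields two occurrence nets and a pair of channel places synchronising the shared 'played' event.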
5. Discussion

In order to evaluate our modelling approach, we compared human modelling with the proposed extractor. We conducted manual extraction and modelling experiments with expert SON users (researchers). They all produced similar outcomes in terms of models explaining the case, but (not surprisingly) in various forms. These models are not fundamentally dissimilar in terms of meaning, but rather in terms of the amount of information displayed. Figure 2 illustrates the human expert modelling for the example sentences in Figure 1. All of the SON models shown there convey the same narrative, because all of the modellers captured the semantics (meaning) of the example sentences. However, different modellers incorporated varying amounts of information (EVENTs and ONs) in their models based on their judgements. This may indicate a lack of modelling consistency as the volume of data grows. Another issue is the amount of time required for such modelling: to construct a model, time must be spent on reading and comprehending the text to extract the crime events, and then on the modelling itself.

To address these issues, we employed three extraction methods and compared them to the human extraction described in Section 4.1. At first, we considered only the main verb that spaCy's parser indicates as the main verb of the sentence, tagged ROOT; the ROOT tag appears once in every sentence, marking the main word carrying the meaning of the sentence (usually a verb). Then we considered extracting ROOT verbs alongside a list of common verbs used in criminal reporting; this list was compiled after analysing approximately 570 crime stories from The Violence Policy Center website for the most frequently occurring verbs. Finally, we considered extracting ROOT verbs as well as all other verbs in the text.
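The trade-off between the three methods can be illustrated on toy data. In this sketch (an assumption, not the paper's code), spaCy's per-sentence ROOT and verb predictions are replaced by hand-made annotations chosen to match the behaviour reported in the comparison table below:

```python
# Each sentence is annotated as (root_verb, all_verbs_in_sentence).
SENTENCES = [("played", ["played"]),
             ("lost", ["lost"]),
             ("returned", ["returned", "wearing"]),
             ("according", ["according", "shot"])]
# Toy common-crime-verbs list (stand-in for the 570-story frequency list).
COMMON_CRIME_VERBS = {"shot", "killed", "stabbed", "played", "lost"}

# Method 1: ROOT verbs only.
root_only = [r for r, _ in SENTENCES]
# Method 2: ROOT verbs plus verbs from the common list.
root_and_common = [v for r, vs in SENTENCES
                   for v in vs if v == r or v in COMMON_CRIME_VERBS]
# Method 3: every verb in the text.
all_verbs = [v for _, vs in SENTENCES for v in vs]
# root_only misses "shot" (a non-ROOT verb); the common list recovers it
```

This mirrors the shortcoming discussed below: the ROOT-only method omits the shooting event, while the second method recovers it without extracting every verb.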
Figure 6: Extractor automated extraction, manually modelled for visualisation purposes

1: https://vpc.org/

Text2SON showed a significantly faster extraction process than human extraction. In our observations, we estimated the time spent on manual extraction and modelling by the three expert users at 8 minutes on average, whereas the automatic extractor handled the extraction process in about 6 seconds. This covers only the extraction phase, as we are still working on the development of an automatic modeller that will function in conjunction with the extractor.

The extractor demonstrated lower accuracy by associating verbs with unrelated entities. The assumption is that if a verb is a ROOT verb and appears in a sentence with other entities, we can reasonably presume that they are linked. However, because each sentence can have only one ROOT verb, which may or may not be the intended crime verb, a human modeller may occasionally choose another verb for extraction. The automatic extractor does not recognise that such a verb relates to a single distinct entity unless that entity is the only entity contained within the sentence. When we fed the extractor a list of common crime verbs, it performed notably better in terms of extracting events. Yet it maintains a connection between the newly extracted verbs and all other entities in the sentence. This is not necessarily accurate because, as previously discussed, a non-ROOT verb can refer to a single entity; this leads to the establishment of an inaccurate communication link between two entities.

Another challenge is the amount of information displayed in the visualised models in comparison to human modelling. Human modellers frequently augment the information offered in events with additional words.
Comparing Figures 2 and 6, we noticed that people tend to add more words to events, such as PLAYED DICE, LOST GAME, LEAVE UPSET, and GOES TO SPICER'S. This addition provides further description for the model, which aids visualisation. The automatic extractor, on the other hand, extracts only one token for each identified event. We are currently investigating how to enhance the extractor by including information that a human modeller would incorporate.

To facilitate the comparison, the following table lists all the verbs extracted or selected for modelling by the various modellers and extraction methods.

Modeller1 | Modeller2 | Modeller3 | ROOT | ROOT and common verbs | all verbs
---|---|---|---|---|---
played | play | play | played | played | played
lost | lost | leaves | lost | lost | lost
shoot(shot) | return | wearing | returned | according | according
- | shoot(kill) | goes | - | returned | wearing
- | - | shoots | - | shot | returned
- | - | - | - | - | shot

We can see that nearly all extraction methods agree on the verbs PLAYED, LOST, and SHOT, which express the essential shooting events. Extracting only the ROOT produced satisfactory results except for the omission of the shooting incident, which we consider noteworthy. As previously stated, some sentences may contain verbs other than the ROOT verb; this is one of the shortcomings of the proposed approach, which prompted us to experiment with two additional methods: extracting common verbs, and extracting all verbs. In comparison with the other two approaches, extracting common criminal verbs alongside the ROOT verb (the second approach) produced the most stable and acceptable results.

This provides an opportunity for further enhancement and development of the tool. We are now working on improving the model by incorporating word relationships, such as subject and object, via the use of the dependency parser.
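The planned subject/object linking can be sketched as follows. This is an illustrative assumption about how dependency labels might be used: a real system would take spaCy's nsubj/dobj predictions, whereas here a hand-annotated parse of one sentence stands in for the parser's output.

```python
# Hand-made parse of "Ross shot Spicer": token -> (head, dependency label).
PARSE = {"Ross": ("shot", "nsubj"),
         "shot": ("shot", "ROOT"),
         "Spicer": ("shot", "dobj")}

def svo(parse):
    """Extract a (subject, verb, object) triple from a one-sentence parse."""
    root = next(t for t, (h, d) in parse.items() if d == "ROOT")
    subj = next((t for t, (h, d) in parse.items()
                 if h == root and d == "nsubj"), None)
    obj = next((t for t, (h, d) in parse.items()
                if h == root and d == "dobj"), None)
    return subj, root, obj

svo(PARSE)
```

A triple of this shape would let the extractor attach an event only to its grammatical subject and object, avoiding the spurious communication links discussed above.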
Additionally, we are creating and training the NER model, as well as introducing new labels for criminal extractions for SON modelling.

6. Conclusion and future work

We discussed the initial steps toward automatic SON data extraction using NLP prediction techniques. We used spaCy and included several of its models without modification, specifically the tokenizer, POS tagger, parser, and NER. Additionally, we used NeuralCoref to resolve mentions. We developed our algorithm to extract people's names and the events associated with them. We then illustrated how the algorithm works by feeding Text2SON a text passage to extract events automatically using three different approaches. We asked expert SON users to extract and model the entities and events in order to verify and validate the results obtained from the tool, and compared the human extraction to the final output produced by our automatic extractor. The ongoing work focuses on improving automatic extraction and developing an automatic modeller, as well as integrating both with SONs. Among the ongoing and future works are the following:

• Developing and integrating an automatic modeller and examining it using a larger data set.
• Utilising the dependency parser in spaCy to extract events associated with the extracted entities. In our current approach, we leveraged the sentence's main verb, ROOT, to express events regardless of their relationship to other entities in the sentence.
• Investigating the effect of various human extraction behaviours on SON modelling. We will look for commonalities in human extraction behaviour in order to develop a broader understanding of human extraction for the purpose of SON modelling.
• Developing a new NER model by training it on a new set of data and introducing new distinct NER labels suited for crime extraction.

References

[1] M. Koutny, B. Randell, Structured occurrence nets: A formalism for aiding system failure prevention and analysis techniques, Fundam. Informaticae 97 (2009) 41–91.
[2] B. Randell, Occurrence nets then and now: The path to structured occurrence nets, in: L. M. Kristensen, L. Petrucci (Eds.), Applications and Theory of Petri Nets, 32nd International Conference, PETRI NETS 2011, Newcastle, UK, June 20-24, 2011, Proceedings, volume 6709 of Lecture Notes in Computer Science, Springer, 2011, pp. 1–16.
[3] T. Alharbi, M. Koutny, Domain name system (DNS) tunneling detection using structured occurrence nets (SONs), in: D. Moldt, E. Kindler, M. Wimmer (Eds.), Proceedings of the International Workshop on Petri Nets and Software Engineering (PNSE 2019), volume 2424 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 93–108.
[4] B. Li, M. Koutny, Unfolding CSPT-nets, in: D. Moldt, H. Rölke, H. Störrle (Eds.), Proceedings of the International Workshop on Petri Nets and Software Engineering (PNSE'15), volume 1372 of CEUR Workshop Proceedings, CEUR-WS.org, 2015, pp. 207–226.
[5] P. M. Nadkarni, L. Ohno-Machado, W. W. Chapman, Natural language processing: an introduction, J. Am. Medical Informatics Assoc. 18 (2011) 544–551.
[6] spaCy, https://spacy.io, 2022.
[7] B. Li, Visualisation and Analysis of Complex Behaviours using Structured Occurrence Nets, Ph.D. thesis, School of Computing, Newcastle University, 2017.
[8] T. Freytag, P. Allgaier, WoPeD goes NLP: conversion between workflow nets and natural language, in: W. M. P. van der Aalst et al. (Eds.), Proceedings of the Dissertation Award, Demonstration, and Industrial Track at BPM 2018, volume 2196 of CEUR Workshop Proceedings, CEUR-WS.org, 2018, pp. 101–105.
[9] P. Siriaraya, Y. Zhang, Y. Wang, Y. Kawai, M. Mittal, P. Jeszenszky, A. Jatowt, Witnessing crime through tweets: A crime investigation tool based on social media, in: F. B. Kashani, G. Trajcevski, R. H. Güting, L. Kulik, S. D. Newsam (Eds.), Proceedings of the 27th ACM International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2019, Chicago, IL, USA, November 5-8, 2019, ACM, 2019, pp. 568–571.
[10] S. Chakravorty, S. Daripa, U. Saha, S. Bose, S. Goswami, S. Mitra, Data mining techniques for analyzing murder related structured and unstructured data, American Journal of Advanced Computing 2 (2015) 47–54.
[11] G. Carnaz, V. B. Nogueira, M. Antunes, Knowledge representation of crime-related events: a preliminary approach, in: R. Rodrigues, J. Janousek, L. Ferreira, L. Coheur, F. Batista, H. G. Oliveira (Eds.), 8th Symposium on Languages, Applications and Technologies, SLATE 2019, June 27-28, 2019, Coimbra, Portugal, volume 74 of OASIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019, pp. 13:1–13:8.
[12] NeuralCoref 4.0: Fast coreference resolution in spaCy with neural networks, https://github.com/huggingface/neuralcoref, 2022.