Identification of Violence in Twitter Using a Custom Lexicon and NLP


Jonathan Adkins, Norwich University



Keywords: social network analysis, natural language processing (NLP), sentiment analysis, text mining


Information warfare is no longer confined to the political domain. It permeates other domains, especially mass communications and cybersecurity. Deepfakes, sock puppets, and microtargeted political advertising on social media are among the techniques threat actors have used to influence consumers of mass media. Social Network Analysis (SNA) is a collection of tools and techniques for studying the nature of relationships between entities. SNA draws on methods such as text mining, sentiment analysis, and machine learning algorithms to identify and measure aspects of human behavior under defined conditions. One area of interest in SNA is the ability to identify and measure strong emotions within groups of people. In particular, we have developed a technique that combines text mining, sentiment analysis, and graph theory to identify and measure the potential for increased violence within a community. We have compiled a custom lexicon of terms commonly used in discussions of violent acts, and each term carries a numerical weight indicating how violent it is. We will sample online community discussions from Twitter and use the R and Python programming languages to cross-reference the samples against the lexicon. The results will be displayed as a Twitter discussion graph in which user nodes are color-coded according to the overall level of violence in each user's Tweets. This methodology will show which communities within an online social network discussion are at greater risk of potentially violent behavior. We assert that, when combined with other NLP techniques such as word embeddings and sentiment analysis, this approach will provide cybersecurity and homeland security analysts with actionable threat intelligence.
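The pipeline described above (weighted lexicon, per-Tweet scoring, color-coded user nodes, and a community-level view) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon terms and weights, the per-token normalization, the color thresholds, the use of reply relationships as graph edges, and the use of connected components as a crude stand-in for a real community-detection algorithm are all hypothetical choices made for this example. The names `VIOLENCE_LEXICON`, `tweet_score`, `color_for`, `build_graph`, and `communities` are likewise invented here.

```python
import re
from collections import deque
from statistics import mean

# HYPOTHETICAL lexicon: term -> violence weight on an assumed 0-1 scale.
# Illustrative only; this is not the authors' lexicon.
VIOLENCE_LEXICON = {
    "attack": 0.8, "destroy": 0.7, "fight": 0.6,
    "kill": 1.0, "riot": 0.9, "protest": 0.3,
}

TOKEN_RE = re.compile(r"[a-z']+")


def tweet_score(text, lexicon=VIOLENCE_LEXICON):
    """Sum of matched lexicon weights divided by the tweet's token count."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return 0.0
    return sum(lexicon.get(tok, 0.0) for tok in tokens) / len(tokens)


def color_for(score, low=0.05, high=0.15):
    """Map a score to a node color; both thresholds are assumed, not calibrated."""
    if score < low:
        return "green"
    if score < high:
        return "orange"
    return "red"


def build_graph(tweets):
    """tweets: iterable of (author, text, replied_to_user_or_None).

    Returns each author's mean tweet score and an undirected reply graph
    stored as {user: set_of_neighbors}.
    """
    per_user, graph = {}, {}
    for author, text, replied_to in tweets:
        per_user.setdefault(author, []).append(tweet_score(text))
        graph.setdefault(author, set())
        if replied_to is not None and replied_to != author:
            graph[author].add(replied_to)
            graph.setdefault(replied_to, set()).add(author)
    return {u: mean(s) for u, s in per_user.items()}, graph


def communities(graph):
    """Connected components via BFS: a crude stand-in for community detection."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        comps.append(comp)
    return comps


if __name__ == "__main__":
    sample = [
        ("alice", "We will attack and destroy them", None),
        ("bob", "Calm down, this is just a protest", "alice"),
        ("carol", "Lovely weather for a picnic today", None),
        ("dave", "Agreed, a great day", "carol"),
    ]
    scores, graph = build_graph(sample)
    node_colors = {u: color_for(s) for u, s in scores.items()}
    print(node_colors)
    for comp in communities(graph):
        # Users who only appear as reply targets have no score; skip them.
        scored = [scores[u] for u in comp if u in scores]
        risk = mean(scored) if scored else 0.0
        print(sorted(comp), round(risk, 3), color_for(risk))
```

On the toy sample, alice's node is colored red and the other three green; the alice/bob component averages about 0.146 (orange) and the carol/dave component 0.0 (green). Dividing by token count is one way to keep long Tweets from outscoring short ones purely by length; a raw sum or a maximum weight are equally plausible alternatives, and a dedicated community-detection method would replace `communities` in a real analysis.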