Automated Extraction of Structured Data from the Social Network Instagram

Authors

  • Petr Frantis Department of Informatics and Cyber Operations, Information Warfare, Learning Technology https://orcid.org/0000-0001-8369-1334
  • Michal Bures Student at the University of Defence https://orcid.org/0009-0008-3921-0052
  • Aneta Coufalikova Department of Informatics and Cyber Operations, Information Warfare, Learning Technology https://orcid.org/0000-0001-7037-3126
  • Ivo Klaban Department of Informatics and Cyber Operations, Information Warfare, Learning Technology

DOI:

https://doi.org/10.34190/eccws.23.1.2160

Abstract

The paper explores the extraction of structured information from the social network Instagram through a suitable application programming interface, namely the unofficial Instagram Private API. It focuses on creating a computer program that identifies which posts a user has marked with a "Like" and then stores this information for profiling specific users. The introduction highlights the general use of social media in modern society and the importance of personal data for these platforms. It specifies the aim of the study, which is to extract information from Instagram and then analyse it for user profiling. The paper then describes the evolution of Instagram and key features such as the different types of posts. It further focuses on the solution and its implementation in the Python programming language, with the aim of minimizing the load on Instagram servers and reducing the risk that the automated processes are detected. It describes the process of setting up new Instagram accounts, the obstacles in obtaining login credentials, and the need to simulate human behaviour to bypass the network's defence mechanisms. It then turns to the actual retrieval of information such as the users followed, their posts, and which posts the user has marked as favourites. It notes that extracting data from closed profiles is difficult and elaborates on the technical challenges associated with this task. A significant part of the paper is a discussion of Instagram's defence mechanisms that respond to automated computer programs, including access denial, account blocking, and identity verification prompts such as CAPTCHA tests. Finally, the conclusion summarizes the results obtained, which indicate the acquisition of approximately 90,000 records for user profiling. It discusses the shortcomings of a fully automated solution that arise from Instagram's account creation conditions and defence mechanisms, and it highlights key gaps and challenges in this area. Overall, the study highlights the technical and security challenges in extracting information from Instagram and emphasises the need for further research and improvements in the technical procedures for extracting data from the platform.

Author Biographies

Petr Frantis, Department of Informatics and Cyber Operations, Information Warfare, Learning Technology

Petr Frantis is an associate professor at the University of Defence in Brno, Czech Republic. He received his MSc in Computer Science from the Military Academy in Brno and his Ph.D. from the University of Defence. He currently works as the head of the Department of Informatics and Cyber Operations. His main research areas are simulations, synthetic environments and cyber security.

Michal Bures, Student at the University of Defence

Michal Bures is a student at the University of Defence in Brno, Czech Republic. He completed his Bachelor's degree studies at the Faculty of Informatics at Masaryk University, with a thesis aimed at creating an intelligent agent in a cybersecurity simulator. His current research focuses on image recognition on UAVs and quantum artificial intelligence.

Aneta Coufalikova, Department of Informatics and Cyber Operations, Information Warfare, Learning Technology

Aneta Coufalikova is an assistant professor at the University of Defence in Brno, Czech Republic. In 2004 she graduated from the Military Academy in Brno with a Master's degree in Special Communication Systems, and she received her PhD in 2008. She worked for the military Computer Incident Response Capability Technical Centre from 2008 to 2020. Her area of interest is cyber security of information systems.

Ivo Klaban, Department of Informatics and Cyber Operations, Information Warfare, Learning Technology

Ivo Klaban worked for more than a dozen years in the Army's Computer Incident Response Capability Technical Centre, where he served as deputy director. Alongside his management tasks, he identified cyber security threats and incidents through continuous monitoring of data networks, and he analysed, evaluated and reported the information gained about incidents to relevant partners. Since 1 January 2021 he has been the Head of the Cyber Operations Group at the Department of Informatics and Cyber Operations at his alma mater, the University of Defence.

Published

2024-06-21