Automated Extraction of Structured Data from the Social Network Instagram
DOI:
https://doi.org/10.34190/eccws.23.1.2160Abstract
The paper explores the extraction of structured information from the social network Instagram through a suitable application programming interface, namely the unofficial Instagram Private API. It focuses on creating a computer program that identifies which posts a user has tagged as "Likes" and then stores this information for profiling specific user profiles. The introduction of the paper highlights the general use of social media in modern society and the importance of personal data for these platforms. It specifies the aim of the study, which is to extract information from Instagram and then analyse it for user profiling. It then describes the evolution of the social network Instagram and key features such as different types of posts. This paper further focuses on the solution and implementation by using Python programming language to minimize the load on Instagram servers and reduce the risk of detection of automated processes. It describes the process of setting up new Instagram accounts, the obstacles in obtaining login credentials, and the need to simulate human behaviour to bypass the network's defence mechanisms. It then focuses on the actual retrieval of information such as the users followed, their posts and information about which posts the user has marked as favourites. It mentions that extracting data from closed profiles is difficult and elaborates on the technical challenges associated with this task. A significant part of this paper is a discussion of Instagram's defence mechanisms that respond to automated computer programs. It describes access denial, account blocking, and identity verification prompts such as CAPTCHA tests. Finally, the conclusion summarizes the results obtained, which indicate the acquisition of approximately 90,000 records for user profiling. It discusses the shortcomings of a fully automated solution due to Instagram's account creation conditions and defence mechanisms. It mentions the need for further research and highlights key gaps and challenges in this area. Overall, the study highlights the technical and security challenges in extracting information from Instagram and emphasises the need for further research and improvements in the technical procedures for extracting data from the platform.
Downloads
Published
Issue
Section
License
Copyright (c) 2024 European Conference on Cyber Warfare and Security
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.