Evaluating a Non-Platform-Specific OCR/NLP System to Detect Online Grooming





Natural Language Processing, Machine Learning, Optical Character Recognition, Online Grooming


Online grooming is a social engineering attack in which the attacker uses deceptive practices for sexual gratification. The targets of these attacks vary in demographics; however, in most cases the targets are children, and most attacks occur on social media platforms. Besides being illegal in the UK and US, these attacks leave the children who experience them at higher risk of self-harm and suicidal thoughts. New social media platforms and features are deployed continually, and the methods and tactics used by attackers are volatile, so any implementation made specific to one feature or platform is likely to be outdated or ineffective by the time it is released. This investigation therefore considers a non-platform-specific implementation. A preliminary analysis found an average true positive detection rate of 71% when optical character recognition and natural language processing were applied across three different social media platforms. It is suggested that by combining this text extraction and processing method with a 'category-based' machine learning algorithm, a solution can be developed that identifies online grooming while accounting for the 'real world complexities' of this attack.
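The abstract reports an "average" true positive detection rate across three platforms but does not say how the average was taken. As an illustrative sketch only, assuming the usual definition of true positive rate (TPR = TP / (TP + FN)), the code below contrasts two common ways of averaging it: a macro average, where each platform counts equally, and a micro average, where counts are pooled so each sample counts equally. The names `PlatformCounts`, `macro_average_tpr` and `micro_average_tpr` are hypothetical, and the counts in the usage example are made up for illustration; they are not results from this study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlatformCounts:
    """Hypothetical per-platform detection counts (illustrative structure only)."""
    name: str
    true_positives: int   # grooming conversations correctly flagged
    false_negatives: int  # grooming conversations that were missed


def true_positive_rate(tp: int, fn: int) -> float:
    """TPR (recall) = TP / (TP + FN); returns 0.0 when there are no positive cases."""
    positives = tp + fn
    return tp / positives if positives else 0.0


def macro_average_tpr(platforms: list[PlatformCounts]) -> float:
    """Unweighted mean of per-platform TPRs: every platform contributes equally."""
    rates = [true_positive_rate(p.true_positives, p.false_negatives) for p in platforms]
    return sum(rates) / len(rates)


def micro_average_tpr(platforms: list[PlatformCounts]) -> float:
    """Pool counts across platforms first: every individual sample contributes equally."""
    tp = sum(p.true_positives for p in platforms)
    fn = sum(p.false_negatives for p in platforms)
    return true_positive_rate(tp, fn)


# Made-up counts chosen only to show that the two averages can differ.
platforms = [
    PlatformCounts("platform_a", true_positives=90, false_negatives=10),
    PlatformCounts("platform_b", true_positives=1, false_negatives=9),
]
macro = macro_average_tpr(platforms)
micro = micro_average_tpr(platforms)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

With these invented counts the macro average is 0.5 while the micro average is about 0.827, because the larger platform dominates the pooled figure. Stating which averaging was used makes a cross-platform figure such as 71% easier to interpret.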