Permission-Based Classification of Android Malware Applications Using Random Forest




Malware Classification, Android, Permissions, Machine Learning, Random Forest


Android is arguably the most widely used mobile operating system in the world. Due to its widespread use, it has attracted a lot of attention of cybercriminals who attempt to exploit its architecture and outsmart innocent users to install malware applications. The number of such applications is growing every day either by alternating a basic exploitation mechanism or by creating novel mechanisms to exfiltrate users’ data. As a result, there is an increasing need for detection mechanisms that can classify these applications to families based on their characteristics. A significant amount of research has already been devoted to analysing and mitigating this growing problem; however, this situation demands more efficient methods with higher precision. The paper proposes such a framework for analysing and classifying a malicious application to certain families relying on the permissions used. The proposed method involves the pre-processing of the applications to extract their permissions, the tokenization of permissions, the data cleansing and finally the application of the Random Forest Classifier to classify the applications in families. The proposed method is trained and tested with a dataset of 11,159 malicious applications categorized in 33 unique families. The precision, recall and f1-score achieved is 98%. The results of the proposed methodology are promising, since it even works in an unbalanced dataset and in many cases outperform other state-of-the-art approaches.