Hierarchical Density-based Clustering of Malware Behaviour

Johari Abdullah, Navein Chanderan


The number and diversity of malware variants have grown exponentially over the years, so there is a need to analyse large numbers of malware samples more efficiently. To address this problem, we propose a framework for the automatic analysis of a given malware sample's dynamic properties using a clustering technique. The framework also provides outlier discovery, abnormal behaviour analysis and discrimination between malware variants. We also created a module that normalises malware labels obtained from VirusTotal, which makes the labels consistent and so supports accurate analysis of malware families and types. An evaluation model for the proposed framework is also discussed. Ultimately, the proposed framework is intended to enable rapid analysis of malware samples and lead to better protection for various parties against malware.
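The abstract does not say how the label-normalisation module works. Purely as an illustration, the sketch below shows one common way to reduce the differing per-engine detection names to a single family name: split each label into tokens, discard generic type and platform words, and take a plurality vote across engines. It is not the authors' method. It assumes the per-engine labels have already been extracted from a VirusTotal report into a plain dictionary (the parsing step is not shown), and the stop-list, the `min_engines` threshold, the engine names and the function names are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical stop-list: tokens that describe a malware type or platform
# rather than a family, so they are ignored when voting for a family name.
GENERIC_TOKENS = {
    "trojan", "generic", "malware", "virus", "worm", "variant", "agent",
    "heur", "suspicious", "application", "riskware", "backdoor",
    "downloader", "adware", "msil", "dropper", "packed",
}


def tokenize(label: str) -> List[str]:
    """Split one engine's label (e.g. 'Trojan.Win32.Zbot.abc') into candidate family tokens."""
    tokens = re.split(r"[^a-z0-9]+", label.lower())
    kept = []
    for tok in tokens:
        # Drop short suffixes ('abc', 'gen'), tokens containing digits ('win32')
        # and generic type/platform words from the stop-list.
        if len(tok) < 4 or any(ch.isdigit() for ch in tok) or tok in GENERIC_TOKENS:
            continue
        kept.append(tok)
    # Remove duplicates within one label while keeping the original order.
    return list(dict.fromkeys(kept))


def normalise_family(av_labels: Dict[str, Optional[str]], min_engines: int = 2) -> Optional[str]:
    """Return the family token named by the most engines, or None if support is too weak."""
    votes: Counter = Counter()
    for label in av_labels.values():
        if label:  # engines that did not flag the sample have no label
            votes.update(tokenize(label))  # each token counts once per engine
    if not votes:
        return None
    # Highest vote count wins; ties are broken alphabetically for determinism.
    family, count = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return family if count >= min_engines else None


if __name__ == "__main__":
    # Hypothetical labels from five hypothetical engines for a single sample.
    labels = {
        "EngineA": "Trojan.Win32.Zbot.abc",
        "EngineB": "Win32/Zbot.gen!A",
        "EngineC": "Trojan-Spy.Win32.Zbot.xyz",
        "EngineD": "Generic.Malware.SFL",
        "EngineE": None,
    }
    print(normalise_family(labels))  # prints: zbot
```

Counting each token at most once per engine stops a single verbose label from outvoting the others, and the `min_engines` threshold leaves a sample unlabelled rather than guessing when only one engine names a family. A production module would also need alias resolution, because different vendors can use different names for the same family, and this sketch does not attempt that.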


Keywords: Anomaly Detection; Automated Dynamic Malware Analysis; Clustering; Malware Behaviour

This work is licensed under a Creative Commons Attribution 3.0 License.

ISSN: 2180-1843

eISSN: 2289-8131