Birds of a Feature: Intrafamily Clustering for Version Identification of Packed Malware

Leo Hyun Park, Jungbeen Yu, Hong Koo Kang, Taejin Lee, Taekyoung Kwon

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)


Malware lineage inference requires identifying the versions of collected malware, which is challenging because the clustering must be highly accurate. In this article, we tackle this problem and present a novel mechanism that uses behavioral features for version identification of (un)packed malware. Our basic idea is to focus on intrafamily clustering. We extract so-called family feature sets, i.e., hybrid features specific to each family. Our intuition is that family feature sets may achieve higher clustering accuracy than common feature sets, and that unpacked malware found in or relevant to such a cluster can enable lineage inference of family members using traditional inference methods. We conduct experiments with two datasets, 8928 malware samples from VXHeavens and 3293 samples obtained by manual analysis, a large portion of which is packed malware. The results demonstrate that we can accurately classify samples into malware families based on the hybrid features we choose. In addition, we can effectively extract family feature sets from 37 feature categories using forward stepwise selection. For intrafamily clustering, we employ the agglomerative clustering algorithm and observe that using family feature sets is significantly more accurate than using common feature sets, which facilitates more accurate lineage inference of packed malware.
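The two techniques named in the abstract, forward stepwise selection over feature categories and agglomerative clustering, can be illustrated with a minimal sketch. It is not the authors' implementation: the feature category names, the toy counts, the average-linkage criterion, and the use of cluster purity against known version labels as the selection score are all assumptions made for illustration.

```python
from itertools import combinations

def distance(a, b, cats):
    # Squared Euclidean distance restricted to the selected feature categories.
    return sum((a[c] - b[c]) ** 2 for c in cats)

def agglomerative(samples, cats, k):
    # Average-linkage agglomerative clustering (assumed linkage): start from
    # singleton clusters and merge the closest pair until k clusters remain.
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > k:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            total = sum(distance(samples[i], samples[j], cats)
                        for i in clusters[x] for j in clusters[y])
            d = total / (len(clusters[x]) * len(clusters[y]))
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters

def purity(clusters, labels):
    # Fraction of samples carrying the majority version label of their cluster.
    hits = 0
    for cluster in clusters:
        members = [labels[i] for i in cluster]
        hits += max(members.count(label) for label in set(members))
    return hits / len(labels)

def forward_stepwise(samples, labels, categories, k):
    # Greedily add the category that most improves the clustering score;
    # stop as soon as no remaining category gives a strict improvement.
    selected, best_score = [], 0.0
    remaining = list(categories)
    while remaining:
        scored = [(purity(agglomerative(samples, selected + [c], k), labels), c)
                  for c in remaining]
        score, cat = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(cat)
        remaining.remove(cat)
        best_score = score
    return selected, best_score

# Hypothetical toy family: four samples, two versions, three feature categories.
samples = [
    {"api_calls": 1, "registry": 9, "noise": 5},
    {"api_calls": 2, "registry": 1, "noise": 4},
    {"api_calls": 9, "registry": 8, "noise": 5},
    {"api_calls": 8, "registry": 2, "noise": 4},
]
labels = ["v1", "v1", "v2", "v2"]

selected, score = forward_stepwise(
    samples, labels, ["api_calls", "registry", "noise"], k=2)
print(selected, score)
```

In this toy data only the hypothetical `api_calls` category separates the two versions, so the selection stops after picking it. A per-family feature set in the sense of the abstract would be the `selected` list obtained by running the same procedure on each family's samples separately.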

Original language: English
Article number: 8951062
Pages (from-to): 4545-4556
Number of pages: 12
Journal: IEEE Systems Journal
Issue number: 3
Publication status: Published - 2020 Sept

Bibliographical note

Publisher Copyright:
© 2007-2012 IEEE.

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

