ISSN: 2582-9793

Search and Retrieval in Semantic-Structural Representations of Novel Malware

Original Research (Published On: 20-Mar-2024)
DOI: https://dx.doi.org/10.54364/AAIML.2024.41117

John Musgrave, Alina Campan, Temesguen Messay-Kebede and David Kapp

Adv. Artif. Intell. Mach. Learn., 4 (1):2052-2076

John Musgrave : University of Cincinnati

Alina Campan : Northern Kentucky University

Temesguen Messay-Kebede : Air Force Research Lab, Wright-Patterson Air Force Base

David Kapp : Air Force Research Lab, Wright-Patterson Air Force Base



Article History: Received on: 20-Jan-24, Accepted on: 23-Feb-24, Published on: 20-Mar-24

Corresponding Author: John Musgrave

Email: musgrajw@mail.uc.edu

Citation: John Musgrave, Alina Campan, Temesguen Messay-Kebede, David Kapp, Boyang Wang (2024). Search and Retrieval in Semantic-Structural Representations of Novel Malware. Adv. Artif. Intell. Mach. Learn., 4 (1):2052-2076


Abstract

    

In this study, we present a novel representation for binary programs that captures both semantic similarity and structural properties. The representation is composed bottom-up and enables new methods of analysis. We show that binary executable programs can be searched and retrieved based on the similarity of their behavioral properties, with an adjustable level of feature resolution. We begin by extracting data dependency graphs (DDGs), which are representative of both program structure and operational semantics. We then encode each program as a set of graph hashes representing isomorphic uniqueness, a method we call DDG Fingerprinting. Next, we use k-Nearest Neighbors to search a metric space constructed from examples. This approach allows a quantitative analysis of patterns of program operation. By evaluating similarity of behavior, we are able to recognize patterns in novel malware whose functionality has not previously been identified. We present experimental results from search based on program semantics and structural properties, using a dataset of binary executables with features extracted by our method of representation. We show that the associated metric space allows an adjustable level of resolution: feature resolution may be decreased for breadth of search and retrieval or, as the search space is reduced, increased for accuracy and fine-grained analysis of malware behavior.
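The pipeline described in the abstract (graphs, then a set of graph hashes per program, then nearest-neighbor search) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Weisfeiler-Lehman-style hash, the Jaccard distance, the use of the refinement depth `iterations` as a stand-in for "resolution", and all function names and toy graphs are hypothetical. The hash is equal for isomorphic graphs but is not guaranteed to differ for non-isomorphic ones.

```python
import hashlib
from collections import Counter


def wl_hash(edges, labels, iterations=2):
    """Weisfeiler-Lehman-style hash of a small labeled directed graph.

    edges:  list of (src, dst) pairs; every node must appear in `labels`.
    labels: dict mapping node id -> label (here, an instruction mnemonic).
    iterations: refinement depth; 0 ignores edges entirely (assumed to play
                the role of a coarse "resolution" setting).
    """
    successors = {node: [] for node in labels}
    for src, dst in edges:
        successors[src].append(dst)

    current = dict(labels)
    for _ in range(iterations):
        refined = {}
        for node in current:
            # Combine a node's label with the sorted labels of its successors.
            neighbor_labels = sorted(current[m] for m in successors[node])
            text = current[node] + "|" + ",".join(neighbor_labels)
            refined[node] = hashlib.sha256(text.encode()).hexdigest()[:16]
        current = refined

    # Hash the multiset of final labels so node ids do not matter.
    summary = sorted(Counter(current.values()).items())
    return hashlib.sha256(repr(summary).encode()).hexdigest()[:16]


def fingerprint(graphs, iterations=2):
    """Encode a program as the set of hashes of its graphs."""
    return frozenset(wl_hash(e, l, iterations) for e, l in graphs)


def jaccard_distance(a, b):
    """Jaccard distance between two hash sets (a metric on finite sets)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def knn(query, corpus, k=1):
    """Return the names of the k corpus entries closest to the query."""
    ranked = sorted(
        corpus.items(),
        key=lambda item: (jaccard_distance(query, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:k]]


# Toy graphs (hypothetical): each is (edges, labels).
chain = ([(0, 1), (1, 2)], {0: "mov", 1: "add", 2: "cmp"})
chain_relabeled = ([("a", "b"), ("b", "c")], {"a": "mov", "b": "add", "c": "cmp"})
fan = ([(0, 1), (0, 2)], {0: "mov", 1: "add", 2: "cmp"})
xor_pair = ([(0, 1)], {0: "xor", 1: "xor"})
call_pair = ([(0, 1)], {0: "push", 1: "call"})

corpus = {
    "sample_a": fingerprint([chain, xor_pair]),
    "sample_b": fingerprint([call_pair]),
}
query = fingerprint([chain_relabeled])
nearest = knn(query, corpus, k=1)
```

In this toy setup, `chain` and `chain_relabeled` hash identically because they differ only in node ids, so the query shares one hash with `sample_a` and none with `sample_b`. Setting `iterations=0` makes `chain` and `fan` indistinguishable (same labels, edges ignored), while `iterations=1` separates them, which is one plausible way to trade breadth of retrieval for finer discrimination.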
