HYDRA: A multimodal deep learning framework for malware classification

Jordi Planes, Carles Mateu, Daniel Gibert

Abstract

While traditional machine learning methods for malware detection largely depend on hand-designed features based on experts' domain knowledge, end-to-end learning approaches take the raw executable as input and try to learn a set of descriptive features from it. However, end-to-end approaches may perform poorly when little data is available or when the dataset is imbalanced. In this paper we present HYDRA, a novel framework for malware detection and classification that combines various types of features to discover the relationships between distinct modalities. Our approach learns from multiple sources to maximize the benefits of different feature types that reflect the characteristics of malware executables. We propose a baseline system that consists of both hand-engineered and end-to-end components, combining the benefits of feature engineering and deep learning so that malware characteristics are effectively represented. An extensive analysis of state-of-the-art methods on the Microsoft Malware Classification Challenge benchmark shows that the proposed solution achieves results comparable to gradient boosting methods in the literature and outperforms deep learning approaches.
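The abstract describes combining hand-engineered and end-to-end components in one classifier. The sketch below illustrates one generic way to do that, feature-level (early) fusion by concatenation; it is not HYDRA's actual architecture, which the abstract does not detail. The hand-engineered branch computes a byte 1-gram histogram and Shannon entropy (in the spirit of the "Bytes 1-G" and "ENT" features listed in the benchmarks), and the learned branch is a hypothetical byte-embedding lookup with mean pooling, with random weights standing in for trained ones. All names, sizes, weights, and the input sample are illustrative assumptions.

```python
import math
import random
from collections import Counter

NUM_CLASSES = 9  # malware families in the Microsoft Malware Classification Challenge
EMBED_DIM = 8    # hypothetical size of the learned byte embedding


def hand_engineered_features(raw):
    """Normalized byte 1-gram histogram (256 bins) plus Shannon entropy / 8."""
    counts = Counter(raw)  # iterating over bytes yields ints 0..255
    n = len(raw) or 1
    hist = [counts.get(b, 0) / n for b in range(256)]
    entropy = -sum(p * math.log2(p) for p in hist if p > 0)
    return hist + [entropy / 8.0]  # byte entropy is at most 8 bits


def learned_embedding(raw, table):
    """Stand-in for an end-to-end branch: embed each byte and mean-pool.
    `table` would be trained in a real model; here it is random."""
    pooled = [0.0] * EMBED_DIM
    for b in raw:
        for i, w in enumerate(table[b]):
            pooled[i] += w
    return [x / max(len(raw), 1) for x in pooled]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_predict(raw, table, head_w, head_b):
    """Early fusion: concatenate both modalities, then a linear softmax head."""
    fused = hand_engineered_features(raw) + learned_embedding(raw, table)
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(head_w, head_b)]
    return softmax(logits)


rng = random.Random(0)
table = [[rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in range(256)]
fused_dim = 257 + EMBED_DIM
head_w = [[rng.gauss(0.0, 0.1) for _ in range(fused_dim)] for _ in range(NUM_CLASSES)]
head_b = [0.0] * NUM_CLASSES

sample = b"MZ\x90\x00\x03\x00\x00\x00"  # hypothetical first bytes of a PE file
probs = fused_predict(sample, table, head_w, head_b)
print(len(probs), round(sum(probs), 6))  # 9 1.0
```

Concatenation is the simplest fusion choice: here the fused vector has 256 + 1 + 8 = 265 dimensions. In a trained model the embedding table and the softmax head would be fitted jointly, letting the classifier learn how much to rely on each modality.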

Benchmarks

Benchmark: malware-classification-on-microsoft-malware (Microsoft Malware Classification Challenge). All metrics come from 10-fold cross-validation; n/a marks a metric that is not reported.

Methodology | Accuracy (10-fold) | Macro F1 (10-fold)
Ahmadi et al. (2016): API feature vector + XGBoost | 0.9868 | 0.9638
Scaled bytes sequence + CNN & Bidirectional LSTM | 0.9814 | 0.9662
Zero Rule Classifier | 0.2707 | n/a
Random Guess Classifier | 0.1755 | n/a
Narayanan et al. (2016): PCA features + 1-NN | 0.9660 | 0.9102
Zhang et al. (2016): Total lines of each Section, Operation Code Count, API Usage, Special Symbols Count, Asm File Pixel Intensity Feature, Bytes File Block Size Distribution, Bytes File N-Gram + Ensemble Learning (XGBoost) | 0.9974 | 0.9938
Ahmadi et al. (2016): ENT, Bytes 1-G, STR, IMG1, IMG2, MD1, MISC, OPC, SEC, REG, DP, API, SYM, MD2 IMG and Opcode N-Grams + Ensemble Learning (XGBoost) | 0.9976 | 0.9931
HYDRA | 0.9975 | 0.9951
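The baselines and metrics in the benchmark table can be reproduced in miniature. The sketch below assumes plain (unstratified) 10-fold cross-validation and uses hypothetical toy labels rather than the challenge data. It computes accuracy and macro F1 (the unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)) for a Zero Rule classifier, which always predicts the training set's majority class, and for a random guesser that samples labels in proportion to training class frequencies. That sampling scheme is an assumption, since the page does not say how its random baseline was drawn. Under it, the expected accuracy is the sum of squared class proportions, whereas a uniform guess over K classes would score 1/K (1/9 ≈ 0.111 for the nine malware families of the Microsoft challenge).

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def zero_rule_fit(y_train):
    """Zero Rule: always predict the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda n: [majority] * n


def random_guess_fit(y_train, seed=0):
    """Assumed random baseline: sample labels with training class frequencies."""
    rng = random.Random(seed)
    counts = Counter(y_train)
    labels = list(counts)
    weights = [counts[c] for c in labels]
    return lambda n: rng.choices(labels, weights=weights, k=n)


def expected_random_guess_accuracy(y):
    """Expected accuracy of frequency-weighted guessing: sum of p_i squared."""
    n = len(y)
    return sum((c / n) ** 2 for c in Counter(y).values())


def k_fold_indices(n, k, seed=0):
    """Shuffle indices once, then deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(y, k, fit):
    """Mean accuracy and macro F1 over k folds. The baselines ignore the
    input features, so each fitted predictor only needs the test-set size."""
    accs, f1s = [], []
    for test_idx in k_fold_indices(len(y), k):
        test_set = set(test_idx)
        y_train = [y[i] for i in range(len(y)) if i not in test_set]
        y_test = [y[i] for i in test_idx]
        y_pred = fit(y_train)(len(y_test))
        accs.append(accuracy(y_test, y_pred))
        f1s.append(macro_f1(y_test, y_pred))
    return sum(accs) / k, sum(f1s) / k


# Hypothetical imbalanced toy labels, not the challenge data.
y = [0] * 50 + [1] * 30 + [2] * 20
zr_acc, zr_f1 = cross_validate(y, 10, zero_rule_fit)
rg_acc, rg_f1 = cross_validate(y, 10, random_guess_fit)
print(round(zr_acc, 4))  # 0.5, the majority-class share
print(round(expected_random_guess_accuracy(y), 4))  # 0.38
```

Because macro F1 weights every class equally, a majority-class predictor scores much lower on it than on accuracy when the classes are imbalanced, which makes it a useful complement to accuracy on data like this.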
