Abstract
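While traditional machine learning approaches to malware detection rely mainly on hand-designed features grounded in domain experts' knowledge, end-to-end learning approaches take the raw executable as input and try to learn a set of descriptive features from it automatically. The latter, however, may perform poorly when data are limited or the dataset is imbalanced. This paper presents HYDRA, a novel framework that addresses malware detection and classification by combining multiple types of features in order to uncover the relationships between different modalities. The approach learns from multiple data sources to make the most of the strengths of each feature type and thereby reflect the intrinsic characteristics of malware executables. We design a baseline system that combines hand-engineered features with end-to-end learning components, bringing together the benefits of feature engineering and deep learning to represent malware characteristics effectively. An extensive experimental analysis on the Microsoft Malware Classification Challenge benchmark shows that the proposed solution performs comparably to the leading gradient boosting methods in the literature while significantly outperforming existing deep learning approaches.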
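To make the idea of combining feature types concrete, the following is a minimal sketch of feature-level fusion: each modality is encoded by its own small branch, the resulting embeddings are concatenated, and a shared head produces class probabilities. It is not HYDRA's actual architecture. The modality names, the dimensions, and the random (untrained) weights are all illustrative assumptions; the nine output classes correspond to the nine malware families in the challenge dataset.

```python
import math
import random


def affine(x, weights, bias):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_in, n_out):
    """Hypothetical untrained layer: random weights, zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(modalities, encoders, head):
    """Encode each modality separately, concatenate, then classify."""
    fused = []
    for name, features in modalities.items():
        weights, bias = encoders[name]
        fused.extend(relu(affine(features, weights, bias)))
    weights, bias = head
    return softmax(affine(fused, weights, bias))


rng = random.Random(0)
NUM_CLASSES = 9      # nine malware families in the challenge dataset
EMBEDDING_SIZE = 4   # assumed size of each branch's embedding

# Hypothetical feature vectors for one executable (made-up dimensions).
sample = {
    "api_calls": [rng.random() for _ in range(6)],
    "opcodes": [rng.random() for _ in range(8)],
    "bytes": [rng.random() for _ in range(10)],
}
encoders = {name: random_layer(rng, len(x), EMBEDDING_SIZE) for name, x in sample.items()}
head = random_layer(rng, EMBEDDING_SIZE * len(sample), NUM_CLASSES)

probs = fuse_and_classify(sample, encoders, head)
print(len(probs), round(sum(probs), 6))
```

In a real system each branch would be trained, and an end-to-end branch would consume raw bytes or opcode sequences rather than a fixed-length vector; the sketch only shows where the concatenation happens.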
Benchmarks
| Benchmark | Method | Accuracy (10-fold) | Macro F1 (10-fold) |
|---|---|---|---|
| malware-classification-on-microsoft-malware | Ahmadi et al. (2016): API feature vector + XGBoost | 0.9868 | 0.9638 |
| malware-classification-on-microsoft-malware | Scaled bytes sequence + CNN & Bidirectional LSTM | 0.9814 | 0.9662 |
| malware-classification-on-microsoft-malware | Zero Rule Classifier | 0.2707 | - |
| malware-classification-on-microsoft-malware | Random Guess Classifier | 0.1755 | - |
| malware-classification-on-microsoft-malware | Narayanan et al. (2016): PCA features + 1-NN | 0.9660 | 0.9102 |
| malware-classification-on-microsoft-malware | Zhang et al. (2016): Total lines of each Section, Operation Code Count, API Usage, Special Symbols Count, Asm File Pixel Intensity Feature, Bytes File Block Size Distribution, Bytes File N-Gram + Ensemble Learning (XGBoost) | 0.9974 | 0.9938 |
| malware-classification-on-microsoft-malware | Ahmadi et al. (2016): ENT, Bytes 1-G, STR, IMG1, IMG2, MD1, MISC, OPC, SEC, REG, DP, API, SYM, MD2 IMG and Opcode N-Grams + Ensemble Learning (XGBoost) | 0.9976 | 0.9931 |
| malware-classification-on-microsoft-malware | HYDRA | 0.9975 | 0.9951 |
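The table reports two metrics and includes a Zero Rule baseline. As a reading aid, the sketch below shows how accuracy and macro F1 can diverge on an imbalanced dataset when a classifier always predicts the majority class. The labels are a made-up toy set, not the challenge data, and the helper names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def zero_rule(y_train):
    """Zero Rule baseline: always predict the most frequent training label."""
    return Counter(y_train).most_common(1)[0][0]


# Toy imbalanced label set (illustrative only).
y_true = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
majority = zero_rule(y_true)
y_pred = [majority] * len(y_true)

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(majority, acc, f1)  # A 0.6 0.25
```

On this toy set the majority-class predictor reaches 0.6 accuracy but only 0.25 macro F1, because the two minority classes each contribute an F1 of zero. This is why macro F1 is a useful companion to accuracy when class sizes differ widely.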