Malware Detection using ML

Cybersecurity is a major concern for industries today, and the number of threats keeps growing. Enterprises see real potential in AI/ML-based solutions for addressing cyber threats more efficiently. Machine learning and deep learning solutions need large, labelled datasets in order to flag malware. Although advanced deep learning solutions are now used more often than traditional rule-based or machine learning approaches, we start with a machine learning approach to detect malware samples. We’ll try the same problem again with deep learning later.

Problem statement:
Predict whether a sample is malware, given a dataset containing both legitimate and malware samples.
Data Set:
The dataset contains both legitimate and malware samples (.exe/.dll).
File Parse:
Once read, the dataset comprises 138,047 rows and 57 features:

Column Names:
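The file-parse step and the column listing can be sketched with pandas as follows. This is only an illustration: the real file name and separator are not given here, so the snippet reads a tiny in-memory stand-in, and every column name except “legitimate” is a hypothetical placeholder.

```python
import io

import pandas as pd

# Tiny stand-in for the real file (138,047 rows in the actual dataset).
# Column names other than "legitimate" are hypothetical placeholders.
csv_text = """Name,SizeOfCode,SectionsMeanEntropy,legitimate
a.exe,4096,6.1,1
b.dll,8192,7.4,0
c.exe,2048,5.2,1
d.exe,16384,7.9,0
"""


def load_samples(source):
    """Read the samples into a DataFrame (accepts a path or a file-like object)."""
    return pd.read_csv(source)


df = load_samples(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names
```

With the real file, passing its path to `load_samples` (plus a `sep=` argument if the file is not comma-separated) should report the row and feature counts quoted above.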

The target variable is “legitimate”; let’s look at its distribution.
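One way to inspect that distribution is `value_counts`. The sketch below uses a handful of made-up labels rather than the real column, and it assumes that 1 marks a legitimate file and 0 a malware sample, which is not confirmed here.

```python
import pandas as pd

# Made-up labels standing in for the real "legitimate" column.
# Assumption: 1 = legitimate file, 0 = malware sample.
labels = pd.Series([1, 0, 0, 1, 0, 0], name="legitimate")

counts = labels.value_counts()                # absolute count per class
shares = labels.value_counts(normalize=True)  # fraction per class

print(counts)
print(shares.round(2))
```

If the classes turn out to be imbalanced, plain accuracy can be misleading, so it is worth checking this before picking an evaluation metric.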

Data preparation is performed using scikit-learn.
Feature selection is not performed, since we use all features here, but SelectKBest could be applied.
Classifiers used: Decision Tree and Random Forest.
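The preparation step, plus the optional SelectKBest route mentioned above, might look roughly like this. It is a sketch on synthetic data from `make_classification`, not the real malware features. The 80/20 split, `k=10`, and the `f_classif` scoring function are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and the "legitimate" labels.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Hold out 20% for testing; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Optional: keep only the 10 highest-scoring features.
# Fit on the training split only, then apply the same selection to the test split.
selector = SelectKBest(score_func=f_classif, k=10)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

print(X_train.shape, X_test.shape)
print(X_train_sel.shape, X_test_sel.shape)
```

Fitting the selector on the training split alone avoids leaking information from the test samples into the feature choice.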


Random Forest Classification:
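A minimal sketch of this step with scikit-learn's `RandomForestClassifier` is shown below. It runs on synthetic data, so the numbers it prints say nothing about the malware dataset. The number of trees and the split ratio are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the malware features and labels.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# n_estimators=50 is an arbitrary illustrative choice.
forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(X_train, y_train)

pred = forest.predict(X_test)
acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)  # rows: true class, columns: predicted class

print(f"accuracy: {acc:.3f}")
print(cm)
```

The confusion matrix is worth printing alongside accuracy, since for malware detection a missed malware sample and a false alarm usually carry different costs.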

Decision Tree Classification:
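The decision tree variant can be sketched the same way with `DecisionTreeClassifier`. Again the data is synthetic, and `max_depth=5` is a hypothetical setting, not the configuration used for the results in this post.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the malware features and labels.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, random_state=7
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7
)

# max_depth limits tree growth to reduce overfitting (illustrative value).
tree = DecisionTreeClassifier(max_depth=5, random_state=7)
tree.fit(X_train, y_train)

tree_acc = accuracy_score(y_test, tree.predict(X_test))
importances = tree.feature_importances_  # one weight per input feature

print(f"accuracy: {tree_acc:.3f}")
print("most important feature index:", importances.argmax())
```

The `feature_importances_` array gives a quick view of which features the tree relied on, which can also guide a later feature selection pass.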


Both Random Forest and Decision Tree work well here, although both models could be tuned further. We’ll see how deep learning performs on the same problem in our next assignment.
