Cyber security is a major concern for industries today, and the threat keeps growing. Enterprises see AI/ML-based solutions as having real potential to address cyber threats more efficiently. Machine learning and deep learning solutions need large, labelled datasets in order to flag malware. Although advanced deep learning solutions are now used more often than traditional rule-based or machine learning approaches, we start with a machine learning approach to detect malware samples. We’ll tackle the same problem with deep learning later.
The dataset contains both legitimate and malware samples (.exe/.dll files).
The target variable is “legitimate”. Let’s look at its distribution.
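Checking the class balance matters because a heavily skewed target can make plain accuracy misleading. Here is a minimal, stdlib-only sketch of counting the classes. It assumes the “legitimate” column is encoded as 1 = legitimate and 0 = malware, and the toy label list is made up for illustration.

```python
from collections import Counter

def class_distribution(labels):
    """Return {label: (count, fraction)} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}

# Hypothetical toy labels; assumed encoding: 1 = legitimate, 0 = malware.
labels = [1, 0, 0, 1, 0, 0, 0, 1]
for label, (n, frac) in class_distribution(labels).items():
    name = "legitimate" if label == 1 else "malware"
    print(f"{name}: {n} samples ({frac:.1%})")
```

If the data is loaded into a pandas DataFrame `df`, `df["legitimate"].value_counts(normalize=True)` gives the same fractions in one call.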
Data preparation is performed using scikit-learn.
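A typical preparation step separates the features from the target, drops identifier columns that carry no signal, and holds out a test set. The sketch below is one way to do this with pandas and scikit-learn’s `train_test_split`. The DataFrame is a synthetic stand-in: apart from “legitimate”, the column names (`Name`, `SizeOfCode`, `NumberOfSections`) are hypothetical, and the 80/20 split is an illustrative choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (in practice df would come from
# pd.read_csv on the dataset file). Column names other than "legitimate"
# are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Name": [f"sample_{i}.exe" for i in range(n)],       # hypothetical identifier column
    "SizeOfCode": rng.integers(1_000, 500_000, size=n),  # hypothetical PE feature
    "NumberOfSections": rng.integers(1, 10, size=n),     # hypothetical PE feature
    "legitimate": rng.integers(0, 2, size=n),            # target: 1 = legitimate, 0 = malware (assumed)
})

# Drop the identifier and the target from the feature matrix.
X = df.drop(columns=["Name", "legitimate"])
y = df["legitimate"]

# stratify=y keeps the class ratio roughly the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (160, 2) (40, 2)
```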
Feature selection is not performed here, since we use all the features, but scikit-learn’s SelectKBest could be used to keep only the most informative ones.
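For readers who want to try it, here is a minimal sketch of `SelectKBest` with the ANOVA F-test (`f_classif`) as the scoring function. The data comes from `make_classification` rather than the malware dataset, and `k=5` is an arbitrary illustrative value, not a tuned one.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data: 20 features, of which 5 are informative.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# Score each feature against the target and keep the 5 best.
selector = SelectKBest(score_func=f_classif, k=5)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)                     # (300, 5)
print(selector.get_support(indices=True))  # column indices that were kept
```

On real data the selector should be fitted on the training split only and then applied to the test split with `transform`, so that the test set does not influence which features are chosen.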
Classifiers used: Decision Tree and Random Forest.
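A simple way to compare the two classifiers is to train both on the same split and score them on the same held-out data. This is a sketch on synthetic `make_classification` data, not the malware dataset, and the hyperparameters (`n_estimators=100`, `random_state=42`) are illustrative defaults rather than tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix and labels.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # .score returns mean accuracy on the held-out test set.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

A random forest averages many decorrelated trees, so it usually overfits less than a single fully grown decision tree, which is why the two are worth comparing side by side.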
Decision Tree Classification:
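For malware detection, accuracy alone hides which kind of mistake the model makes, so a confusion matrix and per-class report are useful. Below is a minimal sketch with `DecisionTreeClassifier`, again on synthetic data. It assumes the encoding 0 = malware, 1 = legitimate, and `max_depth=10` is a hypothetical cap chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; assumed encoding: 0 = malware, 1 = legitimate.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = DecisionTreeClassifier(max_depth=10, random_state=42)  # illustrative depth cap
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes (order: 0, 1).
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
# With "legitimate" as the positive class, fp counts malware samples that
# were predicted as legitimate, i.e. missed malware.
print(cm)
print(f"Missed malware (fp): {fp}")
print(classification_report(y_test, y_pred,
                            target_names=["malware", "legitimate"]))
```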
For the detailed code, visit my GitHub link: