1 of 15

Hello

Folks!!

HTL - Hack The League

2 of 15

DEBUGGERS

TEAM

3 of 15

Problem statement (Open innovation):

Malware is intrusive software designed to damage or destroy computers and computer systems. It may destroy important data or block our access to it. Anti-malware software was created to counter this: it is a computer program used to prevent, detect, and remove malware. Antivirus software was originally developed to detect and remove computer viruses; today's anti-malware tools help detect, and thereby prevent, attacks on systems. The security and integrity of a personal computer or an organization's systems are as important as their performance, so blocking attacks by malware files, viruses, and similar threats is a main concern. The aim of this problem statement is to provide an ML-based approach that increases the security of a system against such attacks by detecting malicious software before any damage is caused.

4 of 15

Our Approach – We have used supervised learning techniques to tackle this problem.

   Data Preparation -

       After renaming the provided files by their class-name, we uncompressed all of them.

        Next, we cleaned the data with cleaner.py.

        The features were then extracted from the files and saved in a CSV file.

        We added a last column to this CSV, named type, which contains the class-name of the file. This became our target variable.

        Then we cleaned the dataset, for example by filling in missing values.
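
The last two data preparation steps can be sketched as follows. This is a minimal illustration, not the team's actual code: the helper names, the feature names, the placeholder class-name, and the choice of the column median as the fill value are all assumptions, since the slides only say that missing values were filled in.

```python
from statistics import median


def add_type_column(rows, class_name):
    """Return copies of the feature rows with the target column 'type' appended last."""
    return [{**row, "type": class_name} for row in rows]


def fill_missing(rows, columns):
    """Convert the given columns to float, replacing missing values with the column median."""
    filled = [dict(row) for row in rows]
    for col in columns:
        present = [float(r[col]) for r in filled if r.get(col) not in (None, "")]
        default = median(present) if present else 0.0
        for r in filled:
            if r.get(col) in (None, ""):
                r[col] = default  # assumed fill strategy: column median
            else:
                r[col] = float(r[col])
    return filled


# Hypothetical feature rows as they might be read from the CSV file.
rows = [
    {"num_sections": "10", "entry_point": "4096"},
    {"num_sections": "", "entry_point": "8192"},
    {"num_sections": "30", "entry_point": None},
]
dataset = fill_missing(
    add_type_column(rows, "example_class"),
    ["num_sections", "entry_point"],
)
```

The median is used here because it is less sensitive to outliers than the mean; any other fill strategy would slot into the same place.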

   Model Training -

       We used a Random Forest classifier to train our model.

        We split the dataset into training and test data.

        After the model was trained, we saved it.
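
A minimal sketch of these training steps with scikit-learn is given below. It is only an illustration under stated assumptions: the data is synthetic (generated with `make_classification`), and the 80/20 split, the hyperparameters, the use of `pickle`, and the file name `model.pkl` are not taken from the slides.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature table (X) and the 'type' column (y).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Hold out 20% of the rows for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Save the trained model to disk and load it back (hypothetical file name).
with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "model.pkl")
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    with open(model_path, "rb") as fh:
        restored = pickle.load(fh)
```

Fixing `random_state` makes the split and the forest reproducible, which helps when comparing runs.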

   Model Testing -

        We also tested the model on the test data.

        The accuracy and F1 score of the model are calculated and can be viewed by un-commenting the training function.

        The accuracy and F1 score are printed in the console.
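
For reference, the two metrics can be computed by hand as sketched below. The slides do not say which F1 averaging was used, so macro averaging (the unweighted mean of the per-class F1 scores) is an assumption here, and the labels are placeholders rather than real results.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, where F1 = 2*TP / (2*TP + FP + FN)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Placeholder labels for illustration only.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]

acc = accuracy(y_true, y_pred)   # 4 of 5 correct
f1 = macro_f1(y_true, y_pred)    # mean of 2/3, 4/5 and 1
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```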

   Generating the result -

       The trained model is loaded, and we use the data from perfect.csv to predict the class-name of the files.

        File names with their respective predicted class-names are saved in result.csv.
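
The result generation step could look roughly like the sketch below. It is not the team's implementation: the function name `predict_and_write`, the column names, and the stub model are hypothetical, and the feature rows are passed in directly instead of being read from perfect.csv.

```python
import csv
import os
import tempfile


def predict_and_write(model, file_names, feature_rows, out_path):
    """Predict one class-name per feature row and write (file name, class) pairs as CSV."""
    predictions = model.predict(feature_rows)
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file_name", "predicted_class"])  # assumed column names
        for name, label in zip(file_names, predictions):
            writer.writerow([name, label])
    return out_path


class StubModel:
    """Stand-in for a trained classifier; only the predict() method is required."""

    def predict(self, rows):
        return ["class_a" if row[0] < 5 else "class_b" for row in rows]


with tempfile.TemporaryDirectory() as workdir:
    out_path = os.path.join(workdir, "result.csv")
    predict_and_write(StubModel(), ["sample_1", "sample_2"], [[1.0], [9.0]], out_path)
    with open(out_path, newline="") as fh:
        saved = list(csv.reader(fh))
```

Any object with a `predict` method, such as a loaded scikit-learn classifier, can be passed in place of the stub.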

5 of 15

CODE SNIPPETS

This cleans the ELF files, which are our sample data

The function get_elf_info(elf) stores the features in a dictionary
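
As an illustration of how ELF features can end up in a dictionary, the sketch below reads the fixed-size ELF file header with the standard `struct` module. It is not the team's get_elf_info: the function name `read_elf_header` is hypothetical, only header fields are extracted, and the sample bytes are built by hand purely for the demonstration.

```python
import struct

FIELD_NAMES = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
    "e_shnum", "e_shstrndx",
)


def read_elf_header(data):
    """Return the ELF file header fields of `data` (bytes) as a dictionary."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"   # 1 = little-endian, 2 = big-endian
    word = "I" if ei_class == 1 else "Q"    # 32-bit or 64-bit addresses/offsets
    fmt = f"{endian}HHI{word}{word}{word}IHHHHHH"
    if len(data) < 16 + struct.calcsize(fmt):
        raise ValueError("truncated ELF header")
    info = dict(zip(FIELD_NAMES, struct.unpack_from(fmt, data, 16)))
    info["class_bits"] = 32 if ei_class == 1 else 64
    return info


# Hand-built 64-bit little-endian header, for demonstration only.
sample = (
    b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    + struct.pack("<HHIQQQIHHHHHH", 2, 62, 1, 0x401000, 64, 0, 0, 64, 56, 3, 64, 0, 0)
)
info = read_elf_header(sample)
```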

6 of 15

The function dict_to_csv(d,name) saves the features stored in the dictionary as a CSV file

This is used for data and target selection; the results are stored in the variables X and Y respectively
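
Both steps on this slide can be sketched with the standard `csv` module. The function names, the feature names, and the class labels below are hypothetical; the team's dict_to_csv may work differently. An in-memory buffer stands in for the CSV file.

```python
import csv
import io


def features_to_csv(rows, fh):
    """Write a list of feature dictionaries to an open text file as CSV."""
    writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


def split_features_target(fh, target="type"):
    """Read the CSV back and separate the feature values (X) from the target column (Y)."""
    X, Y = [], []
    for row in csv.DictReader(fh):
        Y.append(row.pop(target))
        X.append([float(value) for value in row.values()])
    return X, Y


# Hypothetical feature rows with the class-name in the last column.
rows = [
    {"e_type": 2, "e_shnum": 29, "type": "class_a"},
    {"e_type": 3, "e_shnum": 31, "type": "class_b"},
]

buffer = io.StringIO()
features_to_csv(rows, buffer)
buffer.seek(0)
X, Y = split_features_target(buffer)
```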

7 of 15

Since not all the raw data files are in the form we need (some were corrupted or blank), we need to clean the data
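
One simple way to separate usable files from blank or corrupted ones is to check for the 4-byte ELF magic number at the start of each file. The sketch below assumes that check; the slides do not show what cleaner.py actually does, and the function and file names are hypothetical.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"


def is_usable_elf(path):
    """True if the file can be read and starts with the ELF magic number."""
    try:
        with open(path, "rb") as fh:
            return fh.read(4) == ELF_MAGIC
    except OSError:
        return False


def partition_samples(directory):
    """Split the regular files in `directory` into (kept, rejected) name lists."""
    kept, rejected = [], []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        (kept if is_usable_elf(path) else rejected).append(name)
    return kept, rejected


# Demonstration on three throwaway files: one valid-looking, one blank, one junk.
with tempfile.TemporaryDirectory() as workdir:
    samples = {"good.bin": ELF_MAGIC + bytes(60), "blank.bin": b"", "junk.bin": b"not an elf"}
    for name, payload in samples.items():
        with open(os.path.join(workdir, name), "wb") as fh:
            fh.write(payload)
    kept, rejected = partition_samples(workdir)
```

A magic-number check only catches blank files and files of the wrong type; damage deeper in the file would need a full parse to detect.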

8 of 15

Raw Data

Cleaned data

9 of 15

Here we visualize the data count as a graph

Output generated by above code

10 of 15

Since not all features are important, we extracted the top features

Output generated by the code
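
The slides do not show how the top features were chosen. One common approach with a random forest is to rank features by importance score, for example the values in scikit-learn's `feature_importances_` attribute of a fitted classifier. The sketch below assumes such scores are already available; the feature names and numbers are made up for illustration and are not the team's results.

```python
def top_features(names, importances, k):
    """Return the k (name, importance) pairs with the highest importance."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative values only, not measured importances.
names = ["e_type", "e_machine", "e_shnum", "e_phnum"]
importances = [0.10, 0.05, 0.60, 0.25]

best = top_features(names, importances, 2)
selected_names = [name for name, _ in best]
```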

11 of 15

Splitting the dataset into training and testing data

Training and testing the model

Model performance statistics

Saving the trained Model

12 of 15

Predict_and_save()

Predicts the result using the trained model

Driver code: stores the info of every file in the directory

13 of 15

Conclusion:

Our model successfully processes the malware given as a dataset, so we can classify different types of malware and take further steps to prevent them.

14 of 15

TEAM NAME: DEBUGGERS

MEMBERS' NAMES:

    • AVIRAL SRIVASTAVA
    • KARTIK MEHTA
    • PRERNA CHOUDHARY

EDUCATIONAL INSTITUTION: KIET GROUP OF INSTITUTIONS, GHAZIABAD

15 of 15

THANK

YOU