Author: Shalabi, Eman Selem Mahmoud Ahmad. Title: LEARNING-BASED MALICIOUS BEHAVIOR DETECTION FOR WIRELESS NETWORK APPLICATIONS

Title

LEARNING-BASED MALICIOUS BEHAVIOR DETECTION FOR WIRELESS NETWORK APPLICATIONS

Author

Shalabi, Eman Selem Mahmoud Ahmad.

Contributors

Researcher / Eman Selem Mahmoud Ahmad Shalabi

Supervisor / Walid Ibrahim Khedr

Supervisor / Ahmad Salah

Subject

Information Technology.

Publication date

2020.

Number of pages

122 p.

Language

English

Degree

Master's

Specialization

Computer Science Applications

Approval date

1/1/2020

Awarding institution

Zagazig University - Faculty of Computers and Informatics - Information Technology

Abstract

Smartphones and tablets have become a significant part of daily life. They are widely used for many purposes, e.g., web browsing, online banking, entertainment, online learning, social networking, and advertising, and the number of their users keeps increasing. This growing user base attracts attackers to develop malware, leaving mobile devices increasingly exposed to it. Mobile malware can be any code that is added to, changed in, or removed from an application in order to harm or disrupt the intended function of the system. There are many types of malware, such as adware, bots, bugs, spyware, viruses, Trojans, worms, rootkits, and ransomware.
Most mobile malware is designed to disable or harm a mobile device, to allow a malicious user to control the device remotely, or to steal personal information stored on it. Mobile malware has threatened smartphones for many years. Android is one of the most popular mobile platforms today, and its popularity keeps growing, so most discovered mobile malware targets the Android platform. Unlike other platforms, Android allows applications to be installed from third-party markets and unverified sources, which makes it easier for attackers to bundle and distribute malicious applications. Because of the rapid growth in Android smartphone use and the openness of the platform, Android smartphones are increasingly targeted by attackers and infected with malicious software. Thus, there is a need to stop the proliferation of malware on Android markets and smartphones.
Machine learning plays an important role in malware detection, and researchers have used it to improve detection accuracy. This study focuses on static analysis for detecting Android malware. Static analysis approaches include feature-based, graph-based, and structure-based approaches; this research focuses on the feature-based approach. Static analysis detects malware using static features extracted from the Android APK file, without executing the application. These features include hardware features, permissions, application components (activities, content providers, services, and broadcast receivers), intent filters, API calls, and network addresses. Among the contents of an APK, two components are essential for static analysis and Android malware detection: AndroidManifest.xml and classes.dex. AndroidManifest.xml describes the permissions, package name, version, referenced libraries, and application components (activities, services, content providers, and broadcast receivers), while classes.dex contains all of the application's classes compiled into the Dalvik-compatible DEX file format.
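As a rough illustration of how static features can be read from AndroidManifest.xml, the Python sketch below walks a manifest and emits one feature string per declared permission, hardware feature, component, and intent-filter entry. It assumes the manifest has already been decoded from Android's binary XML into plain text (for example with a tool such as apktool). The `prefix::name` strings imitate the Drebin naming style, but the tag-to-prefix mapping, `manifest_features`, and the sample manifest are illustrative assumptions, not the thesis's extractor.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical mapping from manifest tags to Drebin-style feature prefixes.
TAG_PREFIXES = {
    "uses-feature": "feature",
    "uses-permission": "permission",
    "activity": "activity",
    "service": "service_receiver",
    "receiver": "service_receiver",
    "provider": "provider",
    "action": "intent",    # <action> entries inside <intent-filter>
    "category": "intent",  # <category> entries inside <intent-filter>
}


def manifest_features(manifest_xml: str) -> set:
    """Return 'prefix::name' strings for everything the manifest declares.

    Assumption: manifest_xml is a text (already decoded) AndroidManifest.xml.
    """
    root = ET.fromstring(manifest_xml)
    features = set()
    for element in root.iter():  # root.iter() visits every element in the tree
        prefix = TAG_PREFIXES.get(element.tag)
        name = element.get(ANDROID_NS + "name")
        if prefix and name:
            features.add(f"{prefix}::{name}")
    return features


# Hypothetical sample manifest for demonstration only.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-feature android:name="android.hardware.telephony"/>
  <application>
    <activity android:name=".MainActivity"/>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

for feature in sorted(manifest_features(SAMPLE_MANIFEST)):
    print(feature)
```

API calls, by contrast, come from classes.dex and would need a DEX disassembler; this sketch covers only the manifest side.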
In this research, we propose a lightweight machine learning-based model for detecting Android malware that reduces the number of Drebin dataset features. The proposed model selects the Drebin features that most affect Android malware detection accuracy. The ultimate goal of this research is to find the symmetric features shared across malicious Android applications so that such applications can be detected easily. Many state-of-the-art methods focus on extracting asymmetric patterns from a single category of features, e.g., application permissions, to distinguish malicious applications from benign ones. In this work, we propose a compromise: we consider different types of static features and select the most important features that affect the detection process. These selected features represent the symmetric pattern used for the classification task.
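The selection criterion is not specified at this point, so the following sketch uses a deliberately simple stand-in to show what "keeping the most important features" can look like: each binary feature is scored by the absolute difference between the fraction of malware apps and the fraction of benign apps that contain it, and the top k are kept. `select_features`, the scoring rule, and the toy apps are hypothetical; the thesis's actual ranking method may differ.

```python
from collections import Counter


def select_features(malware_apps, benign_apps, k):
    """Return the k features whose prevalence differs most between the two classes.

    Each app is a set of feature strings. The scoring rule is an illustrative
    stand-in, not the criterion used in the thesis.
    """
    # Number of apps in each class that contain each feature.
    mal_counts = Counter(f for app in malware_apps for f in app)
    ben_counts = Counter(f for app in benign_apps for f in app)

    def score(feature):
        mal_rate = mal_counts[feature] / len(malware_apps)
        ben_rate = ben_counts[feature] / len(benign_apps)
        return abs(mal_rate - ben_rate)

    vocabulary = set(mal_counts) | set(ben_counts)
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(vocabulary, key=lambda f: (-score(f), f))
    return ranked[:k]


# Synthetic example data (not taken from the Drebin dataset).
MALWARE = [
    {"permission::android.permission.SEND_SMS",
     "intent::android.intent.action.BOOT_COMPLETED",
     "activity::.MainActivity"},
    {"permission::android.permission.SEND_SMS",
     "permission::android.permission.INTERNET",
     "activity::.MainActivity"},
]
BENIGN = [
    {"permission::android.permission.INTERNET", "activity::.MainActivity"},
    {"permission::android.permission.INTERNET", "activity::.MainActivity"},
]

print(select_features(MALWARE, BENIGN, 2))
```

In this toy data the activity feature appears in every app of both classes, so it scores 0 and ranks last, which matches the reasoning the abstract gives for ignoring activity features.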
To construct the feature vector, we extract six categories of Android application features from the Android manifest file and the source code: Android permissions, used permissions, application components, intent filters, restricted APIs, and suspicious APIs. We ignore features belonging to the activity category because many malware applications are repackaged benign applications; as a result, activity features appear almost equally often in malware and benign applications. Moreover, we propose a new method that merges an application's URLs into a single feature called URL_score, which reflects the degree of maliciousness of all the URLs found in that application. Finally, we add the URL_score feature to the feature vector.
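The URL_score formula is not given here. The sketch below assumes one plausible construction, purely for illustration: each domain is weighted by the fraction of training apps containing it that are malware, an app's URL_score is the mean weight of its domains, and the final vector consists of binary indicators for the selected non-activity features followed by URL_score. The functions `learn_domain_weights`, `url_score`, and `build_vector`, and the weighting rule itself, are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domains(urls):
    """Host names found in a list of URLs; entries without a host are skipped."""
    return {urlparse(u).hostname for u in urls} - {None}


def learn_domain_weights(malware_url_lists, benign_url_lists):
    """Hypothetical rule: weight = share of training apps with this domain that are malware."""
    mal = Counter(d for urls in malware_url_lists for d in domains(urls))
    ben = Counter(d for urls in benign_url_lists for d in domains(urls))
    return {d: mal[d] / (mal[d] + ben[d]) for d in set(mal) | set(ben)}


def url_score(app_urls, domain_weights):
    """Mean weight of the app's domains; unseen domains count as 0, no URLs gives 0."""
    hosts = domains(app_urls)
    if not hosts:
        return 0.0
    return sum(domain_weights.get(h, 0.0) for h in hosts) / len(hosts)


def build_vector(app_features, app_urls, selected_features, domain_weights):
    """Binary indicators for the selected non-activity features, then URL_score."""
    kept = [f for f in selected_features if not f.startswith("activity::")]
    vector = [1.0 if f in app_features else 0.0 for f in kept]
    vector.append(url_score(app_urls, domain_weights))
    return vector


# Synthetic training URLs: one malware app and one benign app.
weights = learn_domain_weights(
    [["http://evil.example/a", "http://cdn.example/lib.js"]],
    [["http://cdn.example/lib.js"]],
)
selected = ["permission::android.permission.SEND_SMS",
            "activity::.MainActivity",
            "permission::android.permission.INTERNET"]
vector = build_vector({"permission::android.permission.SEND_SMS"},
                      ["http://evil.example/b", "http://cdn.example/x"],
                      selected, weights)
print(vector)
```

Collapsing all URLs into one number keeps the vector short no matter how many distinct URLs appear across the dataset, which is consistent with the goal of a lightweight model.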
Five different machine learning classifiers were evaluated: Linear Support Vector Machine (SVM), Logistic Regression, AdaBoost, Stochastic Gradient Descent (SGD), and Linear Discriminant Analysis (LDA). The proposed method significantly reduced the size of the Drebin dataset feature vector and the memory size of the final model. In addition, the proposed model achieved the highest accuracy reported on the Drebin dataset to date. Linear SVM produced the highest accuracy, 99.03%, followed by Logistic Regression (98.93%), SGD (98.90%), LDA (98.83%), and AdaBoost (97.98%).
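To show how a linear classifier scores such a feature vector and how accuracy is measured, the sketch below trains a logistic-loss linear model with plain stochastic gradient descent on four synthetic vectors. It is a from-scratch stand-in, not the thesis's SGD configuration; the learning rate, epoch count, and data are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    """Linear score w.x + b; a positive score means 'malware'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_sgd_logistic(X, y, epochs=500, lr=0.1, seed=0):
    """Fit a linear model under the logistic loss using per-sample SGD updates."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            # Gradient of the log loss with respect to the score is (p - y).
            error = sigmoid(score(w, b, X[i])) - y[i]
            w = [wj - lr * error * xj for wj, xj in zip(w, X[i])]
            b -= lr * error
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    predictions = [1 if score(w, b, x) > 0 else 0 for x in X]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Synthetic vectors: [SEND_SMS, INTERNET, URL_score]; label 1 = malware.
X = [[1, 0, 0.75], [1, 1, 0.9], [0, 1, 0.0], [0, 1, 0.1]]
y = [1, 1, 0, 0]

w, b = train_sgd_logistic(X, y)
print("weights:", w, "bias:", b, "accuracy:", accuracy(w, b, X, y))
```

For real experiments, scikit-learn provides `LinearSVC`, `LogisticRegression`, `AdaBoostClassifier`, `SGDClassifier`, and `LinearDiscriminantAnalysis`, which correspond to the five classifiers named above; the abstract does not say which library the thesis used.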
Keywords: Android Malware, Android Malware Detection, Drebin Dataset, Feature Selection and Extraction, Mobile Malware, Machine Learning, AdaBoost, Logistic Regression, Linear Discriminant Analysis, Support Vector Machine, Stochastic Gradient Descent, Static Analysis, TF-IDF.