Title
Enhanced Classification Framework for Imbalanced Big Data
Author
Shrouk El-Amir Mohamed, El-Shazly.
Prepared by
Researcher / Shrouk El-Amir Mohamed El-Shazly
Supervisor / Samir Eldesoky Elmougy
Supervisor / Heba El-Fiqi
Subject
Data. Computer Science.
Publication date
2020
Number of pages
69 p.
Language
English
Degree
Master's
Specialization
Computer Science
Publisher
Approval date
1/1/2020
Place of approval
Zagazig University - Faculty of Computers and Informatics - Computer Science
Abstract

With the recent wave of data analytics reaching every domain, there is growing interest in handling the imbalanced classification problem. This problem occurs when the positive (minority) class is smaller than the majority (negative) class, as in disease detection, cyber-attack detection, and many other Data Mining (DM) applications. Among the algorithms that address this problem, Random Forest (RF) has attracted many researchers because of its general robustness. However, random sampling techniques and cost-sensitive algorithms suffer from low sensitivity and low precision on the positive class when applied to imbalanced datasets.
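Random undersampling, one of the random sampling techniques mentioned above, balances the classes by discarding majority samples. The following minimal Python sketch is a generic illustration, not the method evaluated in this thesis; the function name `random_undersample` and the 0/1 label convention are assumptions made for the example.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Balance a binary dataset by randomly discarding majority samples.

    Hypothetical helper. Labels: 1 = positive (minority), 0 = negative (majority).
    Returns the balanced (sample, label) pairs and the number of samples discarded.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    # keep only as many negatives as there are positives
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    balanced = [(s, 1) for s in pos] + [(s, 0) for s in kept_neg]
    rng.shuffle(balanced)
    return balanced, len(neg) - len(kept_neg)


# toy data: 95 negatives and 5 positives
X = list(range(100))
y = [1 if i < 5 else 0 for i in X]
balanced, discarded = random_undersample(X, y)
print(len(balanced), discarded)  # 10 90
```

Ninety of the 95 negative samples never reach the classifier, which is the kind of information loss the abstract attributes to undersampling algorithms.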
In this thesis, we propose and develop an Entropy-based Fuzzy Random Forest (EFRF) algorithm, adapted from the EFSVM algorithm, to deal with the imbalanced classification problem. In EFRF, each training instance is assigned a fuzzy membership, so that different instances contribute differently to the classifiers: samples with higher class certainty are assigned larger fuzzy memberships. EFRF uses entropy to identify and pay more attention to the samples with higher class certainty, which yields more robust decisions while avoiding the information loss that undersampling algorithms incur.
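The entropy-based membership idea can be sketched in Python. This is a minimal illustration assuming an EFSVM-style scheme as we understand it, not the exact EFRF procedure: the certainty of each negative sample is measured by the class entropy of its k nearest neighbours, the negatives are split into m equal-width entropy intervals, and interval l (counting from 0, lowest entropy first) receives membership 1 - beta*l, while positive samples keep membership 1. The names `knn_entropy` and `fuzzy_memberships`, the parameters `k`, `m` and `beta`, and the equal-width grouping rule are all hypothetical.

```python
import math


def knn_entropy(points, labels, k):
    """Class entropy of each sample's k nearest neighbours (Euclidean).

    Labels: 1 = positive (minority), 0 = negative (majority).
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    entropies = []
    for i, p in enumerate(points):
        # (distance, label) pairs to every other sample, nearest first
        dists = sorted(
            (math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i
        )
        p_pos = sum(label for _, label in dists[:k]) / k
        h = 0.0
        for prob in (p_pos, 1.0 - p_pos):
            if prob > 0:  # treat 0 * log(0) as 0
                h -= prob * math.log(prob)
        entropies.append(h)
    return entropies


def fuzzy_memberships(points, labels, k=3, m=3, beta=0.25):
    """Hypothetical EFSVM-style memberships: lower entropy -> larger membership."""
    if m < 2 or not 0 < beta <= 1.0 / (m - 1):
        raise ValueError("need m >= 2 and 0 < beta <= 1/(m-1)")
    ent = knn_entropy(points, labels, k)
    memberships = [1.0] * len(points)  # positives keep membership 1.0
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not neg:
        return memberships
    lo = min(ent[i] for i in neg)
    width = (max(ent[i] for i in neg) - lo) / m
    for i in neg:
        # Assumption: m equal-width entropy intervals; index 0 = most certain
        group = min(int((ent[i] - lo) / width), m - 1) if width > 0 else 0
        memberships[i] = 1.0 - beta * group
    return memberships


points = [(0,), (1,), (2,), (10,), (11,), (12,)]
labels = [0, 0, 0, 0, 1, 1]
print(fuzzy_memberships(points, labels))  # [1.0, 1.0, 1.0, 0.5, 1.0, 1.0]
```

On the toy data, the negative sample at 10 sits next to the positives, so its neighbourhood entropy is the highest and its membership drops to 0.5. One possible way to feed such memberships into a forest, assumed here rather than taken from the thesis, is as per-sample weights, for example through the `sample_weight` argument of scikit-learn's `RandomForestClassifier.fit`.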
The proposed algorithm showed promising results compared with other imbalanced classification techniques, including the Entropy-based Fuzzy Support Vector Machine (EFSVM). It achieved both high precision and high recall, which makes it a suitable choice for security-related applications.
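Precision and recall (sensitivity) on the positive class are the measures the abstract relies on. A minimal Python sketch computing them, together with their harmonic mean F1, shows why both matter on imbalanced data; the function name `precision_recall_f1` and the toy predictions are illustrative assumptions, not results from the thesis.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall (sensitivity) and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # harmonic mean; defined as 0 when precision and recall are both 0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1] * 5 + [0] * 95

# Always predicting the majority class: 95% accuracy but zero recall
print(precision_recall_f1(y_true, [0] * 100))  # (0.0, 0.0, 0.0)

# Flags 8 samples, 4 of which are truly positive
y_pred = [1, 1, 1, 1, 0] + [1] * 4 + [0] * 91
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(p, r, round(f1, 3))  # 0.5 0.8 0.615
```

The first predictor looks accurate yet misses every positive sample, which is why the abstract reports precision and recall rather than accuracy alone.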