Title
Improving classification accuracy based on feature selection /
Author
Ismail, Raya Nazar.
Preparation committee
Researcher / Raya Nazar Ismail
Supervisor / Ahmed Abou El-Fetouh Saleh
Supervisor / Sherihan Mohamed Abou El-Enein
Examiner / Omaima Nomir
Subject
Data mining - Data processing. Soft computing. Database searching - Data processing.
Publication date
2016.
Number of pages
81 p.
Language
English
Degree
Master's
Specialization
Computer Science (miscellaneous)
Degree date
01/01/2016
Degree location
Mansoura University - Faculty of Computers and Information - Department of Computer Science
Contents
Only 14 pages are available for public view

Abstract

Feature selection is the process of finding a small subset of the original features that maximizes classification accuracy. The process has many other benefits: it reduces the time needed to invoke the learner for each feature in the original set, lowers the computational complexity of the algorithms, and helps in understanding the data and its properties. Most recent studies that combine resampling methods with feature selection models have used the hybrid approach to feature selection. The proposed method instead adopts the wrapper approach without a ranker, since wrappers are more specific in finding the features relevant to a given classifier, and combines it with a filter on instances to reduce the search space at each stage. This thesis presents a three-phase feature selection method. First, it refines the sample space by applying resample filtering to the instances. Second, it minimizes the feature space by applying a subset evaluation algorithm. Third, it measures the goodness of the resulting feature set using different classifiers. In this method, sample-domain filtering with random sampling (with replacement) is applied to remove instances that could decrease classifier accuracy. A wrapper approach is then used to eliminate irrelevant features, employing Sequential Forward Floating Selection as the search method and classifier accuracy as the evaluator. In this way, both dimensions of the dataset are processed to gain significant performance with far fewer features than the original feature space. Two experiments are carried out on datasets from the UCI repository [45]. The results are based on Naïve Bayes and its variants NBTree and NBNet, together with the tree classifiers J48, Random Forest, and BFTree. The proposed method is evaluated by measuring accuracy, error rate, number of features, relative absolute error, precision, recall, F-measure, ROC area, and model-building time in seconds.
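The three phases above can be sketched in a few lines of Python. This is a minimal illustration only, not the thesis's implementation (the classifier names suggest a WEKA-based setup): scikit-learn's GaussianNB stands in for Naïve Bayes, a public scikit-learn dataset stands in for the UCI data, and the "floating" backward step of SFFS is simplified to a greedy conditional removal.

```python
# Sketch of the three-phase method: (1) resample instances with replacement,
# (2) wrapper feature selection scored by cross-validated accuracy with a
# simplified floating (backward) step, (3) evaluate the chosen subset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)  # stand-in for a UCI dataset


def cv_acc(cols, Xs, ys):
    """Mean 5-fold cross-validated accuracy of Naive Bayes on given columns."""
    return cross_val_score(GaussianNB(), Xs[:, cols], ys, cv=5).mean()


# Phase 1: sample-domain filtering via random sampling with replacement.
idx = rng.choice(len(X), size=len(X), replace=True)
Xs, ys = X[idx], y[idx]

# Phase 2: wrapper search -- add the feature that most improves accuracy,
# then conditionally drop earlier features if removal improves the score.
selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    scores = {f: cv_acc(selected + [f], Xs, ys) for f in remaining}
    f, s = max(scores.items(), key=lambda kv: kv[1])
    if s <= best_score:
        break  # no feature improves the subset; stop the forward search
    selected.append(f)
    remaining.remove(f)
    best_score = s
    # Simplified floating step: try removing previously selected features.
    improved = True
    while improved and len(selected) > 2:
        improved = False
        for g in selected[:-1]:  # never remove the feature just added
            trial = [h for h in selected if h != g]
            s2 = cv_acc(trial, Xs, ys)
            if s2 > best_score:
                selected, best_score, improved = trial, s2, True
                break

# Phase 3: measure the goodness of the resulting subset on held-out data.
Xtr, Xte, ytr, yte = train_test_split(X[:, selected], y, random_state=0)
acc = GaussianNB().fit(Xtr, ytr).score(Xte, yte)
print(len(selected), round(acc, 3))
```

Because features are added only while cross-validated accuracy strictly improves, the search typically retains a small fraction of the original feature space, which is the behavior the abstract reports.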
The first experiment consists of datasets with different purposes, while the second contains large, small, and medical datasets. In both experiments, tests use the two main classifier families, Naïve Bayes and trees. After applying the three-stage method, the results showed a reduction of the feature space in most cases and an improvement in prediction accuracy with lower error rates. The second part of the tests illustrates, for each dataset, the detailed impact of the resulting features on the performance of six classifiers. The empirical results of the two experiments show that the proposed method produces a new subset of features with which the learner performs better in terms of accuracy, recall, F-measure, and other measures.