Author: Mohamed, Amr Hassan abedelhalim. Title: Data Mining for Knowledge Discovery

Title

Data Mining for Knowledge Discovery

Author

Mohamed, Amr Hassan abedelhalim.

Thesis committee

Researcher / Amr Hassan Abedelhalim Mohamed

Supervisor / Dr. Ahmed Sofy Abu Taleb

Supervisor / Prof. Dr. Ahmed Ahmed Mohamed Hassan

Examiner / Dr. Ahmed Sofy Abu Taleb

Subject

Data Mining. Data mining - Computer programs.

Publication date

2013.

Number of pages

89 p.

Language

English

Degree

Master's

Specialization

Mathematics

Approval date

1/1/2013

Place of approval

Zagazig University - Faculty of Science - Mathematics

Abstract

With the widespread use of databases and the explosive growth in their size, there is a need to use these massive volumes of data effectively. This is where data mining comes in: it searches databases to extract hidden patterns and information and to support decision making and hypothesis testing. Feature selection is one of the important data preprocessing steps in data mining. The feature selection problem involves finding a subset of the original features that achieves maximum classification accuracy. Such a subset has important benefits: it reduces the computational complexity of learning algorithms, saves time, and improves accuracy. This makes feature selection an indispensable step in classification, because a model built on this subset alone would have better predictive accuracy than a model built on the complete set of features. Feature selection algorithms are usually classified into three general groups: filters, wrappers, and hybrid solutions. This thesis addresses the task of feature selection for classification. We propose two new hybrid systems for the feature selection problem. The idea is to extract and combine the best characteristics of filters and wrappers in one system. The first hybrid system is a two-stage feature selection technique. In the first stage, an ensemble filter method ranks all features and selects the top-ranked features. In the second stage, a wrapper method, SVM-RFE (Support Vector Machine Recursive Feature Elimination), uses the results of the first stage as its starting point.
We evaluated the first hybrid system on four microarray datasets of gene expression profiles for cancer (leukemia, lung, breast, and ovarian cancer) using six classifiers: Naïve Bayes (NB), Random Forest (RF), Decision Trees (C4.5), Support Vector Machines (SVM), K-Nearest Neighbor (KNN), and Logistic Regression (LR). The results show the potential of the proposed method to improve classification performance. The second hybrid system is a three-stage gene selection algorithm for microarray data that combines Information Gain (IG), Significance Analysis for Microarrays (SAM), mRMR (Minimum Redundancy Maximum Relevance), and Support Vector Machine Recursive Feature Elimination (SVM-RFE). In the first stage, the intersection of the feature sets selected by SAM and IG is identified. The second stage minimizes redundancy with the mRMR method, which selects an effective gene subset from the intersection produced by the first stage. In the third stage, SVM-RFE is applied to choose the most discriminating genes. We evaluated the second hybrid system on the AML and ALL (leukemia) dataset using a Support Vector Machine (SVM) classifier, and the results again show the potential of the proposed method to improve classification performance.
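
The two-stage idea behind the first hybrid system (an ensemble filter that shortlists features, followed by SVM-RFE on the shortlist) can be illustrated with a minimal sketch. It is not the thesis's implementation. The abstract does not name the filters in the ensemble, so the two scoring functions here (`t_score`, `correlation_score`) are stand-ins, as is the mean-rank aggregation. The linear SVM is trained with a simple Pegasos-style subgradient loop rather than a library solver, and the data is synthetic. All function names are hypothetical.

```python
import random
import statistics


def t_score(column, labels):
    """Stand-in filter 1: class-mean difference over summed class spread."""
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == -1]
    spread = statistics.pstdev(pos) + statistics.pstdev(neg) + 1e-12
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread


def correlation_score(column, labels):
    """Stand-in filter 2: absolute Pearson correlation with the label."""
    mx, my = statistics.fmean(column), statistics.fmean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = sum((x - mx) ** 2 for x in column) ** 0.5
    sy = sum((y - my) ** 2 for y in labels) ** 0.5
    return abs(cov) / (sx * sy + 1e-12)


def ensemble_rank(X, y, filters, top_k):
    """Stage 1: average each feature's rank across filters, keep the top_k."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    mean_rank = [0.0] * n_features
    for score in filters:
        scores = [score(col, y) for col in columns]
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            mean_rank[j] += rank / len(filters)
    return sorted(range(n_features), key=lambda j: mean_rank[j])[:top_k]


def train_linear_svm(X, y, features, epochs=50, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (no bias term)."""
    rng = random.Random(seed)
    w = {j: 0.0 for j in features}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(w[j] * X[i][j] for j in features)
            for j in features:
                w[j] *= 1.0 - eta * lam        # shrink (regularization)
                if margin < 1:                 # hinge loss is active
                    w[j] += eta * y[i] * X[i][j]
    return w


def svm_rfe(X, y, start_features, n_keep):
    """Stage 2: repeatedly drop the feature with the smallest squared weight."""
    remaining = list(start_features)
    while len(remaining) > n_keep:
        w = train_linear_svm(X, y, remaining)
        weakest = min(remaining, key=lambda j: w[j] ** 2)
        remaining.remove(weakest)
    return remaining


def make_toy_data(n_samples=40, n_features=30, n_informative=3, seed=1):
    """Synthetic two-class data; the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
top = ensemble_rank(X, y, [t_score, correlation_score], top_k=10)
selected = svm_rfe(X, y, top, n_keep=3)
print("stage 1 candidates:", top)
print("stage 2 selection:", selected)
```

Running the filters first means the wrapper stage has to retrain the SVM on only a shortlist of features instead of every gene, which is the practical motivation for combining the two families. On real microarray data a library SVM and cross-validated evaluation would replace the toy pieces above.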