Title
Developing An Efficient Classifier For Medical Data /
Author
Alpukhaiti, Hisham Abdellatif.
Committee
Researcher / هشام عبد اللطيف محمد البخيتى
Supervisor / عادل ابو المجد سويسى
Examiner / ابراهيم محمد
Examiner / احمد شرف الدين
Subject
Medical Data.
Publication date
2011.
Number of pages
109 p.
Language
English
Degree
Master's
Specialization
Information Systems
Publisher
Approval date
28/12/2011
Place of approval
Assiut University - Faculty of Computers and Information - Information Systems
Index
Only 14 pages are available for public view


Abstract

Enormous amounts of data are now stored in files, databases, and other repositories across all branches of scientific knowledge, including medical, demographic, financial, and marketing data. It is therefore increasingly important to develop powerful means for the analysis, interpretation, and extraction of interesting knowledge that can help in decision-making. Data mining offers theories, techniques, and tools for processing large volumes of data. Medical data is an important application area for data mining because it is large, high-dimensional, and noisy, and because it raises questions of data provenance. Data mining in this area aims to achieve optimal medical outcomes by helping physicians and patients in the context of a patient’s genetic and environmental profile data, both medical and genetic.
In this thesis, a hybrid approach is presented for classifying diseases from medical and genetic data: Breast cancer, Lung cancer, and Arrhythmia as medical data, and Leukemia, Breast cancer, and Prostate as genetic data. When high-dimensional medical and genetic data are involved in the computation, implementing a model or pattern classifier becomes quite difficult and sometimes impossible. This limitation is usually referred to as the “curse of dimensionality”. Efforts are under way to reduce this complexity efficiently while still achieving a sufficient level of classification accuracy. In genetic data, gene selection is a problem in the classification of serious diseases in clinical information systems. A limitation of existing gene selection methods is that they may produce gene sets with some redundancy and yield an unnecessarily large number of candidate genes for classification analysis. Our approach is called Clustering-ANOVA-Support Vector Machines (CASVM). It uses clustering (K-means) with statistical analysis (ANOVA test) as a preprocessing step, followed by a non-linear Support Vector Machines algorithm, to classify diseases related to medical data. To evaluate the performance of the proposed methodology, two kinds of comparisons are made: 1) applying clustering combined with statistical analysis as a preprocessing step and 2) comparing different classification algorithms: decision tree (ID3), naïve Bayes, support vector machines, and the MDR algorithm.
On medical data, CASVM obtained the highest accuracy, 99% on the Arrhythmia dataset, compared with three other classifiers: decision tree (C4.5), naïve Bayes, and support vector machine. On microarray data, CASVM also obtained the highest accuracy, 95% on the Breast cancer dataset, compared with decision tree (C4.5), naïve Bayes, support vector machine, and the MDR algorithm.
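
The abstract does not say how the K-means and ANOVA steps are combined, so the following is only a minimal sketch of one plausible reading, not the thesis's actual procedure. It assumes that features (for example genes) are clustered so that redundant ones fall into the same group, and that the feature with the highest one-way ANOVA F statistic is kept from each group. The function names (`anova_f`, `kmeans`, `select_features`) and the toy data are hypothetical, and the final non-linear SVM step is omitted.

```python
import random
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return assign

def select_features(X, y, n_clusters, seed=0):
    """Assumed combination: cluster the feature columns, then keep the
    highest-F feature from each cluster."""
    columns = [list(col) for col in zip(*X)]
    assign = kmeans(columns, n_clusters, seed=seed)
    scores = [anova_f(col, y) for col in columns]
    best = {}
    for j, c in enumerate(assign):
        if c not in best or scores[j] > scores[best[c]]:
            best[c] = j
    return sorted(best.values())

# Toy data (made up): features 0 and 1 separate the classes and are nearly
# duplicates of each other; features 2 and 3 are near-duplicate noise.
y = [0, 0, 0, 1, 1, 1]
X = [
    [1.0, 1.1, 5.0, 5.2],
    [1.2, 1.0, 7.0, 6.8],
    [0.9, 1.0, 6.0, 6.1],
    [3.0, 3.2, 5.1, 5.0],
    [3.1, 3.0, 6.9, 7.1],
    [2.9, 3.1, 6.1, 5.9],
]
selected = select_features(X, y, n_clusters=2)
print(selected)
```

Under these assumptions, the selected columns would then be passed to a non-linear SVM (for instance an RBF-kernel SVM from an existing library); the kernel and its parameters are not given in the abstract.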