Title
Application of Data Mining Technique in Medical Diagnosis
Author
El-Rashidy, Mohamed Ahmed Abd El-Hamid.
Prepared by
Researcher / Mohamed Ahmed Abd El-Hamid
Supervisor / Taha E. Taha
Supervisor / Nabil M. A. Ayad
Supervisor / Hoda S. Sorour
Subject
Data mining. Diagnosis, Medical History.
Publication date
2012.
Number of pages
177 p.
Language
English
Degree
Ph.D.
Specialization
Computer Science (miscellaneous)
Date of award
3/6/2012
Awarding institution
Menoufia University - Faculty of Electronic Engineering - Computer Science and Engineering

Abstract

Healthcare organizations face a major challenge in diagnosing patients correctly and administering effective treatments. This challenge stems from the multiplicity of disease types, which makes diagnosis more complex, especially when the symptoms and investigation results associated with these types are numerous and similar. Data mining techniques can be used to generate rules or identify patterns in medical data to assist clinical diagnosis and decision making. Recent advances in data mining, however, have increased concerns about the privacy of the parties involved. Several techniques, such as randomization and k-anonymity, have been proposed in recent years to perform privacy-preserving data mining. These techniques apply some form of transformation to the data that reduces the granularity of its representation in order to protect privacy. This reduction in granularity causes some loss in the effectiveness of the data mining results.
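To make the granularity trade-off concrete, here is a minimal Python sketch (standard library only). The records, the choice of `age` and `sex` as quasi-identifiers, and the helper names `is_k_anonymous`, `generalize_age`, and `generalize` are hypothetical illustrations, not taken from the thesis. A table is k-anonymous when every combination of quasi-identifier values is shared by at least k records; coarsening exact ages into ten-year bands achieves that here, at the cost of precision.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalize_age(age, width):
    """Replace an exact age with the 'lo-hi' band of the given width that contains it."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def generalize(records, width):
    """Return copies of the records with 'age' coarsened; other fields are unchanged."""
    return [{**r, "age": generalize_age(r["age"], width)} for r in records]


# Hypothetical toy records (not thesis data): 'age' and 'sex' are the
# quasi-identifiers, 'diagnosis' is the sensitive attribute.
patients = [
    {"age": 34, "sex": "F", "diagnosis": "cystitis"},
    {"age": 37, "sex": "F", "diagnosis": "nephritis"},
    {"age": 52, "sex": "M", "diagnosis": "urolithiasis"},
    {"age": 58, "sex": "M", "diagnosis": "cystitis"},
]

qi = ["age", "sex"]
print(is_k_anonymous(patients, qi, 2))                  # False: every exact age is unique
print(is_k_anonymous(generalize(patients, 10), qi, 2))  # True: two records per band
```

The wider the band, the larger each equivalence class and the stronger the privacy guarantee, but the less a mining algorithm can learn from `age`; that is the loss of effectiveness described above.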
This thesis proposes effective approaches that use clustering to enforce k-anonymity. Their main goal is to preserve data privacy while having the least possible effect on the accuracy of data mining results. A new hybrid data mining model is also proposed that provides a comprehensive analytic method for finding the optimal number of types of any disease and an optimal partitioning representative, and for extracting the subset of features of each disease type, so that the model reflects the diversity of disease types. This model is called Optimal Clustering for Support Feature Machine (OCSFM). It integrates characteristics of both supervised and unsupervised models and is based on clustering, feature selection, and classification concepts. It uses fuzzy clustering, max-min, and support feature machine models, which build on advances in the classification of medical data. Similar and overlapping features extracted for the different disease types can make those types converge and lead to random classification decisions. Therefore, an optimization is proposed to enhance the feature selection process of the OCSFM model. This optimization is a hybrid feature selection model that combines filter and wrapper methods: it uses an averaging scheme as the filter method and sequential backward search as the wrapper method. Its goal is to extract a smaller subset of features for each type and to achieve the highest diagnostic accuracy in less time, without confusion or ambiguity between the different variants of a disease. This knowledge is vital for efficient treatment and for avoiding poor treatments; its absence can lead to disastrous consequences, unwanted biases, errors, and excessive medical costs that degrade the quality of service provided to patients.
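The abstract does not detail the proposed clustering approaches, so the following is only a hedged Python sketch of the general idea behind clustering-based k-anonymity (and, loosely, of the greedy k-member baseline named in the experiments): records are grouped into clusters of at least k members, and each numeric quasi-identifier is replaced by its cluster's (min, max) range. The seeding rule, the L1 distance, the leftover-assignment step, the function names `l1_distance` and `cluster_k_anonymize`, and the toy records are all assumptions for illustration; this is neither the thesis's algorithm nor a faithful greedy k-member implementation.

```python
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]


def l1_distance(a: Record, b: Record, attrs: List[str]) -> float:
    """Sum of absolute differences over the numeric quasi-identifiers."""
    return sum(abs(a[x] - b[x]) for x in attrs)


def cluster_k_anonymize(records: List[Record], attrs: List[str], k: int) -> List[Record]:
    """Hypothetical sketch: group records into clusters of >= k members, then
    replace each quasi-identifier with its cluster's (min, max) range."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = list(range(len(records)))
    clusters: List[List[int]] = []
    while len(remaining) >= k:
        seed = remaining[0]  # simplistic deterministic seed choice
        # Stable sort keeps the seed (distance 0) first; take it plus its k-1 nearest neighbours.
        nearest = sorted(remaining, key=lambda i: l1_distance(records[seed], records[i], attrs))[:k]
        clusters.append(nearest)
        remaining = [i for i in remaining if i not in nearest]
    # Fewer than k records left over: attach each one to the cluster with the closest seed.
    for i in remaining:
        closest = min(clusters, key=lambda c: l1_distance(records[c[0]], records[i], attrs))
        closest.append(i)
    anonymized: List[Record] = [dict(r) for r in records]  # non-QI fields copied unchanged
    for members in clusters:
        for x in attrs:
            lo = min(records[i][x] for i in members)
            hi = max(records[i][x] for i in members)
            for i in members:
                anonymized[i][x] = (lo, hi)
    return anonymized


# Hypothetical toy data: 'age' and 'bmi' are quasi-identifiers, 'dx' is sensitive.
toy = [
    {"age": 30, "bmi": 22.0, "dx": "A"},
    {"age": 31, "bmi": 23.0, "dx": "B"},
    {"age": 60, "bmi": 30.0, "dx": "A"},
    {"age": 62, "bmi": 29.0, "dx": "C"},
    {"age": 45, "bmi": 25.0, "dx": "B"},
]
anon = cluster_k_anonymize(toy, ["age", "bmi"], k=2)
print(Counter((r["age"], r["bmi"]) for r in anon))
# prints Counter({((30, 45), (22.0, 25.0)): 3, ((60, 62), (29.0, 30.0)): 2})
```

Because every member of a cluster receives identical ranges, each equivalence class has at least k records; grouping similar records together keeps the ranges narrow, which is how clustering limits the information loss of generalization.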
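The abstract lists fuzzy clustering as a component of OCSFM without naming the algorithm. As a hedged illustration only, the sketch below implements standard fuzzy c-means with fuzzifier m = 2, where the number of clusters c plays the role of the number of disease types. The deterministic initialisation, the function name `fuzzy_c_means`, and the two-group toy data are assumptions; how OCSFM selects the optimal number of types is not described in the abstract and is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points: List[Sequence[float]], c: int, m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-6
                  ) -> Tuple[List[List[float]], List[List[float]]]:
    """Standard fuzzy c-means. Returns (centers, u), where u[j][i] is the
    degree to which point j belongs to cluster i (each row sums to 1)."""
    n, dims = len(points), len(points[0])
    # Illustrative deterministic initialisation: c points spread over the input order.
    centers = [list(points[i * (n - 1) // max(c - 1, 1)]) for i in range(c)]
    u: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update: u[j][i] = 1 / sum_k (d_ji / d_jk) ** (2 / (m - 1))
        u = []
        for p in points:
            d = [_dist(p, ctr) for ctr in centers]
            if 0.0 in d:
                # The point sits exactly on a center: crisp membership in that cluster.
                hit = d.index(0.0)
                u.append([1.0 if i == hit else 0.0 for i in range(c)])
            else:
                u.append([1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                          for i in range(c)])
        # Center update: mean of all points weighted by u ** m.
        new_centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            total = sum(w)
            new_centers.append([sum(w[j] * points[j][t] for j in range(n)) / total
                                for t in range(dims)])
        shift = max(_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, u = fuzzy_c_means(data, c=2)
labels = [row.index(max(row)) for row in u]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each row of `u` sums to 1, so a record can belong partly to several types; taking the argmax of each row, as above, is one simple way to turn the fuzzy partition into crisp type labels.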
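The abstract names the two stages of the hybrid feature selection but does not define the averaging scheme or the classifier used inside the wrapper. The Python sketch below therefore fills both in with stated assumptions: the filter scores each feature by a Fisher-style class-separation ratio and keeps those scoring at least the average score (one possible reading of "averaging scheme"), and the wrapper runs sequential backward search driven by the leave-one-out accuracy of a nearest-centroid classifier, used here as a simple stand-in for the support feature machine. All function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev
from typing import List

Matrix = List[List[float]]


def feature_scores(X: Matrix, y: List[int]) -> List[float]:
    """Assumed filter score for two classes (labels 0/1):
    |mean_0 - mean_1| / (std_0 + std_1), computed per feature."""
    scores = []
    for f in range(len(X[0])):
        a = [row[f] for row, label in zip(X, y) if label == 0]
        b = [row[f] for row, label in zip(X, y) if label == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9))
    return scores


def averaging_filter(X: Matrix, y: List[int]) -> List[int]:
    """Filter step (assumed): keep features scoring at least the average, strongest first."""
    s = feature_scores(X, y)
    threshold = mean(s)
    kept = [f for f in range(len(s)) if s[f] >= threshold]
    return sorted(kept, key=lambda f: -s[f])


def loo_accuracy(X: Matrix, y: List[int], feats: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-class-centroid classifier on `feats`."""
    correct = 0
    for j in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            members = [X[i] for i in range(len(X)) if i != j and y[i] == label]
            centroids[label] = [mean(r[f] for r in members) for f in feats]
        pred = min(centroids, key=lambda lab: sum(
            (X[j][f] - c) ** 2 for f, c in zip(feats, centroids[lab])))
        correct += int(pred == y[j])
    return correct / len(X)


def sequential_backward_search(X: Matrix, y: List[int], feats: List[int]) -> List[int]:
    """Wrapper step: repeatedly drop the feature whose removal keeps accuracy highest,
    as long as accuracy does not fall. Ties drop the lowest-ranked feature first."""
    current = list(feats)
    best = loo_accuracy(X, y, current)
    while len(current) > 1:
        candidates = [[f for f in current if f != g] for g in reversed(current)]
        accs = [loo_accuracy(X, y, c) for c in candidates]
        top = max(accs)
        if top < best:
            break
        best = top
        current = candidates[accs.index(top)]
    return current


def hybrid_select(X: Matrix, y: List[int]) -> List[int]:
    """Cheap filter first, then the more expensive wrapper on the survivors."""
    return sequential_backward_search(X, y, averaging_filter(X, y))


# Hypothetical toy data: features 0 and 1 separate the classes, 2 and 3 are noise.
X = [
    [1.0, 2.0, 5.0, 3.0],
    [1.2, 2.2, 3.0, 7.0],
    [0.8, 1.8, 4.0, 5.0],
    [3.0, 3.0, 4.0, 3.0],
    [3.2, 3.2, 3.0, 7.0],
    [2.8, 2.8, 5.0, 5.0],
]
y = [0, 0, 0, 1, 1, 1]
print(averaging_filter(X, y))  # [0, 1]
print(hybrid_select(X, y))     # [0]
```

The filter discards the two uninformative columns cheaply; the wrapper then removes the redundant second feature because dropping it does not lower accuracy, which mirrors the stated goal of fewer features per disease type without losing diagnostic accuracy.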
A set of experiments was conducted on datasets from the Urology and Nephrology Center, Mansoura, Egypt, and from the UC Irvine Machine Learning Repository to evaluate the proposed methods for enforcing the k-anonymity model. The results show that the proposed methods preserve data privacy with a very low effect on the accuracy of data mining results compared with the greedy k-member and one-pass k-means algorithms. Another set of extensive experiments was carried out to evaluate OCSFM and its optimized feature selection on the Wisconsin breast cancer and Cleveland heart disease datasets, which are commonly used by researchers applying machine learning methods to breast cancer and heart disease diagnosis, and on surgical patients' datasets for diagnosing post-operative infections. The results show that the optimized OCSFM model achieves the highest classification performance, which is very promising compared with Naïve Bayes, linear and polynomial-kernel support vector machines, artificial neural networks, and support feature machine models.