Title
An Enhanced Approach for Privacy Preserving Data Mining
Author
Hajar Hussain Mohammed Redha
Preparation Committee
Researcher / Hajar Hussain Mohammed Redha
Supervisor / Hesham Ahmed Hefny
Supervisor / Ahmed Mohammed Gadallah
Subject
Computer Sciences
Publication Date
2022
Number of Pages
212 p.
Language
English
Degree
Master's
Specialization
Computer Science Applications
Approval Date
29/4/2022
Place of Approval
Cairo University - Central Library - Computer Science
Index
Only 14 of 212 pages are available for public view

Abstract

Data inflation, a prominent consequence of the recent rapid growth and diversity of data and information sources, has led to continuous exploration of ways to benefit from this huge amount of ever-growing data in different fields. Data mining refers to discovering knowledge in a large set of data; if the data is collected and analyzed efficiently, it helps organizations make sound and appropriate decisions, solve many problems, and develop. It has thus become necessary to ensure the privacy of private and sensitive data, which is of great value in the digital world, so that better, high-quality services can be provided without data loss or breach. Yet there is still a need for more flexible approaches that preserve privacy while data mining techniques are applied. This thesis proposes an enhanced approach for privacy-preserving data mining in any environment and outlines efficient solutions for common problems faced by traditional privacy-preserving data mining techniques. The approach includes normalization, categorization, discretization, and substituting quasi-identifier attributes with their dependent data, which keeps the data entirely private while decreasing the loss of available information. The proposed approach was tested on data from a descriptive correlational study that used a convenience sample of Lebanese adults with type 2 diabetes recruited from a major hospital in Beirut, Lebanon. In addition, a data set was gathered through a questionnaire created with Google Forms. The questionnaire targets diabetic patients in Kuwait and Egypt and includes the same primary fields found in the diabetes data from Lebanon. The results show the added value of the proposed approach over other works with respect to suitability for privacy-preserving data mining. The proposed approach achieved an accuracy of 0.849 ≈ 0.85, that is, 85%.
Thus, it provided less information loss and increased privacy for sensitive data compared to previous work, in which the loss ratio was ≈ 0.73. The original data set includes multiple attributes that uniquely identify an individual. After sensitive attributes are removed, the quasi-identifier attributes in the new data set are replaced with related or equivalent attributes. After the proposed approach is applied, the sensitive attributes in the final result are sanitized and cannot be disclosed or breached.
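
The preprocessing steps named in the abstract (normalization, discretization, and replacing quasi-identifier attributes with related or equivalent ones) can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which is not shown in the publicly available pages. The field names (`name`, `age`, `city`, `hba1c`), the sample values, the 10-year age ranges, and the city-to-country mapping are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; all field names, values, and mappings are assumptions.
records = [
    {"name": "Patient A", "age": 34, "city": "Beirut", "hba1c": 6.0},
    {"name": "Patient B", "age": 58, "city": "Cairo", "hba1c": 8.0},
    {"name": "Patient C", "age": 45, "city": "Kuwait City", "hba1c": 7.0},
]

# Hypothetical substitution table: a quasi-identifier (city) is replaced
# by a related, coarser attribute (country).
CITY_TO_COUNTRY = {"Beirut": "Lebanon", "Cairo": "Egypt", "Kuwait City": "Kuwait"}


def min_max_normalize(values):
    """Scale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_age(age, width=10):
    """Replace an exact age with a fixed-width range such as '30-39'."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


def sanitize(rows):
    """Drop the identifying field and generalize the quasi-identifiers."""
    normalized = min_max_normalize([r["hba1c"] for r in rows])
    sanitized = []
    for row, norm in zip(rows, normalized):
        sanitized.append({
            # "name" is intentionally not copied to the output.
            "age_range": discretize_age(row["age"]),
            "country": CITY_TO_COUNTRY.get(row["city"], "Other"),
            "hba1c_norm": round(norm, 3),
        })
    return sanitized


for row in sanitize(records):
    print(row)
```

Coarser ranges and broader substitutions increase privacy but lose more information, which is the trade-off the abstract describes; the bin width and mapping here are arbitrary choices, not values from the thesis.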