Author: Hajar Hussain Mohammed Redha / Title: An Enhanced Approach for Privacy Preserving Data Mining

Title

An Enhanced Approach for Privacy Preserving Data Mining

Author

Hajar Hussain Mohammed Redha

Preparation Committee

Researcher / Hajar Hussain Mohammed Redha

Supervisor / Hesham Ahmed Hefny

Supervisor / Ahmed Mohammed Gadallah

Subject

Computer Sciences

Publication Date

2022

Number of Pages

212 p.

Language

English

Degree

Master's

Specialization

Computer Science Applications

Approval Date

29/4/2022

Place of Approval

Cairo University - Central Library - Computer Science

Index

Only 14 of 212 pages are available for public view
Abstract

Data inflation, a prominent consequence of the recent rapid growth and diversity of data and information sources, has led to continuous exploration of ways to benefit from this huge and growing volume of data in different fields. Data mining refers to discovering knowledge in large data sets; when data is collected and analyzed efficiently, it helps in making sound and appropriate decisions, solving many problems, and supporting organizations' development. It has therefore become necessary to ensure the privacy of private and sensitive data of great value in the digital world, in order to provide better, high-quality services without data loss or breach. Yet there is still a need for more flexible approaches that preserve privacy while data mining techniques are applied. This thesis proposes an enhanced approach for privacy-preserving data mining in any environment and outlines efficient solutions to common problems faced by traditional privacy-preserving data mining techniques. The approach includes normalization, categorization, discretization, and substituting quasi-identifier attributes with their dependent data, which keeps the data private while decreasing the loss of available information. The proposed approach was tested on data from a descriptive correlational study that used a convenience sample of Lebanese adults with type 2 diabetes recruited from a major hospital in Beirut, Lebanon. In addition, a second data set was gathered through a questionnaire created with Google Forms. It targeted diabetic patients in Kuwait and Egypt and took into account the same primary fields found in the Lebanese diabetes data. The results show the added value of the proposed approach over other works with respect to suitability for privacy-preserving data mining. The proposed approach achieved an accuracy of 0.849 ≈ 0.85, equivalent to 85%.
Thus, it provided less information loss and increased privacy for sensitive data compared to previous work, in which the loss ratio was ≈ 0.73. The original data set includes multiple attributes that uniquely identify an individual. After the sensitive attributes are removed, the quasi-identifier attributes in the new data set are replaced with related or equivalent attributes. After the proposed approach is applied, the sensitive attributes in the final result are sanitized and cannot be disclosed or breached.
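
The preprocessing steps named in the abstract (normalization, discretization, and replacing quasi-identifier attributes with related attributes) can be illustrated with a minimal Python sketch. This is not the thesis's actual implementation, which is not shown in the public pages. The record fields (`name`, `age`, `city`, `glucose`), the city-to-region mapping, the 10-year age bins, and the function names are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: hypothetical records and rules,
# not the method implemented in the thesis.

def min_max_normalize(values):
    """Scale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_age(age, width=10):
    """Map an exact age to a fixed-width interval label (width is an assumption)."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


# Hypothetical mapping from a quasi-identifier (city) to a coarser related attribute.
CITY_TO_REGION = {"Beirut": "Lebanon", "Cairo": "Egypt", "Kuwait City": "Kuwait"}


def sanitize(records, identifiers=("name",)):
    """Drop direct identifiers, generalize quasi-identifiers, normalize a numeric field."""
    glucose = min_max_normalize([r["glucose"] for r in records])
    sanitized = []
    for record, g in zip(records, glucose):
        # Copy the record without its direct identifiers; the input is not mutated.
        row = {k: v for k, v in record.items() if k not in identifiers}
        row["age"] = discretize_age(row["age"])
        row["region"] = CITY_TO_REGION.get(row.pop("city"), "Other")
        row["glucose"] = round(g, 2)
        sanitized.append(row)
    return sanitized


# Made-up example records, not data from the thesis.
records = [
    {"name": "A", "age": 47, "city": "Beirut", "glucose": 110},
    {"name": "B", "age": 63, "city": "Cairo", "glucose": 180},
    {"name": "C", "age": 35, "city": "Kuwait City", "glucose": 145},
]

for row in sanitize(records):
    print(row)
```

In this sketch, exact ages become interval labels and cities become countries, so each output row is less specific than its source row while still carrying information usable for mining. How much generalization is appropriate depends on the data set and on the privacy and accuracy targets, which the abstract reports only in aggregate.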