Title
Duplicate Records Detection in Database
Author
Higazy, Azza Abdal Elah.
Contributors
Researcher / Azza Abdal Elah Higazy
Supervisor / Amany Mahmoud Sarhan
Supervisor / Tarek El-Ahmady Abdel Aziz
Supervisor / None
Subject
Computer and Control Engineering.
Publication date
2015.
Number of pages
113 p.
Language
English
Degree
Master's
Specialization
Electrical and Electronic Engineering
Date of approval
1/1/2015
Place of approval
Tanta University - Faculty of Engineering - Computers and Automatic Control
Index
Only 14 pages are available for public view


Abstract

Sharing data between organizations is of growing importance in many data mining projects. Data from various heterogeneous sources often has to be linked and aggregated in order to improve data quality. For example, in the higher education sector, this includes linking scholar data from citation databases, or the participant databases of e-Learning initiatives, to the management information system. In the health sector, information retrieved from linked data is used to improve health policies. The importance of data accuracy and quality has increased with the explosion in data size. Data quality is crucial to the success of any cross-enterprise integration application, business intelligence solution, or data mining solution. The first step toward data accuracy is to ensure that each real-world object is represented once and only once in a given dataset, a task known as Duplicate Record Detection (DRD). This task becomes more complicated when entities are identified by a string value such as a person's name. Such inaccuracies arise from several factors, including spelling and typographical errors, pronunciation variation, dialects, special vowel and consonant distinctions, and other linguistic characteristics, especially in non-Latin languages such as Arabic. Previously proposed DRD algorithms and frameworks do not support bilingual duplicate detection and do not work properly with non-Latin languages such as Arabic. They are also difficult to configure because of the technology used to build them. The wide range of variations, especially in Arabic data, requires the system user to be an expert in the data domain; we therefore assume that the framework user is a subject matter expert (SME). In this thesis, an English/Arabic-enabled web-based framework is designed and implemented. It treats user interaction (adding new rules, enriching the dictionary, and evaluating results) as an important step in improving the system's behavior.
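
To make the name-matching problem described in the abstract concrete, the following is a minimal illustrative sketch, not the framework developed in the thesis. It assumes a simple approach: normalize common Arabic orthographic variants (diacritics, alef forms, alef maqsura, ta marbuta), then compare names with a generic string-similarity ratio. The function names, the character map, and the 0.85 threshold are all hypothetical choices made for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations

# Arabic short-vowel marks (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

# Assumed normalization map for frequent spelling variants (illustrative only).
_CHAR_MAP = str.maketrans({
    "أ": "ا", "إ": "ا", "آ": "ا",  # alef with hamza/madda -> bare alef
    "ى": "ي",                      # alef maqsura -> ya
    "ة": "ه",                      # ta marbuta -> ha
})


def normalize(name: str) -> str:
    """Reduce a name to a canonical form before comparison."""
    name = unicodedata.normalize("NFKC", name)
    name = _DIACRITICS.sub("", name)
    name = name.translate(_CHAR_MAP)
    # Lowercase Latin text and collapse runs of whitespace.
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(names, threshold=0.85):
    """Return index pairs of names whose similarity meets the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    records = ["أحمد علي", "احمد على", "Sarah Jones"]
    print(find_duplicates(records))
```

Comparing every pair of records is quadratic in the number of records, so a sketch like this only suits small datasets. A rule- and dictionary-driven system of the kind the abstract describes would replace the fixed character map with rules maintained by the subject matter expert.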