Title
Managing Communication Costs of Mining in Distributed Data Systems/
Publisher
Fac.of Eng.Dep.of Computer Science,
Author
Elteir,Marwa Khamis Mohamed Hessin.
Preparation committee
Researcher / Marwa Khamis Mohamed Hassan Elteir
Supervisor / Khalil Mohamed Ahmed
Supervisor / Ahmed Abdel-Rafea Belal
aabelal@yahoo.com
Supervisor / Alaa El-Din Mokhtar Hafez
Ahafez2001@yahoo.com
Examiner / Mahmoud Said Abou Gabal
msabougabal@yahoo.com
Examiner / Saleh El-Shehaby
Subject
Data Communications Systems.
Publication date
2005
Number of pages
x, 55 P.:
Language
English
Degree
Master's
Specialization
Engineering (miscellaneous)
Approval date
1/1/2005
Place of approval
Alexandria University - Faculty of Engineering - Computer and Systems Engineering
Contents
Only 14 pages are available for public view


Abstract

Advances in computing and communication over wired and wireless networks have resulted in many pervasive distributed computing environments. Many of these environments have multiple distributed sources of very large data sets and multiple compute nodes. Analyzing and monitoring these distributed data sources requires data mining technology designed for distributed applications.
Within data mining, the problem of mining association rules accounts for a large share of the research, which is attributed to its wide range of applications. The key challenge in discovering association rules in distributed environments is minimizing the overall communication cost, so that generating global association rules costs less than combining the participating sites' data sources at a centralized site.
Little work has been done in this area. Additionally, none of the existing algorithms is both scalable with the number of nodes and resilient to data skewness and imbalanced partition sizes.
In this thesis, several algorithms are proposed for association rule mining in distributed environments. The intended distributed environments are assumed to be broadcast networks. The basic idea rests on the observation that the excess in the local support counts already broadcast for an itemset can be used to relax the minimum support constraint. Consequently, only the nodes that satisfy this relaxed constraint need to broadcast their counts. If no node broadcasts its local support count, the itemset is directly considered a small itemset. To reduce the overall communication cost further, beyond the cost related to globally small itemsets, any itemset can be considered globally large once the relaxed minimum support reaches a certain practically tolerated level a, without waiting for more local support counts.
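The relaxation idea can be illustrated with a minimal Python sketch. It is an interpretation of the abstract, not the thesis's actual algorithms: the function name `classify_itemset`, the one-broadcast-per-round ordering, and the exact formula for the relaxed threshold (the support still missing, divided by the total size of the silent partitions) are assumptions made for illustration. The optional `tolerated_level` parameter stands in for the level a mentioned above.

```python
from fractions import Fraction


def classify_itemset(min_sup, sizes, counts, tolerated_level=Fraction(0)):
    """Decide whether an itemset is globally large or small.

    min_sup: global minimum support as a fraction of all transactions.
    sizes[i], counts[i]: partition size and local support count at node i.
    Returns (verdict, heard), where heard lists the nodes that broadcast.
    Hypothetical sketch: one node broadcasts per round, lowest index first.
    """
    needed = min_sup * sum(sizes)      # global count required to be large
    heard, heard_count = [], 0         # nodes that broadcast, and their total
    silent = set(range(len(sizes)))    # nodes that have not broadcast yet

    while True:
        if heard_count >= needed:
            return "large", heard
        silent_size = sum(sizes[i] for i in silent)
        if silent_size == 0:
            return "small", heard
        # Assumed relaxation: support fraction the silent nodes must
        # still reach on average, given the excess already broadcast.
        relaxed = (needed - heard_count) / silent_size
        if relaxed <= tolerated_level:
            # Approximate early decision at the tolerated level.
            return "large", heard
        willing = sorted(i for i in silent if counts[i] >= relaxed * sizes[i])
        if not willing:
            # Every silent node is below the relaxed threshold, so the
            # global count cannot reach `needed`: the itemset is small.
            return "small", heard
        node = willing[0]
        heard.append(node)
        heard_count += counts[node]
        silent.remove(node)


s = Fraction(1, 10)
print(classify_itemset(s, [100, 100, 100], [28, 4, 1]))
print(classify_itemset(s, [100, 100, 100], [12, 2, 1]))
```

In the first call, node 0's excess lowers the threshold enough for node 1 to broadcast, and the itemset is declared large before node 2 sends anything. In the second call, no silent node meets the relaxed threshold after node 0's broadcast, so the itemset is declared small after a single message.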
An event-driven simulator is built for the performance evaluation. The tool available at http://www.almaden.ibm.com/software/quest is used for synthetic database generation. The performance of the new algorithms is compared with that of the Distributed Decision Miner (DDM) algorithm.
The simulation results show that, for lightly skewed partitions, the basic algorithm achieves a significant performance improvement, and that for highly skewed partitions the improvement reaches 70%. In addition, the algorithm scales better with the number of nodes than DDM and is more resilient to data skewness, imbalanced partition sizes, and message ordering.