Author: Botros, Maged Magdy Sobhy./ Title: Improving Mining Association Rules in Large Databases/

Title

Improving Mining Association Rules in Large Databases/

Author

Botros, Maged Magdy Sobhy.

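Committee

Researcher / Maged Magdy Sobhy Botros

Supervisor / Fayed Faek Mohamed Ghaleb (فايد فائق محمد غالب)

Supervisor / Wael Zakaria Abdallah (وائل زكريا عبدالله)

Supervisor / Dawlat Abdelaziz Mohamed (دولت عبدالعزيز محمد)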

Publication date

2023.

Number of pages

114 p.

Language

English

Degree

Master's

Specialization

Computer Science (miscellaneous)

Date of approval

1/1/2023

Place of approval

Ain Shams University - Faculty of Science - Mathematics

Abstract

Most modern applications continuously collect huge amounts of data and analyze it to extract hidden information that benefits decision-makers. Processing and analyzing data at this scale requires effective and accurate methods that can handle both the existing data and any incremental data. One example of extracting such hidden information is association rule mining (ARM), which finds association rules between itemsets in huge datasets and is one of the important tasks in data mining. Frequent itemset mining (FIM) is the most challenging and complicated phase of ARM, and it is intensive in both time and memory. The problem gets worse as the volume of data keeps growing, which is a fundamental aspect of all applications. Traditional methods are therefore ineffective on ever-increasing datasets, because they cannot extract all the frequent itemsets efficiently. In response to these difficulties, approximate algorithms have been developed to extract the frequent itemsets. Although approximation techniques can be used in the real world, their use with massive data requires further improvement.
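To make the terms above concrete, the following is a minimal sketch of what FIM computes, assuming the standard definitions: the support of an itemset is the fraction of transactions that contain it, and an itemset is frequent when its support reaches a chosen threshold. The function names, the toy transactions, and the threshold are hypothetical and are not taken from the thesis. The brute-force enumeration is only for illustration; its cost grows exponentially with the number of items, which is the difficulty the abstract describes.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force FIM: test every possible itemset against min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


# Hypothetical toy dataset, invented for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
fi = frequent_itemsets(transactions, min_support=0.5)

# Confidence of the association rule {bread} -> {milk}:
# support({bread, milk}) / support({bread}).
confidence = fi[frozenset({"bread", "milk"})] / fi[frozenset({"bread"})]
```

Once the frequent itemsets and their supports are known, the second phase of ARM (deriving rules such as bread -> milk with their confidence) is comparatively cheap, which is why FIM dominates the cost.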
The objective of this thesis is to develop an approximation algorithm capable of extracting frequent itemsets (FI) effectively from massive incremental datasets. To this end, the Closed Candidates-based Incremental Frequent Itemset Mining (CC-IFIM) approach is introduced. The developed approach achieves the following results:
1. Extracting an approximated FI that effectively covers most of the exact FI.
2. Effectively handling incremental datasets and updating the existing FI without re-mining the entire dataset.
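The general idea behind the second result can be sketched as follows. This is not the CC-IFIM algorithm, whose details the abstract does not give; it is a generic illustration that assumes a fixed, hypothetical set of candidate itemsets whose counts were stored after mining the original data. When an increment arrives, only the increment is scanned. The result is approximate because any itemset outside the stored candidates is never counted.

```python
def count_in(itemsets, transactions):
    """Absolute support count of each itemset within `transactions`."""
    return {s: sum(s <= t for t in transactions) for s in itemsets}


class IncrementalCounts:
    """Keeps candidate counts so new data can be folded in without a rescan."""

    def __init__(self, candidates, original):
        self.counts = count_in(candidates, original)
        self.n = len(original)

    def update(self, increment):
        # Scan only the new transactions and add their counts.
        for s, c in count_in(self.counts, increment).items():
            self.counts[s] += c
        self.n += len(increment)

    def frequent(self, min_support):
        return {s for s, c in self.counts.items() if c / self.n >= min_support}


# Hypothetical candidates and data, invented for illustration only.
a, b, ab = frozenset("a"), frozenset("b"), frozenset("ab")
original = [frozenset("ab"), frozenset("a"), frozenset("ab"), frozenset("b")]
increment = [frozenset("ab"), frozenset("ab")]

state = IncrementalCounts([a, b, ab], original)
before = state.frequent(min_support=0.6)
state.update(increment)
after = state.frequent(min_support=0.6)
```

In this toy run the itemset {a, b} is below the threshold on the original data and becomes frequent after the increment, without the original transactions being read again.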
The thesis consists of five chapters and is organized as follows:
Chapter 1 (Introduction): introduces the motivations, objectives, and our contribution.
Chapter 2 (Frequent itemsets mining (FIM) algorithms): studies the basic concepts and problem statement of ARM and FIM, and presents well-known algorithms such as Apriori, FP-growth, and their extensions.
Chapter 3 (Incremental frequent itemsets mining (IFIM) approaches): discusses the disadvantages of the FIM algorithms mentioned in Chapter 2: they are static algorithms, and they are impractical because they re-mine the entire dataset (the union of the original data D_O and the incremental data D_I) without considering the FI previously extracted from D_O. The chapter explores and discusses two approaches in detail: the exact approach, which extracts all the FI from the dataset, and the approximate approach, which extracts the approximate FI (FI_approx) from the dataset. In addition, it discusses the existing approach named incremental frequent itemsets mining based on frequent pattern tree and multi-scale (FPMSIM), which is the approach we developed further to be more efficient and effective.
Chapter 4 (CC-IFIM: an efficient approach for incremental frequent itemset mining based on closed candidates): presents the proposed CC-IFIM approach and discusses the results of experiments conducted on five diverse datasets, which showed the effectiveness of the proposed approach.
Chapter 5 (Conclusion and future work): concludes the thesis and suggests some future work.
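
The abstract does not define the closed candidates named in the title of Chapter 4. As background only, the sketch below shows the standard notion of a closed itemset: an itemset is closed when no proper superset has the same support count. How CC-IFIM uses this notion is not stated here, so the function and the toy counts are hypothetical and assume that the input dictionary already contains every itemset of interest.

```python
def closed_itemsets(support_counts):
    """Keep itemsets that have no proper superset with an equal count."""
    closed = {}
    for itemset, count in support_counts.items():
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in support_counts.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


# Hypothetical support counts, invented for illustration only.
counts = {
    frozenset("a"): 3,
    frozenset("b"): 2,
    frozenset("ab"): 2,
}
closed = closed_itemsets(counts)
```

Here {b} is dropped because {a, b} occurs in exactly the same number of transactions, so its count can be recovered from the closed superset. Closed itemsets are commonly used this way, as a smaller set from which the supports of all frequent itemsets can still be derived.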