Author: Gamal Eldeen, Eman Mamdoh. / Title: Automatic Classification of Web Pages with Data Mining Techniques

Title

Automatic Classification of Web Pages with Data Mining Techniques

Author

Gamal Eldeen, Eman Mamdoh.

Preparation committee

Researcher / Eman Mamdoh Gmal Eldeen Younis

Supervisor / Ibrahim F. Abd El-Rahmaan

Supervisor / Ibrahim Z. Morsy

Examiner / Nabil A. Ismail

Subject

Data mining.

Publication date

2007.

Number of pages

114 p.

Language

English

Degree

Master's

Specialization

Computer Science

Date of approval

1/1/2005

Place of approval

Menoufia University - Faculty of Computers and Information - Information Technology

Index

Only 14 pages are available for public view

Abstract

Web page classification is an important topic today because of the increasing volume of data available on the World Wide Web and the heterogeneity of its formats. The web is growing very fast and holds a very large amount of information of different types, which creates a need for ways to arrange and organize this data. One of these ways is automatic web page classification, which is used in many other applications. There is also a need for ways to manage and extract important knowledge from the web and to facilitate indexing and searching. This thesis proposes a method for extracting features of web pages using WEKA, a data mining and machine learning tool developed at the University of Waikato. The resulting features are used to build an automatic web page classification system with a specific number of categories, based on different web page classification criteria (page text, page title, header and URL) and two classification algorithms (classification rules and word-weight). The thesis presents a comparison among the page structural elements used in the web page classification task. The classification rules and word-weight algorithms are used to determine the significance of each page structural element in web page classification. The results show that the page title proved its significance, giving better accuracy over all categories, with accuracy ranging between 84.69% and 93.85% and averaging 90.52%. Finally, the results also show that the word-weight algorithm achieves higher accuracy than the classification rules algorithm across all classification criteria (title, body text, header and URL).
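
The abstract does not give the details of the feature extraction or of the word-weight algorithm, so the following is only an illustrative sketch, not the thesis's method. It uses the Python standard library rather than WEKA, splits a page into the four criteria named above (title, header, body text and URL), and scores categories with a simple assumed weighting: each word's weight in a category is its relative frequency in that category's training words. The names `PageElements`, `extract_elements`, `train_word_weights` and `classify` are hypothetical.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}
TOKEN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return TOKEN.findall(text.lower())


class PageElements(HTMLParser):
    """Collects words from the title, the h1-h6 headers and the remaining body."""

    def __init__(self):
        super().__init__()
        self.words = {"title": [], "header": [], "body": []}
        self._target = "body"

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._target = "title"
        elif tag in HEADER_TAGS:
            self._target = "header"
        elif tag in SKIP_TAGS:
            self._target = None  # ignore script and style content

    def handle_endtag(self, tag):
        if tag == "title" or tag in HEADER_TAGS or tag in SKIP_TAGS:
            self._target = "body"

    def handle_data(self, data):
        if self._target is not None:
            self.words[self._target].extend(tokenize(data))


def extract_elements(url, html):
    """Return word lists for the four criteria: title, header, body, url."""
    parser = PageElements()
    parser.feed(html)
    parser.close()
    parsed = urlparse(url)
    elements = dict(parser.words)
    elements["url"] = tokenize(parsed.netloc + " " + parsed.path)
    return elements


def train_word_weights(labelled_docs):
    """Assumed scheme: weight = relative frequency of a word within a category."""
    counts = defaultdict(Counter)
    for category, words in labelled_docs:
        counts[category].update(words)
    weights = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        weights[category] = {w: c / total for w, c in counter.items()}
    return weights


def classify(words, weights):
    """Pick the category whose word weights sum highest over the given words."""
    scores = {
        category: sum(table.get(word, 0.0) for word in words)
        for category, table in weights.items()
    }
    return max(scores, key=scores.get)
```

To compare criteria in the spirit of the abstract, one could train and evaluate a separate weight table on each element, for example `classify(extract_elements(url, html)["title"], title_weights)`, and compare the per-criterion accuracy.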