Abstract. The rapid growth of information on the Web has introduced problems into the search process. One of these problems is that general-purpose search engines often return too many irrelevant results when users search for specific information on a given topic. Another is the massive increase in the number of pages that Web search systems must index. Web crawling is the process a search engine uses to collect pages from the Web. This thesis is concerned with enhancing the quality of the retrieved pages so that they contain the information most relevant to the user. To that end, we work in two directions. First, we reduce the number of training pages used by the classifier. This is achieved by a proposed feature-selection algorithm that uses the Document Frequency (DF) of each term within a category. Second, we perform Web page classification using two well-known techniques: (i) the Support Vector Machine (SVM), with both linear and nonlinear methods, and (ii) the Naive Bayes Classifier (NBC). The proposed DF-based algorithm reduces redundancy during feature selection and increases accuracy during Web page classification. We argue that this method yields a more consistent set of training pages than traditional algorithms.
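As a rough illustration of the Document Frequency technique the abstract refers to, the sketch below selects terms by how many pages in a category contain them. This is a minimal, hypothetical implementation for intuition only; the function name, the tokenized-page input format, and the fixed `top_k` cutoff are assumptions, not details taken from the thesis.

```python
from collections import Counter

def df_select(pages, top_k):
    """Rank terms by Document Frequency (DF): the number of pages in the
    category that contain each term, counting a term once per page.
    Returns the top_k highest-DF terms. (Illustrative sketch only.)"""
    df = Counter()
    for page_terms in pages:
        df.update(set(page_terms))  # set() so each term counts once per page
    return [term for term, _ in df.most_common(top_k)]

# Hypothetical example: three tokenized pages from one category
pages = [
    ["web", "crawler", "search"],
    ["web", "search", "engine"],
    ["web", "index"],
]
print(df_select(pages, 2))  # → ['web', 'search']
```

Keeping only high-DF terms shrinks the feature space, which is one way a classifier's training set can be reduced before applying SVM or Naive Bayes.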