Title
Machine learning for e-mail classification /
Author
El Seuofi, Sherif Mohammed Shawki.
Preparation committee
Researcher / شريف محمد شوقي السيوفي
Supervisor / رشيد مختار العوضي أحمد
Supervisor / سامي عبد الحفيظ
Supervisor / وائل عبد القادر عوض
Examiner / إبراهيم محمود الحناوي
Examiner / محمد محمد عيسي
Subject
E-mail classification. Machine learning. Machine intelligence.
Publication date
2015.
Number of pages
140 pages.
Language
English
Degree
Doctorate
Specialization
Computational Mathematics
Date of approval
1/1/2015
Place of approval
Port Said University - Faculty of Science, Port Said - Mathematics and Computer Science
Index
Only 14 pages are available for public view

Abstract

The expanding volume of spam e-mail (also known as unsolicited bulk or junk e-mail) has produced a need for effective anti-spam filters. Machine learning methods are nowadays used to filter spam automatically. Machine learning belongs to the broad field of Artificial Intelligence, which aims to simulate the intelligent abilities of humans by machines, and it focuses on the important question of how to make machines able to learn.
Feature selection is a global optimization problem in machine learning in which subsets of relevant features are chosen to build effective learning models. The presence of irrelevant and redundant features in a dataset can result in poor predictions and misclassification. Thus, selecting relevant feature subsets can help reduce the computational cost of feature measurement, speed up the learning process, and improve model efficiency. Feature extraction plays an important role in e-mail classification. Many feature extraction algorithms still need improvement in accuracy, both to raise classifier accuracy and to speed up classification.
The rough sets method has proven inefficient at producing accurate classification results on large e-mail datasets, and it also consumes a lot of computational resources. Genetic algorithms (GA) are a part of evolutionary programming, a fast-growing research area in the field of artificial intelligence. A GA is a search procedure inspired by the survival-of-the-fittest principle of natural evolution. It iteratively applies a series of genetic operators, such as selection, crossover, and mutation, to a group of chromosomes, where each chromosome represents a solution to a problem. Its main element is a population of individuals, each represented by a feature-encoding chromosome. Particle Swarm Optimization (PSO) was used to optimize the very large feature space obtained from e-mails. A back-propagation neural network was also used for classification and produced good results.
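The genetic operators named above (selection, crossover, mutation) can be illustrated with a minimal sketch. It assumes each chromosome is a bit mask over the features (1 = feature kept). The toy `fitness` function, the tournament selection scheme, the elitism step, and all parameter values are illustrative assumptions; the abstract does not specify the thesis's actual fitness measure or operator settings.

```python
import random

def fitness(mask, relevant):
    # Toy fitness (assumption): reward relevant features, penalise extra ones.
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    extras = sum(1 for i, bit in enumerate(mask) if bit and i not in relevant)
    return hits - 0.5 * extras

def tournament(pop, scores, rng, k=3):
    # Selection: pick k random individuals and return the fittest of them.
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]

def crossover(a, b, rng):
    # Single-point crossover: head of parent a joined to tail of parent b.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(mask, rng, rate=0.02):
    # Mutation: flip each bit independently with a small probability.
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]

def run_ga(n_features, relevant, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, relevant))
    for _ in range(generations):
        scores = [fitness(m, relevant) for m in pop]
        children = [best[:]]  # elitism: carry the best chromosome forward
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            children.append(mutate(child, rng))
        pop = children
        best = max(pop, key=lambda m: fitness(m, relevant))
    return best

if __name__ == "__main__":
    relevant = {1, 3, 5}  # hypothetical indices of the useful features
    best = run_ga(20, relevant)
    print(best, fitness(best, relevant))
```

In a real filter the toy fitness would be replaced by a measure computed on labelled e-mails, for example the validation accuracy of a classifier trained on the selected features.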
In this thesis we review some of the most popular machine learning methods (Bayesian classification, K-NN, ANNs, SVMs, artificial immune systems, fuzzy logic, and rough sets) and their applicability to the problem of spam e-mail classification. Descriptions of the algorithms are presented, together with a comparison of their performance on an e-mail corpus. We also present the Genetics Rough Filter (GRF), a hybrid Genetic Algorithm-Rough set feature selection technique developed to optimize the rough set classification parameters, the prediction accuracy, and the computation time. The SpamAssassin dataset was used to validate the performance of the proposed system. GRF showed remarkable improvements over the neural network, rough set, and SVM methods in terms of classification accuracy.
Another hybrid algorithm, called Hybrid Particle Swarm Optimization Neural (HPSON), is also proposed. It combines particle swarm optimization with the back-propagation neural network algorithm. The proposed system showed very promising results when compared to the other algorithms. GRF showed more accurate results than HPSON in terms of feature optimization and computational time.
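For orientation, the following is a minimal sketch of the standard particle swarm update on which hybrids of this kind are built. The abstract does not say how HPSON couples PSO with back propagation, so the `objective` argument here is only a placeholder (in a hybrid it might, for instance, be the network's error for a candidate weight vector or feature weighting). The inertia weight `w` and the coefficients `c1` and `c2` are commonly used illustrative values, not the thesis's settings.

```python
import random

def pso(objective, dim, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    # Random initial positions in [-1, 1]^dim, zero initial velocities.
    pos = [[rng.uniform(-1, 1) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to own best + pull to swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    # Placeholder objective (assumption): sum of squares, minimum at origin.
    return sum(v * v for v in x)

if __name__ == "__main__":
    best, value = pso(sphere, dim=3)
    print(best, value)
```

The function minimises `objective`, so an accuracy-style score would need to be negated or converted to an error before being passed in.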
This thesis is organized as follows: Chapter 1: Electronic Mail Filtering, Chapter 2: Soft Computing Algorithms, Chapter 3: E-mail Classification Previous Work, Chapter 4: E-mail Classification Based on Genetic Rough Sets Implementation, Chapter 5: Particle Swarm Optimization-Neural Back Propagation E-mail Filtering, Chapter 6: Systems Performance Evaluation,