Author: Ibrahim, Mokhtar Ashour / Title: Intelligent Analysis of Textual Content for Spam Detection

Title

Intelligent Analysis of Textual Content for Spam Detection

Author

Ibrahim, Mokhtar Ashour

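Committee

Researcher / Mokhtar Ashour Ibrahim Khodeir (مختار عاشور ابراهيم خضير)

Supervisor / Mohamed Watheq Ali Kamel El-Kharashi (محمد واثق على كامل الخراشى)

Supervisor / Sherif Ramzy Salama (شريف رمزي سلامة)

Examiner / Mohsen Abdelrazek Ali Rashwan (محسن عبدالرزاق على رشوان)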

Publication date

2019

Number of pages

83 p.

Language

English

Degree

Master's

Specialization

Electrical and Electronic Engineering

Date of approval

1/1/2019

Place of approval

Ain Shams University - Faculty of Engineering - Department of Computer and Systems Engineering

Abstract

Twitter's popularity has made it an important, near-instantaneous source of news and trending events around the world. That popularity has also attracted spammers, who post malicious content embedded in tweets and in their profile pages. Spammers use varied and evolving techniques to evade traditional security mechanisms, which creates a need for robust solutions that adapt to these techniques. In this thesis, we focus on exploring different natural language processing methods to detect spam from the textual content of tweets.
One of the models that we propose in this thesis is the character n-gram model, which has the advantage of being robust to spamming techniques that depend on word manipulations. Another set of models that we explore is the word embedding models, built with popular word embedding techniques. Finally, we study the character embedding model, which is built using deep learning techniques.
Using publicly available datasets, we evaluate the performance of multiple machine learning classifiers with the proposed models. Our experiments show that some of our character n-gram models achieve an F-measure of nearly 80%, an improvement over approaches that use classical word n-grams built from tweet tokens. We also show that our technique can detect spam tweets with low latency, which is crucial in a real-time environment like Twitter.
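
The abstract's claim that character n-grams are robust to word manipulations can be illustrated with a small sketch. This is not the thesis's feature extraction or classifier: the function names, the example strings, and the choice of n = 3 are all assumptions made purely for illustration. It compares how much of a message survives a simple character substitution when the message is viewed as word tokens versus character trigrams.

```python
def char_ngrams(text, n=3):
    """Return the set of overlapping character n-grams in `text` (lowercased)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical example strings, not taken from the thesis datasets.
clean = "win free money now"
obfuscated = "win fr3e m0ney now"

# Word-level view: each manipulated token is a completely different feature.
word_overlap = jaccard(set(clean.split()), set(obfuscated.split()))

# Character-level view: trigrams away from the substituted characters survive.
char_overlap = jaccard(char_ngrams(clean), char_ngrams(obfuscated))

print(f"word overlap: {word_overlap:.2f}")
print(f"char trigram overlap: {char_overlap:.2f}")
```

In this toy case the character trigram overlap is higher than the word overlap, because only the trigrams that touch a substituted character change. In practice such n-grams would be turned into feature vectors and passed to a classifier, a step this sketch leaves out.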