Title
Rumors detection on social platforms using NLP methods /
Author
Ashraf, Nsrin.
Preparation Committee
Researcher / Nsrin Ashraf
Supervisor / Mohamed Taha
Supervisor / Hamada Ali Nayel
Examiner / Islam Amer
Examiner / Ahmed Shalaby
Subject
Machine Learning. Natural Language Processing. Deep Learning. Artificial Neural Networks.
Publication Date
2023.
Number of Pages
82 p.
Language
English
Degree
Master's
Specialization
Computer Science Applications
Approval Date
16/7/2023
Place of Approval
Benha University - Faculty of Computers and Information - Computer Science
Table of Contents
Only 14 pages (of 95) are available for public view.
Abstract

Social media platforms have grown rapidly in recent years, with billions of people worldwide
using them for communication, entertainment, and information. Social media development
has dramatically impacted society, affecting how people interact, communicate,
and consume information. While social media has numerous advantages, it has also
raised concerns about privacy, misinformation, and its impact on mental health,
especially among young people. Social media platforms have also significantly affected
how rumors spread. Twitter was the major platform used to disseminate news about the
Covid-19 pandemic, and a considerable amount of false information about the pandemic
circulated on social media. Several artificial-intelligence methods have been proposed
to mitigate the spread of fake news.
In this study, we propose a model that discriminates between “fake” and “true”
news tweets and can be adapted to any current topic. To address this problem, the
research explores various learning approaches to fake-news detection, comparing deep
learning and machine learning methods such as Convolutional Neural Networks (CNN),
Long Short-Term Memory networks (LSTM), Naïve Bayes, and Support Vector Machines
(SVM). The efficiency of these models was evaluated on benchmark datasets and a
self-collected dataset. This research aims to improve rumor classification by using
different text-representation techniques, such as word embeddings and TF-IDF, which
capture semantic relationships between words, phrases, and texts and thus help in
analyzing and understanding them. The models were trained and tested on sets of
tweets; new tweets were collected with Snscrape to cover different writing styles and
to build a model that handles all the variations a word can undergo by reducing each
word to its base form.

With TF-IDF features and machine learning algorithms, the Multi-Layer Perceptron
performed best on English, achieving an accuracy of 93.8% and an F-score of 93.6%.
On Arabic, the Support Vector Machine achieved the best accuracy (82.90%), while
K-Nearest Neighbors achieved the best F-score (57.5%). Uni-gram text vectorization
outperformed bi-grams. GloVe word embeddings were then used with deep learning
algorithms to improve text understanding and capture relationships between words.
Recurrent neural networks achieved the best English results with an accuracy of 99%,
while the ensemble learning model achieved a better F-score of 97%. On Arabic, the
Convolutional Neural Network achieved the best accuracy (83%), while the ensemble
learning model achieved a better F-score (81.7%).
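As a hedged illustration of the classical pipeline described above — uni-gram TF-IDF features fed into a machine learning classifier such as an SVM — a minimal scikit-learn sketch follows. The toy tweets and labels are invented for illustration only; they are not the thesis datasets, and the real experiments used far larger benchmark and self-collected corpora.

```python
# Minimal sketch of a TF-IDF + classifier pipeline (toy data, not the thesis datasets).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented example tweets, labeled "true" or "fake" for illustration
train_texts = [
    "officials confirm new vaccine shipment arrived today",
    "health ministry reports accurate case numbers",
    "miracle cure discovered, doctors hate this secret trick",
    "5g towers secretly spread the virus, share before deleted",
]
train_labels = ["true", "true", "fake", "fake"]

# Uni-gram TF-IDF features (the abstract reports uni-grams outperforming bi-grams)
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),
    ("clf", LinearSVC()),
])
model.fit(train_texts, train_labels)

print(model.predict(["secret miracle cure the doctors hide"])[0])
```

In practice the same pipeline shape accommodates the other classifiers compared in the thesis (Naïve Bayes, KNN, MLP) by swapping the final estimator.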
The second step was to test the model on a new, previously unseen test set. The
English model's accuracy declined significantly, by about 25%, to 74%. The experiments
showed that modifying the word-processing stage so that the model could handle all the
variations a word can undergo improved accuracy by about 8%, to 83%.
For the proposed Arabic model, accuracy declined by about 5%, to 70%. Results varied
among the deep learning models, with the BiLSTM most clearly reflecting the differences
between the datasets. With similar modifications to the word-processing stage, making
the model able to handle all the variations a word can undergo, accuracy improved by
about 8%, to 78%.
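The abstract repeatedly mentions reducing a word with spelling variations back to its base form. The thesis does not specify its exact rules here, so the following is only a plausible sketch of the kind of Arabic orthographic normalization commonly used for this purpose (stripping diacritics, unifying letter variants, collapsing elongated letters); the function name and rule set are assumptions for illustration.

```python
# Hedged sketch: simple Arabic orthographic normalization that maps common
# spelling variants of a word to a single canonical form. The exact rules
# used in the thesis are not specified; these are typical choices.
import re

def normalize_arabic(text: str) -> str:
    text = re.sub(r"[\u064B-\u0652]", "", text)  # strip diacritics (tashkeel)
    text = re.sub("[إأآ]", "ا", text)            # unify hamza/madda alef variants
    text = text.replace("ى", "ي")                # alef maqsura -> ya
    text = text.replace("ة", "ه")                # ta marbuta -> ha
    text = re.sub(r"(.)\1{2,}", r"\1", text)     # collapse elongated letters
    return text

print(normalize_arabic("أخبااار"))  # elongated variant collapses to اخبار
```

Applied before vectorization, such a step makes differently spelled occurrences of the same word share one feature, which is one way the reported robustness to word-level changes could be achieved.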