Title
Extracting uncertain first events in twitter using distributed event based system /
Author
Ahmed, Samar Mohamed Handousa.
Committee
Researcher / سمر محمد حندوسة أحمد البدويهي
Supervisor / طاهر توفيق حمزة
Supervisor / محمد الرحماوى
Examiner / مصطفى محمود
Examiner / مجدى زكريا
Subject
Machine Learning. Big Data. Natural Language Processing.
Publication date
2018.
Number of pages
124 p.
Language
English
Degree
Master's
Specialization
Computer Science Applications
Date of approval
01/09/2018
Place of approval
Mansoura University - Faculty of Computers and Information - Department of Computer Science
Index
Only 14 of the 124 pages are available for public view.

Abstract

Analyzing tweets to extract useful information in real time is a big data problem because a very large number of users produce tweets continuously. Obtaining accurate results in real time poses many challenges. First story detection (FSD) is one such task: it aims to detect the tweets that contain the first story in a given stream of tweets. FSD systems require a distributed real-time platform to benefit from a high degree of parallelism and to guarantee real-time execution. Storm is an open-source platform for distributed real-time stream processing; hence, it provides a flexible, scalable basis for implementing high-performance FSD systems. Some existing FSD systems are built on Storm, and most of them measure the similarity among tweets using traditional TF-IDF; however, this algorithm has limitations. To improve the accuracy of such systems, we replace the TF-IDF word embedding stage with alternative methods: mTF-IDF, which is an enhanced version of TF-IDF, and the char2vec or FastText model. Our empirical results show that mTF-IDF significantly improves detection accuracy without noticeably affecting performance. FastText achieves higher accuracy than mTF-IDF.
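
The baseline idea the abstract refers to, flagging a tweet as a first story when no earlier tweet is sufficiently similar under TF-IDF weighting, can be illustrated with a minimal single-process sketch. This is not the thesis's implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the 0.5 threshold are all assumptions made for illustration, and a real system would distribute the comparison across Storm workers and would not compare against every earlier tweet.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization (an assumption; real tweets need more cleanup).
    return text.lower().split()


def weigh(counts, doc_freq, n_docs):
    # TF-IDF weights with a smoothed IDF term: log((1 + N) / (1 + df)) + 1.
    return {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
        for term, tf in counts.items()
    }


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_first_stories(tweets, threshold=0.5):
    """Return indices of tweets whose most similar earlier tweet scores below `threshold`."""
    doc_freq = Counter()  # number of tweets seen so far that contain each term
    seen = []             # raw term counts of earlier tweets
    first = []
    for i, text in enumerate(tweets):
        counts = Counter(tokenize(text))
        doc_freq.update(counts.keys())
        n_docs = i + 1
        vec = weigh(counts, doc_freq, n_docs)
        # Compare against every earlier tweet, re-weighted with current statistics.
        best = max(
            (cosine(vec, weigh(old, doc_freq, n_docs)) for old in seen),
            default=0.0,
        )
        if best < threshold:
            first.append(i)
        seen.append(counts)
    return first
```

For example, `detect_first_stories(["earthquake hits city center", "earthquake hits the city center now", "new phone released today"])` flags the first and third tweets, since the second closely matches the first. Swapping `weigh` for another vectorizer is where an alternative such as mTF-IDF or a FastText embedding would plug in under this sketch.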