Title
Automatic Text Summarization using Natural Language Processing and Artificial Intelligence Techniques
Author
El-Kassas, Wafaa Samy Abdul-Hamed
Thesis committee
Researcher / Wafaa Samy Abdul-Hamed El-Kassas
Supervisor / Hoda Korashy Mohamed
Supervisor / Ahmed Abdel-Wahab Rafea
Examiner / Mohsen Abdel-Razek Rashwan
Publication date
2020
Number of pages
153 p.
Language
English
Degree
Doctorate (Ph.D.)
Specialization
Electrical and Electronic Engineering
Approval date
1/1/2020
Place of approval
Ain Shams University - Faculty of Engineering - Electrical Engineering (Computers)
Abstract

The amount of textual data on the Internet is growing exponentially. Searching for a certain topic can become a daunting task because users cannot read and comprehend all the potentially long documents in the search results. It has therefore become urgent to help users by summarizing textual content. Manual text summarization consumes a great deal of time, effort, and cost, and becomes impractical given the gigantic amount of textual content, so Automatic Text Summarization (ATS) is clearly beneficial. Researchers have been trying to improve ATS techniques since the 1950s. ATS approaches are extractive, abstractive, or hybrid. The extractive approach selects the most important sentences in the input document(s) and concatenates them to form the summary. The abstractive approach represents the input document(s) in an intermediate representation and then generates a summary whose sentences differ from the original sentences. The hybrid approach combines the extractive and abstractive approaches. This thesis provides researchers with a comprehensive survey of the different aspects of ATS: approaches, building blocks, techniques, evaluation methods, and future research directions. Despite all the methods proposed in the literature, generated summaries are still far from human-generated summaries. To enhance ATS for single documents, this thesis also proposes a novel extractive graph-based framework, “EdgeSumm”, that relies on four proposed algorithms. The first algorithm constructs a new text graph representation model from the input document. The second and third algorithms search the constructed text graph for sentences to include in the candidate summary. When the word count of the resulting candidate summary still exceeds a user-required length limit, the fourth algorithm selects the most important sentences and adds them to the final summary.
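The extractive pipeline described above (score the sentences, pick the most important ones within a word limit, then concatenate them) can be illustrated with a minimal Python sketch. It is not EdgeSumm: it assumes a TextRank-style word-overlap similarity as the edge weight of a sentence graph and plain degree centrality as the sentence score, and every function name here (`split_sentences`, `tokenize`, `similarity`, `summarize`) is hypothetical.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def similarity(a: list[str], b: list[str]) -> float:
    """Assumed TextRank-style edge weight: shared words / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denominator = math.log(len(a)) + math.log(len(b))
    if denominator == 0.0:  # both sentences are a single word
        return 0.0
    return len(set(a) & set(b)) / denominator


def summarize(text: str, word_limit: int) -> str:
    """Hypothetical graph-based extractive summarizer (a sketch, not EdgeSumm)."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Degree centrality: a sentence's score is the sum of its edge weights.
    scores = [
        sum(similarity(tokens[i], tokens[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    # Greedily add the highest-scoring sentences that still fit the word limit.
    chosen, used = [], 0
    for i in sorted(range(n), key=lambda k: scores[k], reverse=True):
        if used + len(tokens[i]) <= word_limit:
            chosen.append(i)
            used += len(tokens[i])
    # Concatenate the selected sentences in their original document order.
    return " ".join(sentences[i] for i in sorted(chosen))


text = "Cats chase mice. Dogs chase cats and mice. The weather is sunny today."
print(summarize(text, 8))  # prints: Cats chase mice. Dogs chase cats and mice.
```

The unrelated third sentence shares no words with the others, so its centrality is zero and it is the first to be dropped when the word limit is tight. The greedy fill only loosely mirrors a length-limited selection step; EdgeSumm's own graph model and selection algorithms are more elaborate.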
EdgeSumm combines a set of extractive ATS methods (namely graph-based, statistical-based, semantic-based, and centrality-based methods) to benefit from their advantages and overcome their individual drawbacks. EdgeSumm is general for any document genre (it is not limited to a specific domain) and is unsupervised, so it does not require any training data. EdgeSumm is evaluated on the standard DUC2001 and DUC2002 datasets using the widely used automatic evaluation tool Recall-Oriented Understudy for Gisting Evaluation (ROUGE). EdgeSumm achieves the highest ROUGE scores on DUC2001. On DUC2002, the evaluation results show that the proposed framework outperforms state-of-the-art ATS systems, improving on the highest scores in the literature by 1.2% for ROUGE-1 and 4.7% for ROUGE-L. In addition, EdgeSumm achieves very competitive ROUGE-2 and ROUGE-SU4 scores.
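The ROUGE metrics named above count overlapping units between a generated summary and a human reference summary. As an illustration only, the Python sketch below computes ROUGE-N recall (clipped n-gram matches divided by the number of reference n-grams) and an LCS-based ROUGE-L. It assumes pre-tokenized input, a single reference, and a balanced F-measure (beta = 1) for ROUGE-L; it omits ROUGE-SU4 and the stemming, stopword-removal, and multi-reference options of the official ROUGE toolkit, so its numbers are not comparable with the scores reported above. The helper names (`ngrams`, `rouge_n_recall`, `lcs_length`, `rouge_l_f1`) are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: list[str], reference: list[str], n: int) -> float:
    """ROUGE-N recall: clipped n-gram matches / n-grams in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matches = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matches / total


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: list[str], reference: list[str]) -> float:
    """ROUGE-L as a balanced F-measure of LCS precision and recall (beta = 1 assumed)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(reference), lcs / len(candidate)
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat".split()
candidate = "the cat was on the mat".split()
print(rouge_n_recall(candidate, reference, 1))  # 5 of 6 reference unigrams match
print(rouge_n_recall(candidate, reference, 2))  # 3 of 5 reference bigrams match
print(rouge_l_f1(candidate, reference))  # LCS "the cat on the mat" has length 5
```

ROUGE-1 and ROUGE-2 are the n = 1 and n = 2 cases of `rouge_n_recall`, and the dynamic programme in `lcs_length` runs in O(|candidate| × |reference|) time with a single row of memory.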