Title
Natural Language Processing Preprocessing onto Graph-Statistical Algorithm
Publisher
Akram A. El Khatib
Author
El Khatib, Akram A.
Preparation committee
Researcher / Akram A. El Khatib
Supervisor / Gamal Mohamed Behery Essa
Supervisor / Reda El-Said M. E. Elbarougy
Subject
Arabic NLP. Graph Model. PageRank Algorithms.
Publication date
2020
Number of pages
116 p.
Language
English
Degree
Doctorate
Specialization
Applied Mathematics
Approval date
1/1/2020
Place of approval
Damietta University - Faculty of Science - Mathematics / Computer Science

Abstract

Text summarization is an important technique for reducing the amount of text and retrieving only the important information from the original. The first approach is graph-based and improves an existing algorithm, PageRank (PR), with two modifications: the algorithm is extended to support sentence weighting and a noun-based measure, and the initial rank of each sentence in the graph is set to the number of nouns in that sentence. The SAFAR Al-Khalil morphological analyzer is used to solve the problem of noun extraction. After PageRank is applied, the sentences are sorted by score and the summary is extracted according to the compression ratio. The proposed approach performs efficiently at 10,000 iterations, and the results show no change when the number of iterations is increased beyond that. In the second approach, Minimum Spanning Tree (MST) algorithms are applied to Arabic text summarization, which is the first time they have been used with the Arabic language. Three MST algorithms (Prim’s, Kruskal’s, and Boruvka’s) were compared to determine which gives the best performance in text summarization. Each was used with three morphological analyzers (SAFAR Al-Khalil, BAMA, and Stanford NLP) to extract the nouns from the sentences.
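The graph-based ranking step can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names `weighted_pagerank` and `summarize`, the damping factor of 0.85, the similarity matrix, and the reading of the compression ratio as the fraction of sentences kept are all assumptions made for illustration. The noun counts are assumed to come from a morphological analyzer and are passed in as plain integers.

```python
from typing import List


def weighted_pagerank(
    weights: List[List[float]],
    initial_ranks: List[float],
    damping: float = 0.85,      # assumed value, not taken from the thesis
    iterations: int = 100,
) -> List[float]:
    """Weighted PageRank over a sentence graph.

    weights[i][j] is the edge weight (e.g. similarity) between sentences
    i and j. initial_ranks seeds the iteration, here with noun counts.
    """
    n = len(weights)
    # Total outgoing weight of each node, ignoring self-loops.
    out_sum = [
        sum(w for j, w in enumerate(row) if j != i)
        for i, row in enumerate(weights)
    ]
    ranks = list(initial_ranks)
    for _ in range(iterations):
        new_ranks = []
        for i in range(n):
            # Each neighbour j passes on a share of its rank proportional
            # to the weight of the edge j -> i.
            incoming = sum(
                weights[j][i] / out_sum[j] * ranks[j]
                for j in range(n)
                if j != i and out_sum[j] > 0
            )
            new_ranks.append((1 - damping) + damping * incoming)
        ranks = new_ranks
    return ranks


def summarize(
    sentences: List[str],
    weights: List[List[float]],
    noun_counts: List[int],
    compression_ratio: float = 0.5,  # assumed: fraction of sentences kept
    iterations: int = 100,
) -> List[str]:
    """Rank sentences and return the top ones in document order."""
    seeds = [float(c) for c in noun_counts]
    scores = weighted_pagerank(weights, seeds, iterations=iterations)
    keep = max(1, round(len(sentences) * compression_ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, weights, noun_counts, compression_ratio=0.3)` would return roughly the top 30% of sentences by score, in their original order. How the edge weights are computed is left open here because the abstract does not specify it.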
Finally, the Essex Arabic Summaries Corpus (EASC) is used as the standard for evaluating performance. It contains 153 documents divided into 10 subjects. Three metrics were used to measure the quality of the returned summaries: precision, recall, and F-measure. Among the morphological analyzers, SAFAR Al-Khalil returned the best results, and among the minimum spanning tree algorithms, Kruskal’s returned the best results. Removing stop words from the sentences improved the results. The modified PageRank algorithm outperforms the traditional PageRank algorithm.
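The three metrics can be computed as in the short Python sketch below. It assumes that a summary and its reference are compared as sets of selected sentence indices. The abstract does not state how matching against EASC was performed, so that choice and the function name `precision_recall_f` are illustrative only.

```python
from typing import Iterable, Tuple


def precision_recall_f(
    selected: Iterable[int], reference: Iterable[int]
) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure for a sentence-level summary.

    selected:  indices of sentences chosen by the summarizer
    reference: indices of sentences in the reference summary
    """
    selected_set, reference_set = set(selected), set(reference)
    overlap = len(selected_set & reference_set)
    precision = overlap / len(selected_set) if selected_set else 0.0
    recall = overlap / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, a system that selects sentences {0, 2, 3} against a reference of {0, 1, 2, 5} shares two sentences with it, giving a precision of 2/3, a recall of 1/2, and an F-measure of 4/7.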