Title
Semantic textual similarity impact on NLP applications
Publisher
Basma Hassan Kamal Hussein
Author
Basma Hassan Kamal Hussein
Preparation Committee
Researcher / Basma Hassan Kamal Hussein
Supervisor / Ibrahim Farag Abdelrahman
Supervisor / Reem Mohamed Reda Bahgat
Supervisor / Aly Aly Fahmy
Publication Date
2020
Number of Pages
94 leaves
Language
English
Degree
Doctorate
Specialization
Computer Science (miscellaneous)
Date of Approval
10/2/2020
Place of Approval
Cairo University - Faculty of Computers and Information - Computer Science

Abstract

Humans have an intrinsic ability to recognize the degree of similarity and difference between texts. Simulating this human judgment in computers remains an extremely difficult task. Semantic Textual Similarity (STS) is the task of assessing the degree to which two short texts are similar in meaning. Many natural language processing (NLP) applications rely on assessing the semantic similarity of text segments as a core component, including information retrieval, machine translation evaluation, automatic short answer grading, paraphrase identification, and recognizing textual entailment. An infinite number of meaningful sentences can be generated in any natural language, so short texts pose challenges in NLP that individual words and full documents do not. Despite its brevity, a sentence can accommodate the most complex forms of human expression. Some pairs of sentences express the same meaning even though they share few words, while other pairs have entirely different meanings despite a high word overlap. Several approaches have been proposed in the literature to determine the semantic similarity between short texts. The majority of recently presented STS approaches are supervised, with a machine learning or deep learning technique used together with feature engineering. Unsupervised STS approaches have also been presented, typically as a single similarity measure; they do not require training data, but they still suffer from some limitations.
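
The point about word overlap can be illustrated with a minimal sketch. It is not taken from the thesis: the Jaccard overlap measure, the function name `jaccard_overlap`, and the example sentences are all assumptions chosen for illustration. It shows a surface measure scoring a paraphrase pair low and a pair with reversed meaning high.

```python
def jaccard_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the sets of lowercased, whitespace-split words.

    Illustrative surface measure only; not the method proposed in the thesis.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical pair with the same meaning but no shared words.
paraphrase = ("the film was excellent", "i really enjoyed that movie")
# Hypothetical pair with different meanings but identical word sets.
reordered = ("the dog bit the man", "the man bit the dog")

paraphrase_score = jaccard_overlap(*paraphrase)
reordered_score = jaccard_overlap(*reordered)

print(f"paraphrase pair: {paraphrase_score:.2f}")
print(f"reordered pair:  {reordered_score:.2f}")
```

Under these assumptions the paraphrase pair receives the lowest possible score and the reordered pair the highest, which is the opposite of a human judgment of meaning. This is why word overlap alone is a weak signal for STS.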