Title
Acronyms expansion disambiguation and their effect on NLP tasks
Publisher
Akram Gaballah Ahmed Almatarky
Author
Akram Gaballah Ahmed Almatarky
Preparation committee
Researcher / Akram Gaballah Ahmed Almatarky
Supervisor / Amr Ahmed Badr
Supervisor / Emad Nabil Hassan
Publication date
2016
Number of pages
113 p.
Language
English
Degree
Master's
Specialization
Computer Science (miscellaneous)
Approval date
9/3/2016
Place of approval
Cairo University - Faculty of Computers and Information - Computer Science

Abstract

Nonstandard words such as proper nouns, abbreviations, and acronyms are a major obstacle in natural language text processing and information retrieval. Acronyms, in particular, are difficult to read and process because they are often domain-specific and highly polysemous. In this work, we propose a language modeling approach for the automatic disambiguation of acronym senses using context information. First, a dictionary of all possible expansions of acronyms is generated automatically. The dictionary is used to look up all candidate expansions, or senses, of a given acronym. The extracted dictionary consists of about 17 thousand acronym-expansion pairs defining 1,829 expansions from different fields, where the average number of expansions per acronym was 9.47. Training data is collected automatically from documents downloaded from the results of search engine queries. The collected data is used to build a language model that models the context of each candidate expansion. The expansion contexts are filtered so that only the terms that produce the highest information gain are retained. Expansions from different acronyms are grouped together based on the similarity between their contexts. In the in-context expansion prediction phase, the relevance of each candidate expansion is calculated from the similarity between the context of the specific acronym occurrence and the language model of that candidate. Unlike other work in the literature, our approach can decline to expand an acronym when it is not confident in the disambiguation.
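
The prediction phase described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes an add-one smoothed unigram language model per candidate expansion, uses log-likelihood as the similarity measure, and rejects when the gap between the two best candidates falls below a threshold. The function names, the `min_margin` parameter, and the toy training sentences are all hypothetical.

```python
import math
from collections import Counter


def build_unigram_model(training_texts):
    """Count term frequencies over the training contexts of one expansion."""
    counts = Counter()
    for text in training_texts:
        counts.update(text.lower().split())
    return counts


def log_likelihood(context_terms, counts, vocab_size):
    """Add-one smoothed unigram log-likelihood of the context under one model."""
    total = sum(counts.values())
    return sum(
        math.log((counts[term] + 1) / (total + vocab_size))
        for term in context_terms
    )


def disambiguate(context, models, min_margin=1.0):
    """Return the best-scoring expansion, or None when the margin is too small.

    The margin-based rejection rule is an assumption made for illustration.
    """
    terms = context.lower().split()
    # Shared vocabulary: context terms plus every term seen in any model.
    vocab = set(terms)
    for counts in models.values():
        vocab.update(counts)
    scores = {
        expansion: log_likelihood(terms, counts, len(vocab))
        for expansion, counts in models.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] < min_margin:
        return None  # reject: not confident enough to expand
    return ranked[0]


# Toy training contexts (invented for this sketch) for two senses of "ATM".
models = {
    "automated teller machine": build_unigram_model(
        ["withdraw cash from the bank machine", "bank card and cash withdrawal"]
    ),
    "asynchronous transfer mode": build_unigram_model(
        ["network cells and packet switching", "broadband network protocol cells"]
    ),
}

confident = disambiguate("she went to withdraw cash at the bank ATM", models)
uncertain = disambiguate("the ATM was mentioned yesterday", models)
```

In this sketch a context that shares several terms with one expansion's training data yields a clear winner, while a context with almost no informative terms produces nearly equal scores and is left unexpanded.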