Author: Mohamed Talaat Saad Farrag / Title: Long-distance continuous language modeling for speech recognition


Title

Long-distance continuous language modeling for speech recognition

Publisher

Mohamed Talaat Saad Farrag

Author

Mohamed Talaat Saad Farrag

Preparation committee

Researcher / Mohamed Talaat Saad Farrag

Supervisor / Mahmoud Ismail Shoman

Supervisor / Sherif Mahdy Abdou

Publication date

2015

Number of pages

61 leaves

Language

English

Degree

Master's

Specialization

Computer Science (miscellaneous)

Date of approval

11/5/2015

Place of approval

Cairo University - Faculty of Computers and Information - Information Technology

Abstract

This thesis deals with the problem of building continuous language models for automatic continuous speech recognition systems. N-gram language models have long been the most frequently used language models because they are easy to build and require minimal effort to integrate into different NLP applications. Despite their popularity, n-gram models suffer from several drawbacks: limited ability to generalize to words unseen in the training data, limited adaptability to new domains, and a focus on only short-distance word relations. To overcome these problems, continuous parameter space LMs were introduced. In these models, words are treated as vectors of real numbers rather than as discrete entities. As a result, semantic relationships between words can be quantified and integrated into the model, and infrequent words are modeled using more frequent ones that are semantically similar. In this study we present a long-distance continuous language model based on latent semantic analysis (LSA). In the LSA framework, the word-document co-occurrence matrix is commonly used to record how many times a word occurs in a certain document.
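
The word-document co-occurrence matrix mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `build_word_document_matrix` and `cosine_similarity`, the whitespace tokenization, and the toy corpus are all assumptions made for illustration. The sketch only builds the raw count matrix and compares word rows directly; the dimensionality-reduction step usually associated with LSA is omitted.

```python
from collections import Counter
import math


def build_word_document_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[i] occurs in documents[j].

    Tokenization is a plain lowercase whitespace split (an assumption
    made for this sketch).
    """
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for c in counts] for word in vocabulary]
    return vocabulary, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell on the market",
]

vocabulary, matrix = build_word_document_matrix(documents)
row = dict(zip(vocabulary, matrix))  # word -> its row of per-document counts

print(row["the"])
print(cosine_similarity(row["cat"], row["sat"]))
print(cosine_similarity(row["cat"], row["market"]))
```

Each word is represented by a vector of real numbers (its row of counts), so the similarity between two words becomes a quantity that can be computed, which is the property the abstract attributes to continuous-space models.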