Title
An Enhanced Approach for Arabic Speech Recognition Systems/
Author
Abdelgawad, Mona Abdelazim.
Committee
Researcher / منى عبد العظيم عبد الجواد علي
Supervisor / نجوى لطفي بدر
Supervisor / وداد حسين رياض
Publication date
2023.
Number of pages
95 p. :
Language
English
Degree
Doctorate
Specialization
Information Systems
Date of approval
1/1/2023
Place of approval
Ain Shams University - Faculty of Computer and Information Sciences - Information Systems
Index
Only 14 pages are available for public view


Abstract

The results obtained with the CNN-GRU CTC-based model and the character-level sequence-to-sequence model were better than those of the related work in [17, 61, 62, 63, 64], which mostly relied on a language model to produce text transcriptions from audio data. Because a language model selects its output according to the probability of subsequent words learned during training, and the true transcription may be poorly represented in the training corpus, it may replace a correctly predicted transcription with an incorrect one. A language model therefore has to be trained on millions of text transcriptions before it can be used. The character-level model instead produces text transcriptions with an encoder-decoder architecture that maps the sequence of characters recognized in the audio stream to the corresponding text.
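As an illustration of how a CTC-based acoustic model can yield a character sequence, the sketch below applies the standard greedy CTC collapsing rule: merge consecutive repeated labels, then drop the blank symbol. The abstract does not state which decoding strategy the thesis uses, so greedy decoding, the function name `ctc_greedy_decode`, and the `_` blank symbol are assumptions made for this example only.

```python
from itertools import groupby

# Hypothetical blank symbol; real systems usually reserve an integer index for it.
BLANK = "_"


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence into a character string.

    frame_labels holds the most likely label for each audio frame.
    Step 1 merges runs of identical consecutive labels.
    Step 2 removes the blank symbol that separates genuine repeats.
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != blank)


if __name__ == "__main__":
    frames = list("__hh_e_ll_ll_oo__")
    print(ctc_greedy_decode(frames))
```

The blank between the two runs of `l` is what lets a doubled letter survive the merge step; without it the two runs would collapse into one character.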
In the MGB2 experiment, the training and validation sets of the corpus were used to train and validate the deep-speech model, and the trained model was then evaluated on a test set. A character-level sequence-to-sequence model was built from the text transcriptions of the training and validation sets: the character set of each text corpus was used to train the model, and word-level text transcriptions were used as the output. Evaluated on the test set, the complete system achieved a WER of 3.2, which is competitive with all previously reported results on this corpus.
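One way to picture the character-level data preparation is a vocabulary built from the characters that occur in the transcriptions, with each transcription encoded as a list of indices. This is only a sketch under stated assumptions: the special tokens `<pad>`, `<sos>`, and `<eos>` and the helper names `build_char_vocab`, `encode`, and `decode` are hypothetical and are not taken from the thesis.

```python
# Hypothetical special tokens commonly used with encoder-decoder models.
SPECIALS = ["<pad>", "<sos>", "<eos>"]


def build_char_vocab(transcriptions):
    """Map every special token and every distinct character to an integer index."""
    chars = sorted({ch for text in transcriptions for ch in text})
    return {symbol: index for index, symbol in enumerate(SPECIALS + chars)}


def encode(text, vocab):
    """Turn a transcription into indices, wrapped in start and end markers."""
    return [vocab["<sos>"]] + [vocab[ch] for ch in text] + [vocab["<eos>"]]


def decode(indices, vocab):
    """Turn indices back into text, skipping the special tokens."""
    inverse = {index: symbol for symbol, index in vocab.items()}
    symbols = (inverse[i] for i in indices)
    return "".join(s for s in symbols if s not in SPECIALS)


if __name__ == "__main__":
    corpus = ["ab", "ba c"]
    vocab = build_char_vocab(corpus)
    ids = encode("ba c", vocab)
    print(ids, decode(ids, vocab))
```

Because the vocabulary is limited to the characters of the script plus a few markers, it stays small regardless of how many distinct words the corpus contains.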
The best WER reported in earlier studies was 28.48, obtained using a CNN-LSTM with attention. In the Modern Standard Arabic (MSA) experiments, the proposed model achieved a WER of 4.25 using a CNN-GRU CTC-based model together with a character-level sequence-to-sequence model for text transcription. Because the proposed model was developed using MSA datasets, dialectal Arabic speech is challenging for it to recognize. Additionally, because it predicts its output character by character, the trained character-level sequence-to-sequence model could produce strange phrases.
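For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between the reference and the recognized transcription, divided by the number of reference words. The sketch below computes it as a percentage; that the figures quoted above are percentages is an assumption, and the function name `word_error_rate` is illustrative rather than part of the thesis.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent: (substitutions + deletions + insertions) / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return 100.0 * previous[-1] / len(ref_words)


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a x c d"))
```

Since insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.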