Abstract

One of the more recent developments in the machine learning field is the translation of sign language into natural language. Research in this area has largely been divided into gesture recognition and facial-expression recognition. However, these efforts ignore the linguistic structure and context of natural sentences. To close this gap, conventional techniques, including statistical, rule-based, and example-based approaches, have been employed in sign language translation. These approaches are time-consuming, produce poor translation quality, and have limited scalability in their underlying models. Deep neural networks have recently demonstrated significant performance in machine translation. Since sign language is expressed in gloss notation, a few researchers have suggested using deep neural networks to translate sign language into textual natural languages. However, these efforts ignore the other direction of the translation. The contribution of this thesis is twofold. First, it proposes a deep learning approach for bidirectional translation. In particular, the thesis introduces two deep learning models, using GRU and LSTM, for each direction of the translation. In each of the proposed models, Bahdanau's and Luong's attention mechanisms are used. Second, the thesis evaluates the proposed models on the ASLG-PC12 and Phoenix-2014T sign language corpora. Experiments with 16 models show that the proposed models outperform earlier studies using GRU and LSTM on the same corpora. When translating from text to gloss on the ASLG-PC12 corpus with the sequence-to-sequence model, the GRU model with Bahdanau attention achieves the best results, scoring 94.37% on ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and 83.98% on BLEU-4 (Bilingual Evaluation Understudy).
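The additive (Bahdanau) attention mechanism named above can be sketched in a few lines. This is a toy, pure-Python illustration of the standard scoring formula, score_i = v · tanh(W_s s + W_h h_i), over tiny illustrative dimensions; the function and parameter names are ours, and real models learn W_s, W_h, and v inside a deep learning framework rather than taking them as plain lists.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def bahdanau_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Additive (Bahdanau) attention: score_i = v . tanh(W_s s + W_h h_i).

    Returns the softmax-normalized attention weights and the context
    vector (the weighted sum of the encoder states).
    """
    proj_s = matvec(W_s, decoder_state)  # W_s s is shared across positions
    scores = []
    for h in encoder_states:
        proj = [math.tanh(a + b) for a, b in zip(proj_s, matvec(W_h, h))]
        scores.append(sum(vi * pi for vi, pi in zip(v, proj)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context
```

At each decoding step the decoder state is scored against every encoder state, so the context vector re-weights the source sentence per output token; Luong attention differs mainly in using a multiplicative score instead of the additive one shown here.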
The GRU model with Bahdanau attention also gives the best results when translating from gloss to text, with a ROUGE score of 87.31% and a BLEU-4 score of 66.59%. On the ASLG-PC12 corpus with the Transformer model, when translating from text to gloss, the model with two layers gives the best result, with a ROUGE score of 98.78% and a BLEU-4 score of 96.89%. When translating from gloss to text, the two-layer model again achieves the best result, with a ROUGE score of 96.90% and a BLEU-4 score of 84.82%. Results of text-to-gloss translation on the Phoenix-2014T corpus show that the GRU model with Bahdanau attention achieves the best ROUGE and BLEU-4 results, with scores of 42.96% and 10.53%, respectively. When translating from gloss to text, the GRU model with Luong attention achieves the best results, with a ROUGE score of 45.69% and a BLEU-4 score of 19.56%. On the Phoenix-2014T corpus with the Transformer model, when translating from text to gloss, the model with two layers gives the best result, with a ROUGE score of 48.80% and a BLEU-4 score of 15.78%. When translating from gloss to text, the two-layer model again achieves the best result, with a ROUGE score of 49.33% and a BLEU-4 score of 25.29%.
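The BLEU-4 metric reported throughout can be sketched as a self-contained function. This is an illustrative implementation of the standard sentence-level formula (clipped n-gram precision for n = 1..4, geometric mean, brevity penalty), not the exact evaluation script used in the thesis; published results typically use corpus-level BLEU with smoothing, as in NLTK or sacreBLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(reference, hypothesis):
    """Sentence-level BLEU-4 for one reference and one non-empty hypothesis.

    Clipped n-gram precision for n = 1..4, combined by geometric mean
    and scaled by the brevity penalty.
    """
    precisions = []
    for n in range(1, 5):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any empty n-gram overlap zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / 4)
    brevity_penalty = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity_penalty * geo_mean
```

A perfect hypothesis scores 1.0 (i.e. 100%), and a single substituted word lowers every n-gram precision, which is why the gloss-to-text scores above sit well below the text-to-gloss ones: natural-language output has more lexical variation than gloss sequences.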