Abstract

One of the more recent developments in the machine learning field is the translation of sign language into natural language. Research in this area has largely been divided into gesture recognition and facial-expression recognition. However, these efforts ignore the linguistic structure and context of natural sentences. To close this gap, conventional techniques, including statistical, rule-based, and example-based approaches, have been employed in sign language translation. These approaches are time-consuming, produce poor translation quality, and have limited scalability in their underlying models. Deep neural networks have recently demonstrated significant performance in machine translation. Since sign language is expressed in gloss notation, a few researchers have suggested using deep neural networks to translate sign language into textual natural languages. However, these efforts ignore the other direction of the translation. The contribution of this thesis is twofold. First, it proposes a deep learning approach for bidirectional translation. In particular, the thesis introduces two deep learning models, using GRU and LSTM, for each direction of the translation. In each of the proposed models, Bahdanau's and Luong's attention mechanisms are used. Second, the thesis evaluates the proposed models on the ASLG-PC12 and Phoenix-2014T sign language corpora. Experiments with 16 models show that the proposed models outperform earlier studies using GRU and LSTM on the same corpora. When translating from text to gloss on the ASLG-PC12 corpus with the sequence-to-sequence model, the GRU model with Bahdanau attention achieves the best results, scoring 94.37% on ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and 83.98% on BLEU-4 (Bilingual Evaluation Understudy).
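The additive (Bahdanau) attention mechanism named above can be sketched in a few lines. This is a toy, pure-Python illustration of the standard scoring formula, score_i = v · tanh(W_s s + W_h h_i), over tiny illustrative dimensions; the function and parameter names are ours, and real models learn W_s, W_h, and v inside a deep learning framework rather than taking them as plain lists.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def bahdanau_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Additive (Bahdanau) attention: score_i = v . tanh(W_s s + W_h h_i).

    Returns the softmax-normalized attention weights and the context
    vector (the weighted sum of the encoder states).
    """
    proj_s = matvec(W_s, decoder_state)  # W_s s is shared across positions
    scores = []
    for h in encoder_states:
        proj = [math.tanh(a + b) for a, b in zip(proj_s, matvec(W_h, h))]
        scores.append(sum(vi * pi for vi, pi in zip(v, proj)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context
```

At each decoding step the decoder state is scored against every encoder state, so the context vector re-weights the source sentence per output token; Luong attention differs mainly in using a multiplicative score instead of the additive one shown here.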
The GRU model with Bahdanau attention also gives the best results when translating from gloss to text, with a ROUGE score of 87.31% and a BLEU-4 score of 66.59%. On the ASLG-PC12 corpus with the Transformer model, when translating from text to gloss, the model with two layers gives the best result, with a ROUGE score of 98.78% and a BLEU-4 score of 96.89%. When translating from gloss to text, the two-layer model again achieves the best result, with a ROUGE score of 96.90% and a BLEU-4 score of 84.82%. Results of text-to-gloss translation on the Phoenix-2014T corpus show that the GRU model with Bahdanau attention achieves the best ROUGE and BLEU-4 results, with scores of 42.96% and 10.53%, respectively. When translating from gloss to text, the GRU model with Luong attention achieves the best results, with a ROUGE score of 45.69% and a BLEU-4 score of 19.56%. On the Phoenix-2014T corpus with the Transformer model, when translating from text to gloss, the model with two layers gives the best result, with a ROUGE score of 48.80% and a BLEU-4 score of 15.78%. When translating from gloss to text, the two-layer model again achieves the best result, with a ROUGE score of 49.33% and a BLEU-4 score of 25.29%.
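The BLEU-4 metric reported throughout can be sketched as a self-contained function. This is an illustrative implementation of the standard sentence-level formula (clipped n-gram precision for n = 1..4, geometric mean, brevity penalty), not the exact evaluation script used in the thesis; published results typically use corpus-level BLEU with smoothing, as in NLTK or sacreBLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(reference, hypothesis):
    """Sentence-level BLEU-4 for one reference and one non-empty hypothesis.

    Clipped n-gram precision for n = 1..4, combined by geometric mean
    and scaled by the brevity penalty.
    """
    precisions = []
    for n in range(1, 5):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any empty n-gram overlap zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / 4)
    brevity_penalty = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity_penalty * geo_mean
```

A perfect hypothesis scores 1.0 (i.e. 100%), and a single substituted word lowers every n-gram precision, which is why the gloss-to-text scores above sit well below the text-to-gloss ones: natural-language output has more lexical variation than gloss sequences.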