Abstract
Converting a treebank into a CCGbank opens the respective language to the sophisticated tools developed for Combinatory Categorial Grammar (CCG) and enriches cross-linguistic development. In this thesis, we propose a transformation approach that converts a widely recognized Arabic treebank into a CCGbank representation in order to gain these benefits. The algorithm successfully transformed the Penn Arabic Treebank (PATB) into the first complete Arabic CCGbank (ACCGbank). It performs the transformation in four steps. The first is a preprocessing step, made necessary by characteristics and peculiarities of the Arabic language, which normalizes the PATB and makes it suitable for accurate conversion into a CCGbank. The second step determines the type of each node in the PATB tree structure. Third, the PATB's flat tree structures are transformed into binary trees using binarization techniques. Finally, CCG trees are built from the binary tree structures, augmented with the information extracted in the earlier steps, to produce CCG tags. We conducted experiments on several parts of the PATB, converting each into the ACCGbank. Our algorithm achieved an average conversion rate of 97.96% across the PATB parts. Moreover, the resulting lexicon of CCG tags was four times larger than the PATB lexicon.
Keywords: Combinatory Categorial Grammar, Machine Translation, Arabic CCGbank, Penn Arabic Treebank.
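The abstract names the conversion steps only briefly. As a rough illustration of steps two to four (node typing, binarization, CCG category assignment), here is a toy Python sketch in the spirit of the well-known English CCGbank conversion. It is an assumption-laden sketch, not the thesis's algorithm: the head rules in `HEAD_RULES`, the use of `-SBJ`/`-OBJ` function tags to detect complements, the simplified tag set, the romanized example sentence, and every function name are hypothetical. The Arabic-specific preprocessing of step one is omitted.

```python
# Toy sketch of treebank-to-CCG conversion, steps 2-4 only. Head rules, tags and
# the complement test are hypothetical assumptions, not the thesis's algorithm.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                                  # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set only on leaves
    kind: str = "head"                          # head / complement / adjunct
    cat: str = ""                               # CCG category, filled in step 4

# Hypothetical head rules: child labels to try, in priority order.
HEAD_RULES = {"S": ["VP", "VBD"], "VP": ["VBD"], "NP": ["NN", "NP"], "PP": ["IN"]}
COMPLEMENT_TAGS = ("-SBJ", "-OBJ")              # assumed function tags marking arguments

def base(label: str) -> str:
    """Strip function tags: 'NP-SBJ' -> 'NP'."""
    return label.split("-")[0]

def mark_kinds(n: Node) -> None:
    """Step 2: mark every child as head, complement or adjunct."""
    if n.word is not None:
        return
    rules = HEAD_RULES.get(base(n.label), [])
    h = next((i for r in rules for i, c in enumerate(n.children)
              if base(c.label) == r), 0)        # fall back to the leftmost child
    for i, c in enumerate(n.children):
        if i == h:
            c.kind = "head"
        elif c.label.endswith(COMPLEMENT_TAGS):
            c.kind = "complement"
        else:
            c.kind = "adjunct"
        mark_kinds(c)

def binarize(n: Node) -> Node:
    """Step 3: fold dependents into the head one at a time, nearest first:
    right-hand dependents, then left-hand ones."""
    if n.word is not None:
        return n
    kids = [binarize(c) for c in n.children]
    if len(kids) == 1:                          # keep unary nodes such as NP -> NN
        return Node(n.label, kids, kind=n.kind)
    h = next(i for i, c in enumerate(kids) if c.kind == "head")
    tree = kids[h]
    for dep in kids[h + 1:]:
        tree = Node(n.label, [tree, dep])       # intermediate node acts as a head
    for dep in reversed(kids[:h]):
        tree = Node(n.label, [dep, tree])
    tree.kind = n.kind
    return tree

def paren(cat: str) -> str:
    """Bracket complex categories so they can be nested: S/NP -> (S/NP)."""
    return f"({cat})" if "/" in cat or "\\" in cat else cat

def assign(n: Node, cat: str) -> None:
    """Step 4: push categories top-down through the binary tree."""
    n.cat = cat
    if n.word is not None:
        return
    if len(n.children) == 1:                    # unary: child shares the category
        assign(n.children[0], cat)
        return
    left, right = n.children
    head, dep, dep_right = (left, right, True) if left.kind == "head" else (right, left, False)
    x = paren(cat)
    if dep.kind == "adjunct":                   # modifier: X\X on the right, X/X on the left
        dep_cat = f"{x}\\{x}" if dep_right else f"{x}/{x}"
        head_cat = cat
    else:                                       # argument: head becomes X/D or X\D
        dep_cat = base(dep.label)
        head_cat = f"{x}/{dep_cat}" if dep_right else f"{x}\\{dep_cat}"
    assign(head, head_cat)
    assign(dep, dep_cat)

def leaves(n: Node) -> list:
    """Read off (word, category) pairs from left to right."""
    if n.word is not None:
        return [(n.word, n.cat)]
    return [pair for c in n.children for pair in leaves(c)]

def to_ccg(tree: Node, root_cat: str = "S") -> list:
    mark_kinds(tree)
    binary = binarize(tree)
    assign(binary, root_cat)
    return leaves(binary)

def leaf(tag: str, word: str) -> Node:
    return Node(tag, word=word)

def example_tree() -> Node:
    """Hypothetical VSO clause, romanized: 'the boy wrote a letter yesterday'."""
    return Node("S", [leaf("VBD", "kataba"),
                      Node("NP-SBJ", [leaf("NN", "al-walad")]),
                      Node("NP-OBJ", [leaf("NN", "risala")]),
                      Node("ADVP", [leaf("RB", "ams")])])

if __name__ == "__main__":
    for word, cat in to_ccg(example_tree()):
        print(word, cat)
```

For the verb-initial example, the verb receives `(S/NP)/NP`: it combines first with the adjacent subject and then with the object, both to its right, while the adverb becomes a sentence modifier `S\S`. Folding dependents in nearest-first order is one simple binarization choice; a real converter would also need Arabic-specific head rules, traces and coordination handling.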