Title
Developing Semantic-based System for Arabic Information Retrieval/
Author
Alromima, Wasim Ahmed Abdul-Aziz.
Preparation committee
Researcher / Wasim Ahmed Abdul-Aziz Alromima
Supervisor / Mostafa Mohammed Aref
Supervisor / Ibrahim Fathy Moawad
Supervisor / Rania Abdul-Rahman El-gohary
Publication date
2016.
Number of pages
135 p. ;
Language
English
Degree
Doctorate
Specialization
Information Systems
Approval date
1/1/2016
Place of approval
Egyptian Universities Libraries Consortium - Information Systems
Index
Only 14 of 135 pages are available for public view.

Abstract

In the era of information overload, Information Retrieval Systems are vital applications. The World Wide Web and social media have become a vast library of unstructured data, which is laborious to comprehend and process without intelligent techniques. Many researchers are working to improve the precision and recall of search results by developing new methods, especially semantic ones. The amount of available Arabic content is increasing, but its usefulness is limited by the complexity of Arabic morphology and the lack of resources such as ontologies and machine-readable dictionaries.
The main objective of this thesis is to introduce a new Semantic-based Arabic Information Retrieval System (SAIRS) to improve Arabic text retrieval. Given the complexity of the Arabic language and its limited resources, the proposed approach makes three main contributions. First, the query is expanded using n-gram term collocations that are mined automatically from the Arabic corpus, so no external semantic resource is needed. Second, the query is expanded using an Arabic domain ontology, which was designed and represented manually in the Web Ontology Language (OWL). Third, the system index is constructed from the corpus words themselves, which saves the cost and effort of the stemming process. The Vector Space Model (VSM) is used to represent both documents and user queries. The experimental evaluation was conducted on the Arabic text of the Holy Quran.
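A minimal sketch of the general idea, expanding a query and then ranking documents in a vector space, is given here for illustration. It is not the SAIRS implementation: the TF-IDF weighting, the cosine ranking, the `related` lookup table (standing in for mined collocations or ontology terms), and the transliterated toy tokens are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def build_index(docs: List[List[str]]) -> Tuple[List[Vector], Dict[str, float]]:
    """Return one TF-IDF vector per document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Adding 1.0 keeps a non-zero weight for terms that occur in every document.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(query: List[str], related: Dict[str, List[str]]) -> List[str]:
    """Append related terms (hypothetical collocation or ontology lookups)."""
    expanded = list(query)
    for term in query:
        for extra in related.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def search(query: List[str], docs: List[List[str]],
           related: Dict[str, List[str]]) -> List[int]:
    """Return indices of documents with a non-zero score, best first."""
    vectors, idf = build_index(docs)
    terms = Counter(t for t in expand(query, related) if t in idf)
    q = {t: tf * idf[t] for t, tf in terms.items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


# Toy corpus of unstemmed surface words (illustrative data only).
docs = [["makkah", "bayt"], ["madinah", "masjid"], ["jabal", "tur"]]
print(search(["bakkah"], docs, {}))                      # no expansion
print(search(["bakkah"], docs, {"bakkah": ["makkah"]}))  # with expansion
```

In this toy run the unexpanded query matches nothing because its only term is absent from the index, while the expanded query reaches the first document through the related term.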
The thesis has two main sub-objectives. First, a system is presented that extracts tagged n-gram collocations (from 2-grams to 6-grams) from the Arabic corpus by matching an input structural pattern of the Arabic language against the Part of Speech Tagging (POST) of the corpus. The system is useful for extracting different kinds of word sequences and phrases. The prototype is beneficial for linguistic research, as shown by the different scenarios in the experiments conducted.
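The pattern-matching step described above might look roughly like the following sketch, which slides a window over a POS-tagged token list and keeps the word sequences whose tags equal the input pattern. The tag names, the transliterated sample sentence, and the function name `extract_collocations` are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag)


def extract_collocations(tagged: List[Token],
                         pattern: List[str]) -> List[Tuple[str, ...]]:
    """Return every word sequence whose POS tags match `pattern` exactly."""
    n = len(pattern)
    if not 2 <= n <= 6:
        raise ValueError("pattern length must be between 2 and 6")
    matches = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if [tag for _, tag in window] == pattern:
            matches.append(tuple(word for word, _ in window))
    return matches


# Hypothetical tagged sentence (transliterated words, illustrative tags).
sentence = [
    ("fi", "PREP"), ("al-ard", "NOUN"), ("al-muqaddasa", "ADJ"),
    ("wa", "CONJ"), ("al-bayt", "NOUN"), ("al-haram", "ADJ"),
]
print(extract_collocations(sentence, ["NOUN", "ADJ"]))
```

A real extractor would likely also count how often each matched sequence occurs, so that frequent collocations can be separated from one-off matches.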
The second sub-objective is to build the Arabic domain ontology, which is designed and represented manually in the Web Ontology Language (OWL). An ontology defines terms and the specified relationships between them, and can be interpreted by both humans and computers. In general, semantic resources for the Arabic language are scarce, especially Arabic ontologies. In recent years, many researchers have become interested in building Arabic semantic resources that others can then exploit to build Arabic semantic applications, such as the ontological “Time Nouns” vocabulary in the Holy Quran. Therefore, as a contribution towards an integrated and unified ontology for the Arabic language, an ontology-based model of the “Place Nouns” vocabulary in the Holy Quran is presented. In addition, the ontology will be useful for Islamic learning, linguistic research, and Semantic Web applications. Many studies have presented ontologies for specific domains of the Quran and have tried to develop an upper ontology, but that work is incomplete and did not use semantic technology.
We used precision, recall, and mean average precision to measure the accuracy of the proposed approaches. The evaluation results show that both the proposed system and the stem-based method retrieve near-relevant documents, but the proposed system is more accurate: its mean average precision is 82.1%, compared with 51.1% for the stem-based method.
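For readers unfamiliar with these measures, the sketch below shows the standard textbook definitions of precision, recall, and mean average precision. The document identifiers and relevance judgments are made up for illustration and have no connection to the thesis experiments or its reported figures.

```python
from typing import List, Set, Tuple


def precision_recall(ranked: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall of the retrieved list."""
    retrieved = set(ranked)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Average of the per-query average precision over all queries."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries: (ranked results, set of relevant documents).
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
print(round(mean_average_precision(runs), 4))
```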