Title
Ontology based System for Converting Semi Structured Data into Relational Data
Author
Awad, Arwa Abd Elrahman Abd Elslam.
Preparation committee
Researcher / Arwa Abd Elrahman Abd Elslam Awad
Supervisor / Mohamed Ismail Roushdy
Supervisor / Ibrahim Fathy Moawad
Supervisor / Rania Abd Elrahman El-Gohary
Publication date
2022.
Number of pages
90 P. :
Language
English
Degree
Master's
Specialization
Computer Science (miscellaneous)
Approval date
1/1/2022
Place of approval
Ain Shams University - Faculty of Computer and Information Sciences - Department of Information Systems
Index
Only 14 of 90 pages are available for public view

Abstract

Spreadsheets contain critical information on a wide range of topics and are broadly used in numerous fields. They have a huge number of users around the world, because they are considered the standard document format for data in tabular form: they are convenient and simple to use, support diagrams and graphs, and give their users a great deal of freedom in encoding their data.
A spreadsheet is designed to work similarly to a database in that it has a cell-based structure, with each cell belonging to a horizontal row and a vertical column. Compared to a database, however, spreadsheets lack many features, which makes them less appealing for data storage and processing. Spreadsheets suffer from low data quality because of duplication or data redundancy, where multiple copies of the same data exist in the same spreadsheet document. In addition, spreadsheets do not provide multiuser access as a database does, and they have limited storage capabilities.
Spreadsheet tables in semi-structured form are a type of nearly relational data: they share the important qualities of relational data but are not presented in a relational format. Such data often conveys highly valuable information and is widely used in many different areas. If such data can be converted into relational form, many existing tools can be leveraged for a variety of interesting applications, such as data analysis with relational query systems and data integration applications.
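To make the idea of "nearly relational" data concrete, the following is a minimal Python sketch, not the method used in the thesis. It assumes a hypothetical sheet layout in which a row with a single filled cell is a group header and the rows beneath it are detail rows; the function name `flatten_grouped_rows` and the sample data are illustrative only.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def flatten_grouped_rows(grid: List[Row]) -> List[Tuple[str, str, str]]:
    """Turn group-header rows plus detail rows into flat (group, item, value) tuples."""
    records: List[Tuple[str, str, str]] = []
    current_group: Optional[str] = None
    for row in grid:
        cells = [c for c in row if c not in (None, "")]
        if len(cells) == 1:
            # Assumption: a row with exactly one filled cell is a group header.
            current_group = cells[0]
        elif len(cells) == 2 and current_group is not None:
            # Detail row: repeat the group value so every tuple is self-contained.
            records.append((current_group, cells[0], cells[1]))
    return records


# Hypothetical semi-structured sheet: city headers followed by product rows.
sheet = [
    ["Cairo", None],
    ["Laptops", "120"],
    ["Phones", "300"],
    ["Giza", None],
    ["Laptops", "80"],
]
relational_rows = flatten_grouped_rows(sheet)
```

Each resulting tuple carries its own group value, so the rows could be loaded directly into a table with three columns and queried with a relational query system.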
In addition, storing information in Excel spreadsheets is not the best way to organize and access it. The use of spreadsheets has grown exponentially in recent years, and the increases in the volume and unpredictability of this data have led to greater storage requirements. By converting a spreadsheet document into a relational format or into a database table, users can benefit from the advantages of a database on their existing data.
The thesis aims to automate the conversion of spreadsheet tables from a low-quality semi-structured format into a high-quality relational form, based on an ontology technique. We have developed a system that automatically converts the semi-structured data (tables) in spreadsheets into relational data, without requiring the user to have previous experience in any programming language, and converts the data from Low-Data Quality (LDQ) to High-Data Quality (HDQ). The proposed approach uses novel algorithms based on a clustering approach, a cell classification strategy, and heuristic rules for table detection and extraction from a spreadsheet. Finally, the tests show that the methodology improves the adaptability of information integration systems. In addition, the experimental results show high accuracy in extracting relational data from spreadsheets, with 97.5% for simple data and 82.4% for hierarchical data, as well as 100% successful extraction of duplicated records.
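The abstract does not describe the algorithms in detail, so the following Python sketch only illustrates the general idea of heuristic table detection and duplicate removal. It swaps in a simple stand-in heuristic (a table is a connected region of non-empty cells) rather than the thesis's clustering and cell classification approach; the names `detect_tables` and `drop_duplicate_records` and the sample grid are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Grid = List[List[Optional[str]]]
Cell = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def detect_tables(grid: Grid) -> List[Box]:
    """Return bounding boxes of 4-connected regions of non-empty cells."""
    seen: Set[Cell] = set()
    boxes: List[Box] = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value in (None, "") or (r, c) in seen:
                continue
            # Flood fill from this cell to collect one candidate table.
            stack = [(r, c)]
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while stack:
                cr, cc = stack.pop()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                            and grid[nr][nc] not in (None, "")
                            and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


def drop_duplicate_records(rows: List[List[str]]) -> List[List[str]]:
    """Keep the first occurrence of each record, preserving order."""
    seen: Set[Tuple[str, ...]] = set()
    unique: List[List[str]] = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sheet holding two tables separated by an empty column.
grid: Grid = [
    ["id", "name", None, None],
    ["1", "Ann", None, "code"],
    ["1", "Ann", None, "X9"],
    [None, None, None, None],
]
tables = detect_tables(grid)
unique_records = drop_duplicate_records([["1", "Ann"], ["1", "Ann"]])
```

A connected-region heuristic of this kind splits tables only where an empty row or column separates them; real spreadsheets with blank cells inside a table would need the richer cell classification the abstract mentions.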