Author: Abd El-Wahab, Seham Moawed. Title: A new large-scale schema matching approach


Title

A new large-scale schema matching approach

Author

Abd El-Wahab, Seham Moawed.

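Prepared by

Researcher / سهام معوض عبدالوهاب (Seham Moawed Abd El-Wahab)

Supervisor / علي إبراهيم الدسوقي (Ali Ibrahim El-Desouky)

Supervisor / أماني محمد سرحان (Amany Mohamed Sarhan)

Supervisor / سالي محمد الغمراوي (Sally Mohamed Elghamrawy)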

Subject

Matching approach.

Publication date

2024.

Number of pages

237 p.

Language

English

Degree

Ph.D.

Specialization

Systems and Control Engineering

Approval date

1/1/2024

Granting institution

Mansoura University - Faculty of Engineering - Computers and Control Systems Engineering Department


Abstract

The World Wide Web Consortium (W3C) Semantic Web Activity is an ongoing effort to facilitate the integration and exchange of information resources among various applications and parties. However, the Web faces metadata models that contain massive amounts of data in different formats, and handling the heterogeneity among these models is an increasingly widespread challenge. This makes metadata model matching, the discovery of correspondences between entities in semantically related metadata models, unavoidable. Many application fields rely on the matching operation, including the semantic web, data warehouses, ontology integration, e-commerce, sensor networks, peer-to-peer systems, semantic web services, and social networks. Automatic matchers have become an indispensable means of facilitating semantic integration, reuse, and compatibility among such metadata models. However, at large scale, current metadata model matching systems suffer from high memory consumption and a lack of scalability. Partitioning and parallelization strategies have been developed to reduce these space and time costs.

Clustering-based matching is an essential step toward reducing the search space and thus increasing efficiency. However, existing methods for discovering related clusters rely on literal term matching. To address this, a novel approach is proposed that uses Latent Semantic Indexing (LSI) to capture the conceptual similarity between clusters. The experimental evaluation indicates promising results toward developing efficient large-scale matching systems.

Despite the achievements in metadata model matching, a trade-off between match quality and match performance remains. To address this problem, a fast and accurate matching framework called MetMat is proposed; it is generic and able to identify mappings across XML schemas and/or ontologies.
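The LSI step can be illustrated with a minimal sketch. It assumes each cluster is reduced to a bag of tokens taken from its element names and weighted by raw counts; the thesis's actual tokenization, term weighting, and choice of the rank k are not stated here. To stay within the Python standard library, the SVD of the term-by-cluster matrix A is obtained from the eigen-decomposition of A^T A (A^T A = V Σ² V^T) using Jacobi rotations. The names `jacobi_eigen`, `cosine`, `lsi_similarity`, and the sample clusters are hypothetical.

```python
import math


def jacobi_eigen(s, max_sweeps=100, tol=1e-20):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, v), where column i of v is the eigenvector for eigenvalue i.
    """
    n = len(s)
    a = [row[:] for row in s]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - sn * akq, sn * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - sn * aqk, sn * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - sn * vkq, sn * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx > 0 and ny > 0 else 0.0


def lsi_similarity(clusters, k):
    """Cluster-by-cluster similarity in a rank-k LSI space (hypothetical helper)."""
    vocab = sorted({term for cluster in clusters for term in cluster})
    n = len(clusters)
    # Term-by-cluster matrix A with raw term counts (assumption: no tf-idf weighting).
    a = [[cluster.count(term) for cluster in clusters] for term in vocab]
    # G = A^T A = V Sigma^2 V^T, so its eigenvectors are A's right singular vectors.
    g = [[float(sum(row[i] * row[j] for row in a)) for j in range(n)] for i in range(n)]
    eigvals, v = jacobi_eigen(g)
    top = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:k]
    # Latent coordinates of cluster j: sigma_i * V[j][i] for the k largest sigma_i.
    coords = [[math.sqrt(max(eigvals[i], 0.0)) * v[j][i] for i in top] for j in range(n)]
    return [[cosine(coords[i], coords[j]) for j in range(n)] for i in range(n)]


clusters = [["author"], ["writer"], ["author", "writer", "name"]]  # hypothetical clusters
print(lsi_similarity(clusters, k=1)[0][1])  # approximately 1.0
print(lsi_similarity(clusters, k=3)[0][1])  # approximately 0.0
```

The clusters `["author"]` and `["writer"]` share no term, so a literal comparison scores them 0, while the rank-1 projection places them on the same latent direction because both co-occur in the third cluster. With k equal to the number of clusters, the sketch reproduces plain cosine similarity; practical values of k lie in between.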
In particular, the MetMat framework is based on a parallelized clustering-based matching approach that first divides the original matching problem into smaller, independent tasks. These tasks are then executed in parallel on desktop platforms equipped with multi-core processors. Three distinct parallel strategies have been introduced to achieve this goal: inter-, intra-, and hybrid-matching. A set of matchers is combined to obtain high match quality. The experiments show that the proposed framework can handle data sets of various sizes. Additionally, the MetMat framework was compared with the top matching tools in the OAEI (Ontology Alignment Evaluation Initiative, http://oaei.ontologymatching.org/), a worldwide initiative for the systematic evaluation of metadata model matching systems. The results show that, while maintaining the same quality, the MetMat framework with the intra-parallel matching strategy outperforms alternative matching techniques in matching speed. Moreover, the tool ranked well in the OAEI.

Developing matching methods with high performance, in terms of execution time, for large-scale metadata models is a crucial challenge. To this end, three parallel matching strategies have been proposed: NPM (Normal Parallel Matching), WSM (Warp Shuffle Matching), and HQM (Hyper-Q Matching). These matchers parallelize dynamic approximate string matching and depend heavily on recent hardware improvements, in particular two essential GPU innovations: Hyper-Q and the shuffle instructions. Moreover, HQM exploits the GPU memory hierarchy to optimize its performance. These claims have been validated through an extensive set of experiments using different workloads of metadata models on a CUDA-enabled NVIDIA GeForce GTX 860M GPU.
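The dynamic approximate string matching that NPM, WSM, and HQM parallelize can be illustrated, under assumptions, with the classic Levenshtein edit-distance dynamic program evaluated in anti-diagonal (wavefront) order. Every cell on anti-diagonal i + j depends only on the two previous anti-diagonals, so all cells of one diagonal can be computed at once, which is what makes this kind of recurrence suitable for GPU threads. This is a sequential Python sketch of that dependency structure only: the thesis's exact recurrence, its CUDA kernels, and the way WSM and HQM use shuffle instructions and Hyper-Q are not reproduced, and `edit_distance_wavefront` and `name_similarity` are hypothetical names.

```python
def edit_distance_wavefront(a: str, b: str) -> int:
    """Levenshtein distance filled one anti-diagonal (i + j = d) at a time."""
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        dist[0][j] = j  # insert all of b[:j]
    for d in range(2, m + n + 1):
        # Cells on this diagonal read only diagonals d - 1 and d - 2, so the
        # iterations of this inner loop are independent (one GPU thread each).
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[m][n]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; this normalization is an assumption, not from the thesis."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance_wavefront(a, b) / longest


print(edit_distance_wavefront("kitten", "sitting"))  # 3
print(name_similarity("price", "prices"))
```

On a GPU, the inner loop over one diagonal would be spread across threads; one plausible use of warp shuffles is to pass neighbouring cell values between threads in registers instead of through shared memory, though the sketch makes no claim about how the thesis does it.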
The results demonstrate that the proposed algorithms achieve speedups ranging from 4.26X to 80.52X over their traditional counterparts. Furthermore, the framework with HQM surpasses the other matching strategies in processing time while preserving the same quality.
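Returning to the MetMat framework, its first step, dividing the matching problem into independent tasks that run in parallel on a multi-core machine, can be sketched as follows. The sketch assumes that each task is a pair of related source and target clusters (for example, pairs selected by a cluster-similarity step) and uses a single name matcher built on `difflib.SequenceMatcher` in place of the thesis's set of matchers; the inter-, intra-, and hybrid-matching strategies are not modeled. `normalize`, `match_task`, `parallel_match`, the 0.8 threshold, and the sample clusters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from functools import partial


def normalize(name: str) -> str:
    # Assumed normalization: case-insensitive, ignore '_' and '-'.
    return name.lower().replace("_", "").replace("-", "")


def match_task(task, threshold=0.8):
    """Match one independent (source_cluster, target_cluster) task."""
    source, target = task
    found = []
    for s in source:
        for t in target:
            score = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score >= threshold:
                found.append((s, t, round(score, 3)))
    return found


def parallel_match(tasks, workers=4, threshold=0.8):
    """Run independent cluster-pair tasks concurrently and merge their results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in task order, so the merged output is deterministic.
        results = pool.map(partial(match_task, threshold=threshold), tasks)
        return [corr for task_result in results for corr in task_result]


tasks = [  # hypothetical cluster pairs
    (["author_name", "isbn"], ["AuthorName", "price"]),
    (["title"], ["Title", "publisher"]),
]
print(parallel_match(tasks))
# [('author_name', 'AuthorName', 1.0), ('title', 'Title', 1.0)]
```

Threads keep the sketch portable, but in standard CPython the GIL prevents pure-Python matchers from using several cores at once. Replacing `ThreadPoolExecutor` with `ProcessPoolExecutor` (called under an `if __name__ == "__main__":` guard) gives true multi-core execution; `partial` is used instead of a lambda so the task function stays picklable for that swap.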