Author: Elsheikh, Ghada Mahmoud./ Title: Service oriented data integration based on mapreduce /


Title

Service oriented data integration based on mapreduce /

Author

Elsheikh, Ghada Mahmoud.

Preparation committee

Supervisor / Ghada Mahmoud Elsheikh

Supervisor / محمد سعيد ابو جبل

Supervisor / صالح الشهابي

Supervisor / مصطفي يسري النعناعي

Examiner / مجدي حسين ناجي

Examiner / عادل عبد المنعم الزغبي

Subject

Data Structures.

Publication date

2012.

Number of pages

81 p.

Language

English

Degree

Master's

Specialization

Systems and Control Engineering

Date of approval

1/7/2012

Place of approval

Alexandria University - Faculty of Engineering - Computer


Abstract

Data integration has become a backbone for many essential and widely used services.
These services depend on integrating data from multiple sources quickly and
efficiently in order to provide the level of service they are committed to. As the
volume of data available in different environments has grown very large, and systems
are heterogeneous and autonomous, data integration has become a crucial part of most
modern systems.
Data integration is defined as the process of combining data from heterogeneous
sources so that it can be used as one unified source. What this definition does
not prescribe is how integration takes place. Therefore, the applied technique is
the user's choice, and it is derived from the hosting environments and the systems' needs.
According to how integration takes place, there are two common techniques for
data integration: the Virtual View approach and the Materialized View approach. In
the Virtual View approach, data is accessed from the local sources on demand (e.g. data
federation), while in the Materialized View approach data is extracted in advance,
translated, filtered, and possibly merged with relevant data from other sources, then
stored in a (logically) centralized repository (e.g. data warehouses). With the rapidly
changing requirements of businesses, users, and environments, the costs of maintaining
and modifying Materialized Views have become very high. Thus, the Virtual View
approach has become a good candidate under these conditions.
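
The contrast between the two approaches can be illustrated with a minimal sketch. It is not taken from the thesis: the names `make_source`, `VirtualView`, and `MaterializedView`, and the idea of modelling a source as a function that returns its current records, are assumptions made only for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
# Hypothetical model of a data source: a callable returning its current records.
Source = Callable[[], List[Record]]


def make_source(rows: List[Record]) -> Source:
    """Wrap a mutable list so the source always reflects its latest contents."""
    return lambda: list(rows)


class VirtualView:
    """Reads every source at query time; nothing is stored centrally."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources

    def query(self, key: str, value: str) -> List[Record]:
        return [r for s in self.sources for r in s() if r.get(key) == value]


class MaterializedView:
    """Extracts all records in advance into one repository."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources
        self.repository: List[Record] = []
        self.refresh()

    def refresh(self) -> None:
        # Re-extraction is the maintenance cost paid whenever sources change.
        self.repository = [r for s in self.sources for r in s()]

    def query(self, key: str, value: str) -> List[Record]:
        return [r for r in self.repository if r.get(key) == value]


if __name__ == "__main__":
    first_rows = [{"id": "1", "city": "Alexandria"}]
    second_rows = [{"id": "2", "city": "Cairo"}]
    sources = [make_source(first_rows), make_source(second_rows)]

    virtual = VirtualView(sources)
    materialized = MaterializedView(sources)

    # A source changes after the materialized copy was built.
    first_rows.append({"id": "3", "city": "Cairo"})

    print(len(virtual.query("city", "Cairo")))       # reads the sources now
    print(len(materialized.query("city", "Cairo")))  # stale until refresh()
```

In this sketch the virtual view always reflects the current state of the sources, while the materialized view answers from its stored copy and must be refreshed after each change.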
Furthermore, the emergence of Web services developments and standards in support
of automated business integration has driven major technological advances in the
integration software space, most notably the Service-Oriented Architecture (SOA).
Some data integration systems have adopted the Service Oriented (SO) model and have
shown a better and more organized design.
Another factor affecting modern systems is the shift from increasing the amount of
resources (scale-up) to solve large-scale problems to tying together many
low-end/commodity machines (scale-out) as a single
functional distributed system, which gives the illusion of having endless resources as
provisioned by modern infrastructures such as Grids and Clouds. This implies adopting
a new processing model to benefit from these infrastructures, as proposed by
the MapReduce distributed processing model.
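
As a rough illustration of the MapReduce processing model applied to records from several sources, the following single-process sketch runs a map step, a grouping (shuffle) step, and a reduce step. It is not the SODIM implementation and does not use any MapReduce framework; the function names, the `id` join key, and the "first value wins" merge rule are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]


def map_phase(source_name: str, records: Iterable[Record]) -> Iterator[Tuple[str, Record]]:
    """Emit (key, value) pairs; the key is the hypothetical shared 'id' field."""
    for record in records:
        yield record["id"], {**record, "source": source_name}


def shuffle(pairs: Iterable[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[Record]) -> Dict[str, object]:
    """Merge all records that share a key into one integrated record."""
    merged: Dict[str, object] = {
        "id": key,
        "sources": sorted(v["source"] for v in values),
    }
    for value in values:
        for field, content in value.items():
            if field not in ("id", "source"):
                # Assumed conflict rule: the first value seen for a field wins.
                merged.setdefault(field, content)
    return merged


def integrate(sources: Dict[str, List[Record]]) -> List[Dict[str, object]]:
    """Run map, shuffle, and reduce in one process over all sources."""
    pairs = [p for name, recs in sources.items() for p in map_phase(name, recs)]
    groups = shuffle(pairs)
    return [reduce_phase(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    example = {
        "a": [{"id": "1", "name": "x"}],
        "b": [{"id": "1", "email": "e"}, {"id": "2", "name": "y"}],
    }
    for row in integrate(example):
        print(row)
```

In a real scale-out deployment the map calls would run in parallel on different machines, one per source or per data split, and the grouping step would be handled by the framework rather than by a local dictionary.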
As a result of all these factors, this study brings together data integration
systems, Service Orientation, and distributed processing to provide a combination that
improves performance, especially with a large number of data sources, and can
be hosted efficiently on modern infrastructures such as Clouds.
Therefore, the Service Oriented Data Integration based on MapReduce (SODIM) system
is proposed in this thesis to benefit from the emerging distributed processing model
(MapReduce) and the loose coupling provided by Service Orientation and web
services, to provide more extensibility, agility, reliability, and fault tolerance.
The thesis provides a detailed description of how these techniques were brought
together to eliminate the restrictions of current systems and provide further
enhancements. An implementation is provided as a proof of concept, and a case study
is introduced as an evaluation method.