Author: Hassan, Fawzya Ramadan Sayed./ Title: An Efficient Advanced SQL–to-MapReduce Translator to Improve Big Data Analysis on the Cloud Computing Environment /

Title

An Efficient Advanced SQL–to-MapReduce Translator to Improve Big Data Analysis on the Cloud Computing Environment /

Author

Hassan, Fawzya Ramadan Sayed.

Committee

Researcher / Fawzya Ramadan Sayed Hassan

Supervisor / ابراهيم فراج

Examiner / محمد خفاجى

Examiner / فاطمة عبدالستار عمارة

Subject

Big data.

Publication date

2015.

Number of pages

94 p.

Language

English

Degree

Master's

Specialization

Computer Science (miscellaneous)

Date of approval

23/8/2015

Place of approval

Cairo University - Faculty of Computers and Information - Department of Computer Science.

Abstract

MapReduce has become an effective framework for processing and analysing very large data sets on large systems. An efficient and flexible translator from SQL queries to the MapReduce framework is therefore needed. In particular, an optimized SQL translator that can handle advanced queries is necessary to keep data analysis performant as Big Data grows. Hive supports queries written in a language called HiveQL. HiveQL offers SQL-like features, but it still has difficulty handling complex SQL queries. Consequently, manual translation of such queries into HiveQL often leads to poor performance.
Flink has also become an effective framework for Big Data analysis on large cluster systems. However, Flink does not support any query language, so an SQL-to-Flink translator must be designed and implemented to execute SQL queries on Flink. This thesis addresses these limitations of SQL translators and proposes two contributions, both regarded as SQL-to-MapReduce translators, to improve Big Data analysis.
The first contribution is called QRMapper (Query Rewriting Mapper). It was developed to solve the problem of translating complex SQL queries into HiveQL by applying and optimizing query rewriting. This translator improves the performance of HiveQL without any changes to the Hive framework and makes it possible to execute subqueries and advanced SQL queries. The system's performance was evaluated using the TPC-H benchmark.
The second contribution is called the SQL to Flink Translator, a new system developed to define an SQL query language for Flink and add it to the framework. This translator improves SQL performance without any changes to the Flink framework and makes it possible to execute SQL queries on Flink by generating a Flink algorithm that executes them. The SQL to Flink Translator can also execute SQL with high performance in cases where other systems perform poorly. The system's performance was evaluated using the TPC-H benchmark.
Overall, these two contributions provide a new layer for executing advanced SQL queries over MapReduce, which is regarded as a main contribution to the Big Data field.
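
The query rewriting idea behind the first contribution can be illustrated with a minimal sketch. This is not the thesis's QRMapper: it only shows one common textbook rewrite, in which an IN subquery is replaced by a join against a de-duplicated derived table, a form that engines with limited subquery support can usually execute. SQLite stands in for Hive here, and the toy tables, column values, and helper names are assumptions loosely modelled on the TPC-H orders and lineitem tables.

```python
import sqlite3


def build_db():
    """Create a hypothetical toy database (not real TPC-H data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER);
        CREATE TABLE lineitem (l_orderkey INTEGER, l_quantity INTEGER);
        INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
        INSERT INTO lineitem VALUES (1, 5), (1, 50), (2, 7), (3, 60), (3, 70);
    """)
    return conn


# Original form: an IN subquery inside the WHERE clause.
SUBQUERY_FORM = """
    SELECT o_orderkey FROM orders
    WHERE o_orderkey IN (SELECT l_orderkey FROM lineitem WHERE l_quantity > 40)
    ORDER BY o_orderkey
"""

# Rewritten form: the subquery becomes a derived table that is joined.
# DISTINCT keeps an order from appearing once per matching line item.
JOIN_FORM = """
    SELECT o.o_orderkey FROM orders o
    JOIN (SELECT DISTINCT l_orderkey FROM lineitem WHERE l_quantity > 40) q
      ON o.o_orderkey = q.l_orderkey
    ORDER BY o.o_orderkey
"""


def run(conn, sql):
    """Execute a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


conn = build_db()
subquery_result = run(conn, SUBQUERY_FORM)
join_result = run(conn, JOIN_FORM)
print(subquery_result == join_result)
```

Both forms select the same orders on this toy data; a rewriting translator would emit the second form when the target engine cannot run the first, or runs it slowly.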