Title
Efficient Techniques for Big Data Processing /
Author
Elemam, Tarneem Elghareeb Mohamed Elsayed.
Preparation committee
Researcher / Tarneem Elghareeb Mohamed Elsayed Elemam
Supervisor / Mohamed Elshrkawey
Supervisor / Hamed Nassar
Subject
Computers and Informatics.
Publication date
2016.
Number of pages
173p. - ;
Language
English
Degree
Master's
Specialization
Language and Linguistics
Publisher
Date of approval
9/8/2016
Place of approval
Suez Canal University - Faculty of Arts - English Language

Abstract

Big Data refers to a collection of large volumes of data that cannot be efficiently
processed using traditional data processing techniques and platforms [1]. Big Data can
be fundamentally categorized into two groups [2]:
1) Real-world data, which can be obtained through sensors, scientific
investigation, and observation.
2) Data from human society, which is usually obtained from social networking
sites, the Internet, transportation, telecommunication, health, and other sources.
1.1.1 Big Data Characteristics
Big Data is characterized by five V’s: volume, velocity, variety, veracity, and
value, as shown in Figure 1.1.
Figure 1.1: Big Data characteristics (volume, velocity, variety, veracity, and value).
Chapter 1. Introduction 2
• Volume refers to the amount of data generated [3]. The digital era has
produced massive amounts of data in recent years. The amount of data generated
globally in 2020 is estimated at about 44 zettabytes (ZB) [4]. The amount of
data the world produces daily continues to rise; by 2025 it is estimated to
reach 463 exabytes (EB) per day, or 175 ZB in total [5].
• Velocity refers to the speed at which data is generated, which often demands
real-time processing [6].
• Variety refers to the different types of data, including structured,
unstructured, and semi-structured data. Data can take the form of text, images,
audio, video, web documents, and biological data [7].
• Veracity refers to the quality, trustworthiness, and accuracy of the data
obtained for the purpose of analysis and decision making [8].
• Value concerns extracting what is useful from the data. Raw data has little
value on its own; the "value" characteristic captures how analysis transforms
data into information and knowledge [9].
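As a rough consistency check on the two 2025 volume figures quoted above, the daily rate can be converted into a yearly total. The sketch below is illustrative only: it assumes decimal SI prefixes (1 ZB = 1000 EB) and a 365-day year, and the helper name `daily_eb_to_yearly_zb` is hypothetical. The text does not state how the 463 EB per day and 175 ZB figures are related, so this only shows that they are of the same order of magnitude.

```python
# Assumption: decimal SI prefixes, so 1 zettabyte (ZB) = 1000 exabytes (EB).
EB_PER_ZB = 1000


def daily_eb_to_yearly_zb(eb_per_day, days=365):
    """Convert a daily data rate in EB into a yearly total in ZB."""
    return eb_per_day * days / EB_PER_ZB


# 463 EB per day is the 2025 estimate quoted in the Volume bullet.
yearly_zb = daily_eb_to_yearly_zb(463)
print(f"{yearly_zb:.1f} ZB per year")  # prints "169.0 ZB per year"
```

Under these assumptions, 463 EB per day corresponds to roughly 169 ZB per year, which is close to the 175 ZB total cited for 2025.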
1.1.2 Big Data Challenges
The rapidly growing data deluge of the big data era brings many challenges. Figure
1.2 summarizes these challenges [10].
Figure 1.2: Big Data challenges (data capture, data storage, data analysis,
data search, data sharing, and data visualization).
• Data Capture: Big Data can be collected from several sources. Due to the
variety of disparate data sources and the sheer volume, it is difficult to collect
and integrate data with scalability from distributed locations. Other problems
related to this challenge are data transmission, automatic generation of
metadata, and data pre-processing.
• Data Storage: Big Data not only requires a huge amount of storage, but also
demands new data management approaches on large distributed systems, because
conventional database systems have difficulty managing Big Data.
• Data Analysis: Timely and cost-effective analytics over Big Data is now a key
ingredient for success in many business, scientific, and engineering
disciplines, and in government endeavors. Problems related to this challenge
include performance optimization of software frameworks for Big Data
analytics, and a scalable, distributed data management ecosystem for deep
analytics (i.e., the implementation of various data mining, machine learning,
and statistical algorithms for analysis). Our work addresses this challenge.
• Data Search: Because data is used to make accurate and timely decisions, it
must be available in an accurate, complete, and timely manner, which makes
query optimization crucial.
• Data Sharing: Sharing data is now as important as producing it. Professionals
will continue to produce and consume information that is specific to their own
business needs, but it is now generated in a way that can be connected to and
shared with other parts of an enterprise. Researchers in several scientific
disciplines, such as ecology, medicine, and biology, face issues regarding
data preservation and sharing, including data curation and privacy
preservation.
• Data Visualization: Visual analytics is an emerging field in which massive
datasets are presented to users in visually compelling ways, in the hope that
users will be able to discover interesting relationships. Visual analytics
requires generating many visualizations across many datasets.
1.1.3 Big Data Analytics
More data