Abstract

Chapter 1. Introduction

Big Data refers to a collection of large volumes of data that cannot be processed efficiently using traditional data processing techniques and platforms [1]. Big data can be divided into two fundamental groups [2]:

1) Real-world data, which is obtained through sensors, scientific investigation and observation.
2) Data from human society, which is usually obtained from social networking sites, the Internet, transportation, telecommunication, health and other sources.

1.1.1 Big Data characteristics

Big data is characterized by five V's: volume, velocity, variety, veracity and value, as shown in Figure 1.1.

Figure 1.1: Big Data characteristics (Volume, Velocity, Variety, Veracity, Value)

• Volume means the amount of generated data [3]. The digital era has produced massive amounts of data in recent years. It is estimated that the amount of data generated globally in 2020 was about 44 zettabytes (ZB) [4]. The amount of data the world produces daily continues to rise, and by 2025 it is estimated to reach 463 exabytes (EB) per day, for an estimated total of 175 ZB [5].
• Velocity refers to the rapid generation of data, which requires real-time processing [6].
• Variety refers to the different types of data, including structured, unstructured and semi-structured data. Data can take the form of text, images, audio, video, web documents and biological data [7].
• Veracity refers to the quality, trustworthiness and accuracy of the data obtained for analysis and decision making [8].
• Value concerns extracting what is useful from the data. Raw data has little value on its own; the "value" characteristic deals with how analysis transforms data into information and knowledge [9].

1.1.2 Big Data Challenges

The sharply increasing data deluge of the big data era brings many challenges. Figure 1.2 shows the challenges of big data [10].
Figure 1.2: Big Data Challenges (Data Capture, Data Storage, Data Analysis, Data Search, Data Sharing, Data Visualization)

• Data Capture: Big Data can be collected from several sources. Because of the variety of disparate data sources and the sheer volume, it is difficult to collect and integrate data from distributed locations in a scalable way. Other problems related to this challenge are data transmission, automatic generation of metadata, and data pre-processing.
• Data Storage: Big Data not only requires a huge amount of storage, but also demands new data management approaches on large distributed systems, because conventional database systems have difficulty managing Big Data.
• Data Analysis: Timely and cost-effective analytics over Big Data is now a key ingredient for success in many businesses, scientific and engineering disciplines, and government endeavors. Problems related to this challenge include performance optimization of software frameworks for Big Data analytics, and a scalable, distributed data management ecosystem for deep analytics (i.e. the implementation of various data mining, machine learning and statistical algorithms for analysis). Our work addresses this challenge.
• Data Search: Since data are used to make accurate and timely decisions, they must be available in an accurate, complete and timely manner, so query optimization is crucial.
• Data Sharing: Sharing data is now as important as producing it. Professionals will continue to produce and consume information that is specific to their own business needs, but it is now generated in a way that can be connected to and shared with other parts of an enterprise. Researchers in several scientific disciplines, such as ecology, medicine and biology, are facing issues regarding data preservation and sharing. Some of these issues are data curation and privacy preservation.
• Data Visualization: Visual analytics is an emerging field in which massive datasets are presented to users in visually compelling ways, in the hope that users will be able to discover interesting relationships. Visual analytics requires generating many visualizations across many datasets.

1.1.3 Big Data Analytics

More data