Title
Handling Straggler Tasks in Hadoop
Author
Eid, Gehad Khaled Hussein.
Committee
Researcher / Gehad Khaled Hussein Eid
Supervisor / Mostafa Rabie Kaseb
Supervisor / Mohamed Hassan Ibrahim
Examiner / Mostafa Rabie Kaseb
Subject
qrmak
Publication date
2023
Number of pages
67 p.
Language
English
Degree
Master's
Specialization
Computer Science Applications
Approval date
11/1/2023
Place of approval
Fayoum University - Faculty of Computers and Information - Computer Science
Index
Only 14 of 67 pages are available for public view

Abstract

Hadoop is a low-cost software framework for storing and processing large data
sets in a distributed manner. The Hadoop Distributed File System (HDFS) provides
distributed storage, and MapReduce provides parallel processing of those data
sets. However, Hadoop’s current implementation assumes that work is distributed
evenly across a cluster’s computational nodes. When that assumption fails, a
“straggling task” (ST), one that runs much slower than its peers, significantly
degrades Hadoop’s performance.
The major causes of stragglers in heterogeneous Hadoop clusters are load
imbalance during storage, resource contention during task scheduling, hardware
degradation from excessive use, and software misconfiguration during cluster
management.
An artificial neural network (ANN) is a good technique for dealing with
straggling tasks because it can monitor the progress rate of running tasks in
real time and reduce the time needed to back up a straggler on another node,
which increases the chance that the backup task completes ahead of the original.
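
To make the idea of monitoring progress rates concrete, here is a minimal Python sketch of a constant-rate remaining-time estimate, a widely used heuristic for speculative execution. It is not the ANN-based method of the thesis, whose details the abstract does not give. The `TaskSnapshot` type, the function names, and the `slow_factor` threshold are all hypothetical and are not part of any Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """Hypothetical view of one running task; not a Hadoop API type."""
    task_id: str
    progress: float   # fraction of work done, in [0, 1]
    elapsed_s: float  # seconds since the task started


def remaining_time(task: TaskSnapshot) -> float:
    """Estimate remaining seconds, assuming the observed rate stays constant."""
    if task.progress <= 0.0 or task.elapsed_s <= 0.0:
        return float("inf")  # no progress observed yet, so no estimate
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def pick_straggler(tasks, slow_factor=1.5):
    """Return the task with the longest estimated remaining time, or None.

    A task qualifies only if its estimate exceeds slow_factor times the
    mean estimate; slow_factor is an illustrative threshold (assumption).
    """
    estimates = {t.task_id: remaining_time(t) for t in tasks}
    finite = [e for e in estimates.values() if e != float("inf")]
    if not finite:
        return None
    threshold = slow_factor * sum(finite) / len(finite)
    candidates = [
        t for t in tasks
        if estimates[t.task_id] != float("inf")
        and estimates[t.task_id] > threshold
    ]
    return max(candidates, key=lambda t: estimates[t.task_id], default=None)
```

A scheduler built on this sketch would launch a backup copy of the returned task on another node only when that node is expected to finish the copy sooner than the estimate for the original.
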
This thesis tackles straggling tasks by creating a strategy that handles
straggler misjudgment and improper selection of backup nodes, and that makes
speculative tasks start from a checkpoint. By reducing the remaining time for a
WordCount job, the strategy was demonstrated to be capable of detecting
straggler tasks and properly estimating execution time. It also speeds up job
execution.
The technique, ST-ANN, consists of three basic parts: weight estimation using a
neural network, task execution time estimation, and an information repository
that stores data from previously executed tasks.
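
The three parts can be pictured with the following Python sketch, offered under stated assumptions and not as the ST-ANN implementation. A plain historical average stands in for the neural network, the stage names (`copy`, `sort`, `reduce`) follow the usual phases of a Hadoop reduce task, and every function name and number is hypothetical.

```python
from statistics import mean

# Repository of previously executed tasks: each record holds the seconds a
# task spent in each stage. Stage names and numbers are illustrative only.
history = [
    {"copy": 30.0, "sort": 10.0, "reduce": 60.0},
    {"copy": 20.0, "sort": 20.0, "reduce": 60.0},
]


def estimate_weights(records):
    """Average each stage's share of total task time.

    This simple average is a stand-in for the ANN weight estimator.
    """
    stages = list(records[0].keys())
    shares = {s: mean(r[s] / sum(r.values()) for r in records) for s in stages}
    total = sum(shares.values())
    return {s: v / total for s, v in shares.items()}


def estimate_total_time(weights, finished_stages, current_stage,
                        stage_progress, elapsed_s):
    """Scale elapsed time by the weighted fraction of work completed."""
    done = sum(weights[s] for s in finished_stages)
    done += weights[current_stage] * stage_progress
    if done <= 0.0:
        return float("inf")  # nothing completed yet, so no estimate
    return elapsed_s / done


weights = estimate_weights(history)
total_s = estimate_total_time(weights, ["copy", "sort"], "reduce", 0.5, 70.0)
```

With the sample history, the weights come out near 0.25, 0.15, and 0.60, so a task that is halfway through its reduce stage after 70 seconds is estimated to need about 100 seconds in total. Learned weights matter because a fixed equal split between stages can misjudge tasks on heterogeneous nodes.
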