Abstract

Hadoop is an economical software framework for storing and processing large data sets in a distributed manner. The Hadoop Distributed File System (HDFS) provides distributed storage, and MapReduce provides parallel processing of the stored data. However, Hadoop's current implementation assumes that work is distributed evenly across a cluster's computational nodes. A straggling task (ST), a task that lags behind the other tasks of its job, significantly degrades Hadoop's performance. The major causes of stragglers in heterogeneous Hadoop clusters are load imbalance during storage, resource contention during task scheduling, hardware degradation due to excessive use, and software misconfiguration during cluster management. An artificial neural network (ANN) is a good technique for dealing with straggling tasks, since it monitors the rate of running tasks in real time and reduces the time needed to back up a straggler on another node, which increases the chance that the backup task completes ahead of the original. This thesis tackles straggling tasks with a strategy that handles straggler misjudgment and improper selection of backup nodes, and that makes speculative tasks start from a checkpoint. On WordCount, the strategy reduced the remaining time and was shown to detect straggler tasks and estimate execution time properly. It also speeds up job execution. The proposed technique, ST-ANN, consists of three basic parts: weight estimation using a neural network, task execution time estimation, and an information repository that stores data from previously executed tasks.
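The idea of monitoring task progress and estimating remaining time can be sketched as follows. This is a minimal illustration under stated assumptions, not the ST-ANN implementation: the stage weights are fixed, hypothetical values (ST-ANN estimates them with a neural network), the straggler rule is a simple mean-based threshold chosen for illustration, and every name (`TaskStatus`, `remaining_time`, `find_stragglers`, `slowdown`) is hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class TaskStatus:
    task_id: str
    stage_fractions: List[float]  # completion fraction (0..1) of each stage
    elapsed_s: float              # seconds since the task started


def progress_score(task: TaskStatus, stage_weights: List[float]) -> float:
    """Weighted progress in [0, 1]; the weights are assumed to sum to 1."""
    return sum(w * f for w, f in zip(stage_weights, task.stage_fractions))


def remaining_time(task: TaskStatus, stage_weights: List[float]) -> float:
    """Estimate remaining seconds as (1 - score) / (score / elapsed)."""
    score = progress_score(task, stage_weights)
    if score <= 0.0 or task.elapsed_s <= 0.0:
        return float("inf")  # no measurable progress yet, no finite estimate
    rate = score / task.elapsed_s
    return (1.0 - score) / rate


def find_stragglers(tasks: List[TaskStatus],
                    stage_weights: List[float],
                    slowdown: float = 1.5) -> List[str]:
    """Return ids of tasks whose estimate exceeds slowdown * mean estimate.

    The mean-based threshold is an assumption made for this sketch.
    """
    estimates: Dict[str, float] = {
        t.task_id: remaining_time(t, stage_weights) for t in tasks
    }
    finite = [e for e in estimates.values() if e != float("inf")]
    if not finite:
        return []
    threshold = slowdown * mean(finite)
    return [tid for tid, e in estimates.items() if e > threshold]


# Hypothetical two-stage tasks with assumed stage weights 0.7 and 0.3.
weights = [0.7, 0.3]
tasks = [
    TaskStatus("t1", [1.0, 0.5], 40.0),
    TaskStatus("t2", [1.0, 0.4], 40.0),
    TaskStatus("t3", [0.3, 0.0], 40.0),  # far behind its peers
]
print(find_stragglers(tasks, weights))  # ['t3']
```

A task flagged this way would be a candidate for a speculative backup on another node. Choosing that node and resuming from a checkpoint are outside the scope of this sketch.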