Author: Al-Qutt, Mirvat Mahmoud Ahmed./ Title: Solving Computationally Intensive Problem Using GPUs- Solving Motif Finding Problem as a Case Study /

Title

Solving Computationally Intensive Problem Using GPUs- Solving Motif Finding Problem as a Case Study /

Author

Al-Qutt, Mirvat Mahmoud Ahmed.

Contributors

Researcher / Mirvat Mahmoud Ahmed Al-Qutt

Supervisor / حسام الدين مصطفى فهيم

Supervisor / رانية عبد الرحمن الجوهري

Supervisor / هبه خالد احمد

Publication date

2018.

Number of pages

129 p.

Language

English

Degree

Doctorate

Specialization

Computer Science (miscellaneous)

Date of approval

1/1/2018

Place of approval

Ain Shams University - Faculty of Computer and Information Sciences - Computer Systems

Abstract

As scientific problems grow larger and more complex, the level of parallelization must increase to supply the required computational power, so adequate, efficient, and robust parallel environments have become essential for scientific research. This thesis proposes hybrid parallel paradigms for solving computationally intensive problems, taking Motif Finding, one of the most common problems in bioinformatics, as a case study. Motifs are short patterns of nucleotides, usually positioned close to the genes in deoxyribonucleic acid (DNA). They occur frequently within a sequence; the occurrences are not identical but carry mutations at several of their nucleotide positions. The Motif Finding problem aims to discover unknown motifs that are expected to be common to a set of sequences. In general, it can be viewed as a matching problem over long sequences.
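The problem statement above can be made concrete with a minimal sketch of exhaustive motif search, assuming the common (l, d) formulation: find every pattern of length l that occurs in each sequence with at most d mismatches. This is a plain enumeration for illustration only, not the SKIP brute-force algorithm studied in the thesis; the function names and the sample sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(candidate, sequence, d):
    """True if some window of `sequence` is within d mismatches of `candidate`."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def brute_force_motifs(sequences, l, d):
    """Enumerate all 4**l candidate patterns and keep those found in every sequence."""
    motifs = []
    for letters in product("ACGT", repeat=l):
        candidate = "".join(letters)
        if all(occurs_with_mismatches(candidate, s, d) for s in sequences):
            motifs.append(candidate)
    return motifs


# Hypothetical toy input: "ACGT" is planted exactly in each sequence.
exact = brute_force_motifs(["TTACGTAA", "GGACGTCC", "CACGTTTT"], l=4, d=0)

# Hypothetical toy input: two of the three occurrences carry one mutation.
mutated = brute_force_motifs(["TTACGTAA", "GGACCTCC", "CAAGTTTT"], l=4, d=1)
```

The candidate space grows as 4^l and every candidate is checked against every window of every sequence, which is why the problem is treated as computationally intensive. Each candidate is checked independently of the others, so the outer loop is a natural unit of parallel work.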
Numerous algorithms are available to solve this problem; they can be either exact or approximate. Motif finding is categorized as one of the most computationally intensive problems in bioinformatics, and it requires a large amount of memory. It has been categorized as a nondeterministic polynomial time (NP) problem. Both software and hardware accelerators have been proposed and deployed to speed up motif finding algorithms; software-based acceleration solutions are easier to implement and do not require hardware expertise.
This research begins by surveying the existing algorithms and parallelization techniques for accelerating the Motif Finding problem and identifying the main points where a contribution could be made.
In this research, the SKIP brute-force algorithm is accelerated using different high-performance computing (HPC) implementations. The results showed that the GPU significantly reduced execution time and outperformed the multicore architecture. In addition, sequence length was observed to affect both speedup and power consumption. These implementations are a step towards building a complete parallel architecture for solving computationally intensive bioinformatics problems.
Subsequently, machine learning techniques were used to develop an intelligent recommender system that advises on the optimal HPC architecture for a given Motif Finding problem size and other parameters. A neural network-based multi-objective optimization approach (Neural Network Inversion) is employed, in which a neural network approximates the direct problem. The objective functions are to maximize the speedup ratio and to minimize the power consumption. The system is important because it automates the decision on the optimal number of processors, that is, the hardware configuration that can efficiently solve computationally intensive problems, taking into account resource availability and a set of requirements and environment restrictions that may sometimes conflict. The proposed system achieved a prediction accuracy of over 89% with 390 iterations on average for the CPU-based hardware configuration prediction system, and 87% with 306 iterations on average for the GPU-based hardware configuration prediction system.
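The two objectives named above (maximize speedup, minimize power consumption) can conflict, and a small sketch may help show what a recommendation under such a trade-off looks like. The code below is not Neural Network Inversion and not the thesis's recommender; it is a plain Pareto filter over a handful of made-up measurements, and every configuration name and number in it is hypothetical.

```python
def dominates(a, b):
    """a dominates b: no worse on both objectives and strictly better on at least one."""
    no_worse = a["speedup"] >= b["speedup"] and a["power_w"] <= b["power_w"]
    better = a["speedup"] > b["speedup"] or a["power_w"] < b["power_w"]
    return no_worse and better


def pareto_front(configs):
    """Keep only configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def recommend(configs, max_power_w):
    """Fastest non-dominated configuration within a power budget, or None."""
    feasible = [c for c in pareto_front(configs) if c["power_w"] <= max_power_w]
    return max(feasible, key=lambda c: c["speedup"]) if feasible else None


# Hypothetical measurements, for illustration only.
configs = [
    {"name": "cpu-4", "speedup": 3.5, "power_w": 80},
    {"name": "cpu-8", "speedup": 6.0, "power_w": 140},
    {"name": "cpu-16", "speedup": 5.8, "power_w": 200},  # dominated by cpu-8
    {"name": "gpu-1", "speedup": 20.0, "power_w": 250},
]

front_names = [c["name"] for c in pareto_front(configs)]
choice = recommend(configs, max_power_w=150)
```

A learned model, as described in the abstract, would replace the fixed table with predictions for problem sizes that were never measured directly.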
Finally, to address solution scalability, cloud platform services were explored for storage and analytics, given the huge volume of data and the many cloud storage platforms that now provide scalable, distributed storage services. Accordingly, the challenges of implementing Motif Finding on top of the services provided by a cloud storage platform are addressed. In this research, Apache Hadoop, an open-source platform that distributes processing across large clusters of machines, was chosen for Big Data processing. The Motif Finding (MF) solution was implemented using two different Big Data frameworks, MapReduce and Apache Spark, which have different implementation schemes and resource usage plans. The results were collected and analyzed against parameters such as speedup and power consumption. In the experiments, Spark achieved considerably more speedup than MapReduce. This finding is expected given the input/output overheads of MapReduce. The main purpose of this step is to achieve effective parallelization and to evaluate the performance of such an integrated solution.
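The map-and-reduce decomposition mentioned above can be sketched in plain Python, without the Hadoop or Spark APIs. The sketch assumes a simplified exact-match variant of the problem (no mutations): the map step emits the distinct l-mers of one sequence, and the reduce step sums how many sequences contain each l-mer. The function names and sample sequences are hypothetical and do not reflect the thesis's implementation.

```python
from collections import Counter
from functools import reduce


def map_sequence(sequence, l):
    """Map step: emit each distinct l-mer of one sequence with a count of 1."""
    return Counter({sequence[i:i + l]: 1 for i in range(len(sequence) - l + 1)})


def reduce_counts(left, right):
    """Reduce step: merge two partial counts by summing per l-mer."""
    return left + right


def common_lmers(sequences, l):
    """l-mers that appear in every input sequence (exact matches only)."""
    mapped = (map_sequence(s, l) for s in sequences)
    totals = reduce(reduce_counts, mapped, Counter())
    return sorted(k for k, n in totals.items() if n == len(sequences))


# Hypothetical toy input.
sequences = ["TTACGTAA", "GGACGTCC", "CACGTTTT"]
four_mers = common_lmers(sequences, 4)
three_mers = common_lmers(sequences, 3)
```

Because each sequence is mapped independently and the reduce step is an associative merge, the same shape carries over to a cluster framework, where the per-sequence work runs on different nodes.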