Title
Hardware Accelerator for Robotics and Autonomous Systems (RAS)
Author
Omar, Hossam Omar Ahmed.
Preparation Committee
Researcher / Hossam Omar Ahmed Omar
Supervisor / Mohamed Amin Dessouky
Supervisor / Maged Ghoneima
Examiner / Khaled Ali Shehata
Publication Date
2019.
Number of Pages
159 p.
Language
English
Degree
Doctorate (PhD)
Specialization
Electrical and Electronic Engineering
Approval Date
1/1/2019
Approving Institution
Ain Shams University - Faculty of Engineering - Department of Electronics and Electrical Communications
Contents
Only 14 pages are available for public view

Abstract

The rapid growth in the size and accessibility of data in recent years has triggered a shift in how algorithms for artificial intelligence and machine learning are designed: the ability to train modern systems and applications automatically on massive amounts of data, rather than depending on conventional hand-designed algorithms, has led to ground-breaking performance in important domains such as natural language processing, Robotics and Autonomous Systems (RAS), speech recognition, and computer vision. Today, the most popular class of techniques in these domains is deep learning, which is receiving significant attention from industry. However, these models require extraordinarily large amounts of data and compute power to train, and scaling them beyond current data and model sizes is limited by the need for better hardware acceleration.
While the prevailing hardware acceleration solution has been clusters of graphics processing units (GPUs) used as general-purpose processors (GPGPU), field programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) offer interesting alternatives. Their flexible architectures make it possible to explore model-level optimizations beyond what is feasible on fixed architectures such as GPU- and CPU-based solutions. In addition, FPGAs and ASICs tend to deliver high performance per watt of power consumption, which is very attractive both for large-scale server-based deployments and for resource-limited embedded applications. Moreover, many artificial intelligence and machine learning algorithms are biologically inspired and depend on dense concurrent computation, which makes FPGAs and ASICs ideal platforms thanks to their capacity for data parallelism, model parallelism, and pipeline parallelism.
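The three forms of parallelism named above can be made concrete with a small sketch. The following C fragment is illustrative only and not taken from the thesis; it computes one neuron's dot product and notes in comments where a fabric-based implementation would apply each form of parallelism.

/* Illustrative sketch (not from the thesis): one neuron's dot product,
 * annotated with where each form of hardware parallelism would apply. */
#include <stddef.h>
#include <stdio.h>

static float neuron(const float *w, const float *x, size_t n, float bias)
{
    float acc = bias;
    /* Data parallelism: an FPGA/ASIC can unroll this loop into n parallel
     * multipliers feeding an adder tree, instead of iterating serially. */
    for (size_t i = 0; i < n; i++)
        acc += w[i] * x[i];
    return acc;
}

/* Model parallelism: different layers (or slices of one large layer) are
 * mapped to separate regions of the fabric and computed concurrently.
 * Pipeline parallelism: while layer k processes sample t, layer k-1 is
 * already working on sample t+1, so the stages overlap in time. */
int main(void)
{
    const float w[4] = { 0.5f, -1.0f, 0.25f, 2.0f };
    const float x[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    printf("y = %f\n", neuron(w, x, 4, 0.1f));
    return 0;
}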
In this thesis, we demonstrate that redesigning the conventional embedded multiplier blocks and DSP blocks in the silicon fabric of FPGA chips boosts the FPGA's capacity to accelerate generic Deep Neural Network (DNN) systems, while the adopted level of optimization also allows a larger number of such accelerators to fit on the same device. The four proposed units exceed the state-of-the-art conventional accelerators from Xilinx and Intel (Altera) in computational performance, as explained in the following chapters. These results were achieved under two main constraints: first, the limited public information from these manufacturers about the detailed performance characteristics of their state-of-the-art DSP blocks; and second, the unavailability of advanced technology files with which to test the proposed units in depth and examine their real performance capabilities. The low power consumption of the proposed units is a clear indication that they are promising candidates for critical applications such as Robotics and Autonomous Systems (RAS).
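As a rough illustration of the arithmetic such blocks perform, the following C model is a hedged sketch, not the thesis's actual Multiply Array Grid or PNAA design: it imitates the fixed-point multiply-accumulate step of a DSP-style block inside a DNN layer. The widths assumed here (18-bit signed operands, 48-bit accumulator) follow the convention of Xilinx DSP48-class blocks; the proposed units may differ.

/* Hedged sketch, not the thesis's proposed design: a bit-accurate C model
 * of the multiply-accumulate step a DSP-style block performs in a DNN layer.
 * Assumed widths: 18-bit signed operands, 48-bit accumulator. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int64_t acc;  /* 48-bit hardware accumulator, held in a 64-bit container */
} mac_t;

static void mac_step(mac_t *m, int32_t a, int32_t b)
{
    /* an 18x18 signed multiply yields a 36-bit product */
    int64_t p = (int64_t)a * (int64_t)b;
    m->acc += p;
    /* wrap to 48 bits with sign extension, mimicking the accumulator width */
    uint64_t u = (uint64_t)m->acc & 0xFFFFFFFFFFFFULL;
    if (u & 0x800000000000ULL)
        u |= 0xFFFF000000000000ULL;
    m->acc = (int64_t)u;
}

int main(void)
{
    /* toy example: one output of a 3-tap fixed-point convolution */
    const int32_t w[3] = { 102, -57, 31 };      /* quantized weights */
    const int32_t x[3] = { 4096, 2048, 1024 };  /* quantized activations */
    mac_t m = { 0 };
    for (int i = 0; i < 3; i++)
        mac_step(&m, w[i], x[i]);
    printf("acc = %lld\n", (long long)m.acc);   /* prints acc = 332800 */
    return 0;
}

A full accelerator replicates many such units across the fabric; per the abstract, the optimization level of the proposed units is what allows more of them to fit on the same die.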

Keywords: Deep learning, convolutional neural networks, computational intelligence, Multiply Array Grid, Multiply Parallel Adder, Pyramidal Neuron Accelerator Architectures (PNAA).