Title
Acceleration of Artificial Neural Networks Using a Hardware Platform
Author
El-Sokkary, Salma Khaled Ali.
Preparation Committee
Researcher / Salma Khaled Ali El-Sokkary
Supervisor / Mohamed Mahmoud Ahmed Taher
Supervisor / Hassanein Hamed Amer
Supervisor / Sherif Ramzy Salama Andraws Kozman Salama
Publication Date
2023.
Number of Pages
159 p.
Language
English
Degree
Master's
Specialization
Electrical and Electronic Engineering
Date of Approval
1/1/2023
Place of Approval
Ain Shams University - Faculty of Engineering - Computer and Systems Engineering
Abstract

Three main design scenarios are introduced with the goal of flexibly accelerating a CNN on a device that contains both an FPGA fabric and an ARM processor. The larger the FPGA fabric, the greater the acceleration that can be achieved; by accelerating the CNN at the level of individual layers, smaller devices can gain considerable acceleration as well. The CNN is implemented on the FPGA with a high degree of parallelism, which is possible for two reasons: the convolution is a local operation that does not need the full input to operate, and the CNN itself has a parallel structure. The parallel computations are divided between the two platforms in two main ways: by dividing the input image, or by dividing the filters' channels. The different methods give different results, but all achieve the acceleration goal.
The thesis is organized as follows:
Chapter 1 introduces the research in this thesis. It defines the scope of the work and its motivation, presents the contributions of the thesis, and outlines how the thesis is organized.
Chapter 2 contains the research background: the literature review and an explanation of the CNN layers and operations. It surveys the research done on accelerating CNNs on an FPGA only (single platform), on an FPGA fabric together with an ARM processor (dual platform), and on multiple platforms, such as several devices connected together. Operations such as single-input multi-output convolution, multi-input multi-output convolution, pooling, activation functions, and padding are the basic operations needed throughout the thesis and are also explained in this chapter.
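For reference, the basic operations listed above can be sketched in a few lines of NumPy. This is an illustration only, not the thesis's FPGA implementation, which realizes these operations as hardware modules:

```python
import numpy as np

def zero_pad(x, p):
    """Pad a 2D feature map with p zeros on every side."""
    return np.pad(x, p, mode="constant")

def relu(x):
    """ReLU activation: clamp negative values to zero."""
    return np.maximum(x, 0)

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling of a 2D feature map."""
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

def conv2d(x, w):
    """Single-input single-output 'valid' 2D convolution (correlation form)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out
```

A multi-input convolution is then the sum of such single-channel convolutions over the input channels, and a multi-output convolution repeats that sum once per filter.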
Chapter 3 explains the classification problem and describes the implementation of the full model, on both ARM and FPGA, developed for the purpose of this work. It explains the complete design of the CNN on the FPGA, the functionality of each inner module instance, and how these instances are connected to each other, showing the flow of operations at both the layer level and the top level. It also explains the pruning applied to the original CNN.
Chapter 4 introduces the first design scenario (the Image Division Scenario) with its different combinations. In this chapter, the basic idea of image division is mapped onto both the layers and the platforms, giving four combinations that yield different resource utilizations and accelerations.
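The locality argument behind image division can be sketched in NumPy: because a convolution output row depends only on a small neighborhood of input rows, the image can be split into two halves that overlap by the kernel height minus one, and each half convolved independently. The platform assignments in the comments are illustrative, not the thesis's actual mapping:

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' 2D convolution of one channel."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv2d_image_division(x, w):
    """Split the image into two halves overlapping by kh - 1 rows;
    since convolution is local, each half can be convolved on a
    different platform and the results simply stacked."""
    kh = w.shape[0]
    mid = x.shape[0] // 2
    top_half = conv2d(x[:mid + kh - 1], w)   # e.g. could run on the FPGA fabric
    bottom_half = conv2d(x[mid:], w)         # e.g. could run on the ARM processor
    return np.vstack([top_half, bottom_half])

# The divided result matches the undivided convolution exactly.
x = np.random.rand(8, 8)
w = np.random.rand(3, 3)
assert np.allclose(conv2d_image_division(x, w), conv2d(x, w))
```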
Chapter 5 introduces the second and third design scenarios (the Filter Division Scenarios) with their combinations. The chapter explains how the two scenarios are similar and demonstrates their differences through figures and discussion. It also proposes four combinations for each scenario, bringing the total for the thesis's proposed methods to twelve combinations.
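The principle behind filter division is that a multi-input multi-output convolution is a sum over input channels, and that sum distributes: each platform can convolve its share of the channels and the partial results are added at the end. A minimal NumPy sketch of the idea (illustrative only, not the thesis's hardware design):

```python
import numpy as np

def conv2d_mimo(x, w):
    """Multi-input multi-output 'valid' convolution.
    x: (C, H, W) input; w: (F, C, kh, kw) filters; returns (F, oh, ow)."""
    C, H, W = x.shape
    F, _, kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.zeros((F, oh, ow))
    for f in range(F):
        for c in range(C):          # accumulate over input channels
            for i in range(oh):
                for j in range(ow):
                    out[f, i, j] += np.sum(x[c, i:i + kh, j:j + kw] * w[f, c])
    return out

def conv2d_filter_division(x, w, split):
    """Divide the filters' channels at index `split`; each partial sum
    could be computed on a different platform, then added."""
    part_a = conv2d_mimo(x[:split], w[:, :split])   # e.g. on the FPGA fabric
    part_b = conv2d_mimo(x[split:], w[:, split:])   # e.g. on the ARM processor
    return part_a + part_b

# The divided computation matches the undivided one exactly.
x = np.random.rand(4, 6, 6)
w = np.random.rand(2, 4, 3, 3)
assert np.allclose(conv2d_filter_division(x, w, 2), conv2d_mimo(x, w))
```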
Chapter 6 describes the implementation environments: the Floating Point Cores (FloPoCo) virtual machine, the Linux operating system on the ARM processor, and the software used to synthesize, implement, and simulate the code. It then presents the results: the acceleration results, including how they are calculated and measured, and the resource utilization results at the modular level for each of the twelve combinations as well as for the full design. It also shows the software acceleration results on the ARM processor.
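Acceleration results of this kind are conventionally reported as a speedup factor: the software-only execution time divided by the accelerated execution time. A generic measurement sketch (the helper names are hypothetical, not the thesis's measurement code):

```python
import time

def speedup(t_reference, t_accelerated):
    """Speedup factor: reference (software-only) time / accelerated time.
    A value of 4.0 means the accelerated design finishes in a quarter
    of the software-only time."""
    return t_reference / t_accelerated

def timed(fn, *args):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```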
Chapter 7 concludes the work and highlights future directions for this thesis.