Abstract The thesis treats this topic in four chapters, in addition to a conclusions chapter and a list of references. Chapter one gives an overview of the scope of the thesis, previous work, the problem definition, motivation, objectives, and the thesis outline. It justifies the need to use GPUs, the choice of algorithms suited to the GPU, and the parallel paradigms used in the proposed implementation. The previous work section covers earlier attempts to parallelize face recognition and detection on hardware platforms such as FPGAs and GPUs, and highlights the advantages and disadvantages of each implementation. Chapter two introduces the fundamental techniques of face recognition. It discusses well-known machine learning techniques such as the Hidden Markov Model (HMM), which builds a statistical model of state transitions that defines the probability of moving from one state to another in order to predict the next state. It also covers the Support Vector Machine (SVM), a well-known classification technique that tries to find a hyperplane that fully or partially separates two or more categories. Different SVM variants are explained, namely the soft-margin and kernel-function techniques, which greatly enhance the SVM algorithm. The chapter then explains Eigenfaces, the need to optimize the Eigenface calculations, and their limitations. A full worked example is introduced to explain the whole process, so that it need not be repeated in the implementation discussion in chapter three. The Viola-Jones face detection algorithm is presented and its key enhancements are explained in detail. Chapter three discusses the proposed software framework, which aims to achieve the maximum frame rate while remaining as scalable as required. The acceleration of the face detection phase using GPUs is discussed in detail. The MPI messages exchanged among the different nodes and the overall performance analysis are also presented.
Chapter four presents four use cases, each on a different GPU model. The use cases show the performance of the proposed implementation on different GPUs and CPUs. The NVIDIA M311 mobile GPU performed well for a mobile GPU, at 1.74 fps. The desktop GPUs, the NVIDIA GeForce GT240 and GeForce GTX 560, reached 6.1 fps and 31.25 fps respectively. A cluster of four nodes with NVIDIA 610 GPUs reached 5.34 fps; the cluster's performance was held back by its low-end GPUs. Finally, a power consumption metric is calculated, relating processing speed to power consumption on the four GPUs in order to find the most power-efficient one. Chapter five gives concluding remarks on guidelines for implementing algorithms on the GPU and identifies research directions that could yield further performance gains on the GPU.