Improved field programmable gate array based accelerator of deep neural network using OpenCL


Bibliographic details
Main author: Yap, June Wai
Format: Thesis
Language: English
Published: 2022
Online access: http://eprints.utem.edu.my/id/eprint/26977/
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122220
_version_ 1855619808411779072
author Yap, June Wai
author_facet Yap, June Wai
author_sort Yap, June Wai
description Being compute-intensive and memory-expensive, Deep Neural Network (DNN) based models are hard to deploy on embedded devices. Although recent studies have explored the Field Programmable Gate Array (FPGA) as an alternative platform for deploying DNN-based models such as AlexNet and VGG, many challenges remain in implementing a DNN-based object detection model on an FPGA. Hence, this research explores the design of a scalable, parameterised DNN-based object detection model, Tiny YOLOv2, targeting the Cyclone V PCIe Development Kit FPGA using a High-Level Synthesis (HLS) tool. Considering the hardware limitations in terms of computational resources and memory bandwidth, data quantization is proposed to convert the floating-point (32-bit) design of Tiny YOLOv2 into a fixed-point (8-bit) design. To achieve good performance, an in-depth analysis of the computational complexity and memory footprint of Tiny YOLOv2 is also conducted to find the best quantization scheme. The proposed quantization scheme reduces the memory required to store the parameters from 60 MB to 15 MB, roughly a 4x improvement over the original floating-point design. Finally, the proposed implementation achieves a peak performance density of 0.29 Giga-Operations Per Second (GOPS) per Digital Signal Processing (DSP) block with only a 0.4% loss in accuracy, performance comparable to previous works.
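
The reported 4x reduction follows directly from the bit-width change: storing parameters as 8-bit fixed-point instead of 32-bit floating point cuts the footprint to a quarter (60 MB / 4 = 15 MB), and performance density is simply throughput divided by the DSP blocks consumed (GOPS per DSP). Below is a minimal illustrative sketch of one plausible per-layer symmetric fixed-point quantization in Python; the thesis's exact scheme is not reproduced here, and the function names (quantize_layer, dequantize) are hypothetical.

    import numpy as np

    def quantize_layer(weights, n_bits=8):
        # Symmetric fixed-point quantization: choose the number of fractional
        # bits so the largest |weight| still fits in the signed n-bit range.
        qmax = 2 ** (n_bits - 1) - 1                 # 127 for 8-bit
        max_abs = float(np.max(np.abs(weights)))
        frac_bits = int(np.floor(np.log2(qmax / max_abs))) if max_abs > 0 else n_bits - 1
        scale = 2.0 ** frac_bits
        q = np.clip(np.round(weights * scale), -qmax - 1, qmax).astype(np.int8)
        return q, frac_bits

    def dequantize(q, frac_bits):
        # Recover an approximate floating-point value for accuracy checks.
        return q.astype(np.float32) / (2.0 ** frac_bits)

    w = np.random.randn(1000).astype(np.float32)     # stand-in for one layer's weights
    q, f = quantize_layer(w)
    print(w.nbytes, q.nbytes)                        # 4000 vs 1000 bytes: the 4x saving
    print(float(np.max(np.abs(w - dequantize(q, f)))))  # worst-case rounding error

Choosing the fractional-bit count per layer, rather than one global format, is the kind of analysis the abstract describes as finding the best quantization scheme for Tiny YOLOv2.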
format Thesis
id utem-26977
institution Universiti Teknikal Malaysia Melaka
language English
publishDate 2022
record_format EPrints
record_pdf Restricted
spelling utem-269772024-01-16T14:28:16Z http://eprints.utem.edu.my/id/eprint/26977/ Improved field programmable gate array based accelerator of deep neural network using OpenCL Yap, June Wai Being compute-intensive and memory-expensive, Deep Neural Network (DNN) based models are hard to deploy on embedded devices. Although recent studies have explored the Field Programmable Gate Array (FPGA) as an alternative platform for deploying DNN-based models such as AlexNet and VGG, many challenges remain in implementing a DNN-based object detection model on an FPGA. Hence, this research explores the design of a scalable, parameterised DNN-based object detection model, Tiny YOLOv2, targeting the Cyclone V PCIe Development Kit FPGA using a High-Level Synthesis (HLS) tool. Considering the hardware limitations in terms of computational resources and memory bandwidth, data quantization is proposed to convert the floating-point (32-bit) design of Tiny YOLOv2 into a fixed-point (8-bit) design. To achieve good performance, an in-depth analysis of the computational complexity and memory footprint of Tiny YOLOv2 is also conducted to find the best quantization scheme. The proposed quantization scheme reduces the memory required to store the parameters from 60 MB to 15 MB, roughly a 4x improvement over the original floating-point design. Finally, the proposed implementation achieves a peak performance density of 0.29 Giga-Operations Per Second (GOPS) per Digital Signal Processing (DSP) block with only a 0.4% loss in accuracy, performance comparable to previous works. 2022 Thesis NonPeerReviewed text en http://eprints.utem.edu.my/id/eprint/26977/1/Improved%20field%20programmable%20gatearraybased%20accelerator%20of%20deep%20neural%20networkusing%20opencl.pdf text en http://eprints.utem.edu.my/id/eprint/26977/2/Improved%20field%20programmable%20gatearraybased%20accelerator%20of%20deep%20neural%20networkusing%20opencl.pdf Yap, June Wai (2022) Improved field programmable gate array based accelerator of deep neural network using OpenCL. Masters thesis, Universiti Teknikal Malaysia Melaka. https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122220
spellingShingle Yap, June Wai
Improved field programmable gate array based accelerator of deep neural network using OpenCL
thesis_level Master
title Improved field programmable gate array based accelerator of deep neural network using OpenCL
title_full Improved field programmable gate array based accelerator of deep neural network using OpenCL
title_fullStr Improved field programmable gate array based accelerator of deep neural network using OpenCL
title_full_unstemmed Improved field programmable gate array based accelerator of deep neural network using OpenCL
title_short Improved field programmable gate array based accelerator of deep neural network using OpenCL
title_sort improved field programmable gate array based accelerator of deep neural network using opencl
url http://eprints.utem.edu.my/id/eprint/26977/
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122220
work_keys_str_mv AT yapjunewai improvedfieldprogrammablegatearraybasedacceleratorofdeepneuralnetworkusingopencl