Alireza Fasih

Research question 6: How far can an efficient implementation of CNN on either FPGA 
or GPU be designed and implemented? 
Implementing CNN efficiently on traditional von Neumann architectures (such as CPUs and other sequential processors) is difficult, both in terms of design complexity and of performance. Therefore, we are looking for an appropriate architecture that is compatible with the parallel nature of CNN. Traditional ANNs typically exhibit some form of global connectivity, which makes them difficult to implement in digital hardware. In contrast, each CNN cell is connected only to the cells in its local neighborhood, which gives this architecture great potential for implementation on either FPGA or GPU.

FPGA macro cells and logic components can operate in a highly parallel manner. Further, because the routing between them is very flexible, virtually any complex digital circuit or model can be implemented on an FPGA. To get the most out of an FPGA, one should use high-level behavioral modeling languages such as VHDL, Verilog, or SystemC. FPGAs also provide local, embedded memory, which is very important for storing the CNN states; otherwise, memory accesses between the FPGA and an external memory could be very time-consuming and become a major bottleneck. Today, most FPGA chips on the market contain a dedicated embedded standard CPU that accesses the hardware and the FPGA logic fabric through a standard bus controller [34, 35]. Many high-level compilers based on the ANSI C standard are available for coding and debugging software for this CPU. This technology increases system performance by integrating hardware and software. Consequently, loading the initial states of the CNN cells and the template values, and setting the ‘time scales’ and other parameters, can easily be done by the embedded CPU. However, FPGA resources are limited, and this constraint must be taken into account when designing the CNN architecture.
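To make the locality argument concrete, the following Python sketch performs one forward-Euler step of a cellular neural network. It assumes the standard Chua-Yang model dx_ij/dt = -x_ij + sum(A * y) + sum(B * u) + z over a 3x3 neighborhood, with the piecewise-linear output y = 0.5(|x + 1| - |x - 1|), a step size h, and boundary cells fixed at 0. The function names, the Euler integration, and the boundary condition are illustrative assumptions, not the design used in this work. The point it shows is that each new state reads only the 3x3 neighborhood of its own cell, so every cell could be mapped onto its own processing element, with the state grid `x`, the input grid `u`, and the templates kept in on-chip memory.

```python
from typing import List

Grid = List[List[float]]      # row-major 2D array of cell values
Template = List[List[float]]  # 3x3 weights, indexed [di + 1][dj + 1]


def cnn_output(x: float) -> float:
    """Piecewise-linear CNN output y = 0.5 * (|x + 1| - |x - 1|), saturating at +/-1."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def neighborhood_sum(t: Template, grid: Grid, i: int, j: int) -> float:
    """Template-weighted sum over the 3x3 neighborhood of cell (i, j).

    Positions outside the grid contribute 0 (an assumed fixed boundary).
    """
    rows, cols = len(grid), len(grid[0])
    acc = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                acc += t[di + 1][dj + 1] * grid[r][c]
    return acc


def euler_step(x: Grid, u: Grid, a: Template, b: Template, z: float, h: float) -> Grid:
    """One forward-Euler step of dx/dt = -x + A*y + B*u + z for all cells.

    Every new state is computed from the old grids only, so the cells are
    independent within a step and could all be updated in parallel.
    """
    y = [[cnn_output(v) for v in row] for row in x]
    return [
        [
            x[i][j] + h * (-x[i][j]
                           + neighborhood_sum(a, y, i, j)
                           + neighborhood_sum(b, u, i, j)
                           + z)
            for j in range(len(x[0]))
        ]
        for i in range(len(x))
    ]
```

Calling `euler_step` repeatedly integrates the network. Because a change in `u` at one corner of a 5x5 grid affects only the cells adjacent to it after one step, the amount of wiring and memory traffic per cell stays constant as the grid grows, which is what makes the model attractive for parallel hardware.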
Another interesting platform for CNN is the GPU, which is becoming more popular every day. The highly parallel structure of GPUs makes them more efficient than CPUs for image processing and, more generally, for processing large blocks of data. Thanks to their high memory bandwidth, their integration with the CPU through standard protocols and APIs, and their ability to run multiple kernels, GPUs today appear to be a very interesting technology for an efficient and cost-effective implementation of CNN. GPU technology has been growing rapidly since 2003. Further, in terms of design flexibility, very complex models and systems can now be implemented using flexible and robust high-level software development tools.

OpenCL provides a standard API for communication between the CPU (host) and the GPU (device). This API provides the essential commands for allocating memory and for transferring memory content between the host (CPU) RAM and the device (GPU) RAM. It also provides commands for compiling the GPU code, i.e. the kernel file, and for executing the resulting kernels. OpenCL can run multiple kernels, and data can be passed directly from one kernel to the next through buffers that remain in device memory, which is very fast because no round trip to host memory is needed. These features make GPUs a very flexible and reliable platform for designing complex architectures and models.
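The host/device workflow described above can be illustrated without a GPU by emulating it in plain Python. In this hypothetical sketch, lists stand in for device buffers, `launch` stands in for an NDRange kernel launch, and two small kernels (one computing the cell outputs, one updating the states) exchange data through buffers that never leave the "device". The CNN model assumed here is the standard Chua-Yang model with 3x3 templates `a` and `b`, bias `z`, and forward-Euler step `h`; all names are illustrative. In real OpenCL host code the corresponding steps would use `clCreateBuffer` and `clEnqueueWriteBuffer` (allocate and upload), `clCreateProgramWithSource` and `clBuildProgram` (compile the kernel file), `clCreateKernel`, `clSetKernelArg` and `clEnqueueNDRangeKernel` (execute), and `clEnqueueReadBuffer` (read back); none of those calls appear below.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins, not OpenCL: a "device buffer" is a flat row-major list.
Buffer = List[float]
Template = Sequence[Sequence[float]]  # 3x3 weights, indexed [di + 1][dj + 1]


def launch(kernel: Callable[..., None], global_size: Tuple[int, int], *args) -> None:
    """Emulate an NDRange launch by calling `kernel` once per work-item (gi, gj).

    A GPU would run the work-items in parallel; running them sequentially here
    gives the same result because no work-item reads a buffer that the same
    launch writes.
    """
    rows, cols = global_size
    for gi in range(rows):
        for gj in range(cols):
            kernel(gi, gj, rows, cols, *args)


def output_kernel(gi: int, gj: int, rows: int, cols: int, x: Buffer, y: Buffer) -> None:
    """Kernel 1: piecewise-linear output y = 0.5 * (|x + 1| - |x - 1|) of one cell."""
    k = gi * cols + gj
    y[k] = 0.5 * (abs(x[k] + 1.0) - abs(x[k] - 1.0))


def state_kernel(gi: int, gj: int, rows: int, cols: int,
                 x: Buffer, y: Buffer, u: Buffer, x_next: Buffer,
                 a: Template, b: Template, z: float, h: float) -> None:
    """Kernel 2: forward-Euler update of one cell from its 3x3 neighborhood."""
    acc = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = gi + di, gj + dj
            if 0 <= r < rows and 0 <= c < cols:  # cells outside the grid count as 0
                k = r * cols + c
                acc += a[di + 1][dj + 1] * y[k] + b[di + 1][dj + 1] * u[k]
    k = gi * cols + gj
    x_next[k] = x[k] + h * (-x[k] + acc + z)


def run_cnn(x0: Sequence[float], u: Sequence[float], shape: Tuple[int, int],
            a: Template, b: Template, z: float, h: float, steps: int) -> List[float]:
    """Host-side flow: 'upload' buffers, launch both kernels each step, 'read back'."""
    n = shape[0] * shape[1]
    x, x_next = list(x0), [0.0] * n     # allocate + upload (cf. clCreateBuffer)
    y, u_dev = [0.0] * n, list(u)
    for _ in range(steps):
        launch(output_kernel, shape, x, y)  # y stays "on the device"
        launch(state_kernel, shape, x, y, u_dev, x_next, a, b, z, h)
        x, x_next = x_next, x               # ping-pong swap instead of a host copy
    return list(x)                          # read back (cf. clEnqueueReadBuffer)
```

For example, with a template `a` whose only nonzero entry is a center (self-feedback) weight of 2 and `b` all zeros, `run_cnn([0.5, -0.5], [0.0, 0.0], (1, 2), a, b, 0.0, 0.1, 200)` drives the two bistable cells close to +2 and -2. The ping-pong swap of `x` and `x_next` is the design choice that keeps the whole iteration on the device: the host only uploads once and reads back once.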
