
Alireza Fasih

Research question 6: How far can an efficient implementation of CNN on either FPGA
or GPU be designed and implemented?
A CNN is difficult to implement efficiently on traditional architectures of the von
Neumann type (such as CPUs and other sequential processors).
Therefore, we are looking for an architecture that is compatible with the parallel
nature of CNN. Traditional ANNs display some form of global connectivity, which makes
them difficult to implement in digital hardware. In contrast, the cells of a CNN are only
locally connected, which gives this architecture great potential for implementation on
either FPGA or GPU. FPGA macro cells and logic components can operate in a highly parallel
manner. Further, because the routing between them is very flexible, we can implement
complex digital circuit schemes or models on an FPGA. To take full advantage of an FPGA,
one should describe the design in a hardware description language that supports
behavioral modeling, such as VHDL, Verilog or SystemC. An FPGA has local embedded memory,
which is very important for storing the CNN states; otherwise, accesses from the FPGA to
an external memory could be very time consuming and become a major bottleneck. Many FPGA
chips on the market today contain a dedicated embedded CPU that can access the hardware
and the logic fabric of the FPGA through a standard bus controller [34, 35]. Many
high-level compilers based on the ANSI C standard are available for programming and
debugging this CPU. This integration of hardware and software increases system
performance. Therefore, loading the initial states of the CNN cells and the template
values, and setting the ‘time scales’ and other parameters, can easily be done by the
embedded CPU. The resources of an FPGA are limited, however, and we have to take this
into account when designing the CNN architecture.
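
The local connectivity discussed above can be illustrated with a minimal sketch. It
assumes the standard Chua-Yang cell model with 3x3 templates; the text does not state the
cell equation, so this choice is an assumption, as are the zero boundary condition, the
forward-Euler integration, and all names used (euler_step, A, B, z, dt).

```python
def output(x):
    """Standard piecewise-linear CNN output: clamps the state to [-1, 1]."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def weighted_neighbours(grid, template, i, j):
    """Sum a 3x3 template over the neighbourhood of cell (i, j).

    Cells outside the grid contribute 0 (zero boundary; an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                total += template[di + 1][dj + 1] * grid[r][c]
    return total


def euler_step(x, u, A, B, z, dt):
    """Advance every cell state by one forward-Euler step of size dt.

    x: cell states, u: inputs, A: feedback template, B: control template,
    z: bias. Each cell reads only its own 3x3 neighbourhood.
    """
    y = [[output(v) for v in row] for row in x]
    new_x = []
    for i in range(len(x)):
        row = []
        for j in range(len(x[0])):
            dx = (-x[i][j]
                  + weighted_neighbours(y, A, i, j)
                  + weighted_neighbours(u, B, i, j)
                  + z)
            row.append(x[i][j] + dt * dx)
        new_x.append(row)
    return new_x
```

Because each new state depends only on a 3x3 neighbourhood, all cells of one step could in
principle be computed at the same time, which is the property that suits FPGA and GPU. In
the FPGA setting described above, x would sit in embedded memory, while A, B, z and dt
correspond to the parameters loaded by the embedded CPU.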
Another interesting platform for CNN is the GPU, which is becoming more popular every
day. The highly parallel structure of GPUs makes them efficient for image processing and
for processing large blocks of data. Thanks to the high memory bandwidth between CPU and
GPU, the integration of GPU and CPU through standard protocols and APIs, and the ability
to run multi-kernel programs on the GPU, the GPU appears today to be a very interesting
technology for an efficient and cost-effective implementation of CNN. Since 2003, GPU
technology has been growing fast. Further, in terms of design flexibility, we can now
implement very complex models and systems by using flexible and robust high-level tools
for software
development. OpenCL provides a standard API for communication between CPU and GPU.
This API provides the essential commands for allocating memory and for transferring
memory content between host (CPU) RAM and device (GPU) RAM. There are also commands for
compiling GPU code, which is contained in a kernel file, and for executing it. In OpenCL,
running multiple kernels is possible, and data can be passed directly and very quickly
between different kernels. These features make GPUs very flexible and reliable for
designing complex architectures and models.
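
The host-side sequence described above (allocate, transfer, execute, read back) can be
sketched without a GPU. The following is a plain-Python emulation, not OpenCL code:
FakeDevice, its methods, and the two example kernels are hypothetical stand-ins, and the
work-items run one after another instead of in parallel. The comments name the real
OpenCL functions that each step corresponds to. The compile step (clBuildProgram in
OpenCL) has no counterpart here because the kernels are ordinary Python functions.

```python
class FakeDevice:
    """Hypothetical stand-in for a GPU: holds buffers and runs kernels."""

    def __init__(self):
        self.buffers = {}  # emulated device memory, keyed by buffer name

    def create_buffer(self, name, size):
        # Real OpenCL: clCreateBuffer
        self.buffers[name] = [0.0] * size

    def write_buffer(self, name, host_data):
        # Real OpenCL: clEnqueueWriteBuffer (host RAM -> device RAM)
        self.buffers[name][:] = host_data

    def run_kernel(self, kernel, global_size, *buffer_names):
        # Real OpenCL: clEnqueueNDRangeKernel. Each work-item receives its
        # own global id; this emulation simply loops over the ids.
        args = [self.buffers[n] for n in buffer_names]
        for gid in range(global_size):
            kernel(gid, *args)

    def read_buffer(self, name):
        # Real OpenCL: clEnqueueReadBuffer (device RAM -> host RAM)
        return list(self.buffers[name])


def scale_kernel(gid, src, dst):
    """First kernel: each work-item doubles one element."""
    dst[gid] = 2.0 * src[gid]


def clamp_kernel(gid, src, dst):
    """Second kernel: each work-item clamps one element to [-1, 1]."""
    dst[gid] = max(-1.0, min(1.0, src[gid]))


def run_pipeline(host_input):
    """Run two kernels back to back on the emulated device."""
    n = len(host_input)
    dev = FakeDevice()
    for name in ("in", "mid", "out"):
        dev.create_buffer(name, n)
    dev.write_buffer("in", host_input)
    dev.run_kernel(scale_kernel, n, "in", "mid")
    # "mid" is never copied back to the host between the two kernels.
    dev.run_kernel(clamp_kernel, n, "mid", "out")
    return dev.read_buffer("out")
```

For example, run_pipeline([0.25, -0.75, 0.5, 2.0]) returns [0.5, -1.0, 1.0, 1.0]. The
intermediate buffer stays on the device between the two kernels, which is what makes
chaining several kernels cheap compared with a round trip through host RAM.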
