Ultra Fast CNN Based Hardware Computing Platform Concepts for ADAS Visual Sensors and Evolutionary Mobile Robots




Date: 16.05.2024
Alireza Fasih

8.1 CNN Based High Performance Computing for Real Time Image Processing on GPU
 
Many basic image processing tasks carry a heavy computational overhead because they
operate over the whole image. In real-time applications, processing time is a major
obstacle to implementation, and a High Performance Computing (HPC) platform is needed
to overcome it. Hardware accelerators reduce the processing time, and the Graphics
Processing Unit (GPU) has recently come into use in many applications. Alongside the
hardware accelerator, a proper choice of computing algorithm gives a further advantage
for fast image processing. The Cellular Neural Network (CNN) is a large-scale nonlinear
analog circuit able to process signals in real time [12]. In this research, we develop a
new design for evaluating image processing algorithms on massively parallel GPUs, with a
CNN implementation based on the Open Computing Language (OpenCL) programming model. The
implementation uses the Discrete Time CNN (DT-CNN) model, which is derived from the
originally proposed CNN model. The inherent massive parallelism of the CNN, combined with
GPUs, is an advantage for a high performance computing platform [131]. OpenCL makes the
design portable across the available graphics processing devices and multi-core
processors. Performance is evaluated in terms of execution time on both the device
(i.e. GPU) and the host (i.e. CPU).
 
8.2 Introduction
Image processing is an ever-expanding and dynamic area with applications reaching into
everyday life, such as medicine, space exploration, surveillance, authentication,
automated industrial inspection and many more [132]. Real-time image processing on modern
processors is limited [52]. Problems in computer vision are computationally intensive
[133]. The tremendous amount of data required for image processing and computer vision
applications presents a significant problem for conventional microprocessors [52].
Consider a sequence of images at medium resolution (512 × 512 pixels) and standard frame
rate (30 frames per second) in color (3 bytes per pixel). This represents a rate of almost
24 million bytes of data per second. A simple feature extraction algorithm may require
thousands of basic operations per pixel, and a typical vision system requires
significantly more complex computations.
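
The data rate quoted above follows directly from the image parameters. The short Python sketch below reproduces the arithmetic; the function name `raw_video_data_rate` is chosen here for illustration and does not come from the original work.

```python
def raw_video_data_rate(width, height, fps, bytes_per_pixel):
    """Return the uncompressed video data rate in bytes per second."""
    return width * height * fps * bytes_per_pixel


# Values from the text: 512 x 512 pixels, 30 frames per second, 3 bytes per pixel.
rate = raw_video_data_rate(512, 512, 30, 3)
print(rate)  # 23592960, i.e. almost 24 million bytes per second
```
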
As we can see, parallel computing is essential to solve such problems [133]. In fact, the
need to speed up image processing computations brought parallel processing into the
computer vision domain. Most image processing algorithms are inherently parallel because,
except in some special cases, they involve similar computations for all pixels in an
image [133]. Conventional general-purpose machines cannot manage the distinctive I/O
requirements of most image processing tasks, nor do they take advantage of the
opportunity for parallel computation present in many vision-related applications [121].
Many research efforts have shifted to Commercial-Off-The-Shelf (COTS) based platforms in
recent years, such as Symmetric Multiprocessors (SMP) or clusters of PCs. However, these
approaches often do not deliver the highest level of performance because of the inherent
disadvantages of the underlying sequential platforms and “the divergence problem”. The
recent advent of multi-million-gate Field Programmable Gate Arrays (FPGAs) with richer
embedded feature sets, such as plenty of on-chip memory, DSP blocks and embedded hardware
microprocessor IP cores, facilitates high performance, low power consumption and high
density [134].
However, the development of dedicated processors is usually expensive, their limited
availability restricts their widespread use, and the complexity of design and
implementation also makes the FPGA less attractive. In the last few years, graphics cards
with impressive performance have been introduced to the market, and their lower cost and
design flexibility make them a better choice. Although they were initially released for
gaming, they have also found scientific applications that demand a great deal of parallel
processing. Along with these hardware platforms, software platforms such as Compute
Unified Device Architecture (CUDA) and OpenCL are available for designing and developing
parallel programs on GPUs [135]. Of these, OpenCL is a recently developed framework for
writing programs that can be executed across multicore heterogeneous platforms. For
instance, such programs can run on multicore CPUs, on GPUs and on combinations of the two.
The framework also provides portability: a developed kernel is compatible with other
devices. Along with the available hardware and software platforms, we used the CNN
parallel computing paradigm for some image processing applications.


 
The idea of the CNN was taken from the architectures of artificial neural networks and
cellular automata. In contrast to ordinary neural networks, the CNN has the property of
local connectivity. The weights of the cells are set by parameters called the template,
and the functionality of the CNN depends on this template. With a single common computing
model, we can therefore achieve the desired functionality by calculating the appropriate
templates. The CNN has been successfully used for various high-speed parallel signal
processing applications such as image processing, visual computing and pattern
recognition as well as computer vision [91]. We therefore chose to implement it in
hardware to meet the need for HPC in real-time image processing. The parallel processing
capability of the CNN also motivates implementing the CNN architecture on a hardware
platform for its efficient visualization.
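
To make the role of the template concrete, the following is a minimal pure-Python sketch of one synchronous update of a discrete-time CNN on a small binary image. It is an illustration under stated assumptions, not the implementation described in this chapter: the function name `dtcnn_step`, the hard-limiting sign output, the fixed boundary value of -1 and the edge-extraction template values are all assumed here, since this section does not give the model equations. Each cell reads only its 3 × 3 neighbourhood, which is the local connectivity mentioned above, and every cell applies the same template.

```python
def dtcnn_step(y, u, A, B, z):
    """One synchronous DT-CNN update on a 2D grid (illustrative sketch).

    y: current cell outputs (+1 = black, -1 = white)
    u: input image with the same coding
    A: 3x3 feedback template, B: 3x3 control template, z: bias
    Cells outside the grid are treated as -1 (assumed boundary condition).
    """
    rows, cols = len(y), len(y[0])

    def at(grid, r, c):
        if 0 <= r < rows and 0 <= c < cols:
            return grid[r][c]
        return -1.0

    new_y = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            x = z
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    x += A[dr + 1][dc + 1] * at(y, r + dr, c + dc)
                    x += B[dr + 1][dc + 1] * at(u, r + dr, c + dc)
            # Hard-limiting output (assumed): the sign of the cell state.
            new_y[r][c] = 1.0 if x >= 0 else -1.0
    return new_y


# Assumed edge-extraction template, chosen for illustration only.
A = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
B = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
z = -1.0

# 5x5 test image: a 3x3 black square on a white background.
u = [[1.0 if 1 <= r <= 3 and 1 <= c <= 3 else -1.0 for c in range(5)]
     for r in range(5)]

edges = dtcnn_step(u, u, A, B, z)
for row in edges:
    print("".join("#" if v > 0 else "." for v in row))
```

With this template, a black pixel stays black only if at least one of its neighbours is white, so the interior of the square is cleared and its outline remains. Swapping in a different template changes the operation without changing the update loop. Because each cell's new value depends only on the previous grid, the two outer loops could be distributed over parallel work-items, one per pixel.
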
In this research, we develop a DT-CNN model on graphics processing units with the OpenCL
framework. The DT-CNN is developed entirely in the kernel, which makes it executable on
every platform that supports OpenCL. It should be noted, however, that the GPU is a
coprocessor that supports the processor in our system. Hence, the CPU still executes
several tasks, such as transferring data to the local memory of the graphics card and
retrieving the results. Finally, a GPU-based Universal Machine - CNN (UM-CNN) was
implemented using the OpenCL framework on an NVIDIA GPU. A benchmark is provided that
compares the GPU-based CNN model with the CPU for image processing.
The chapter is structured as follows: Section II describes the theory involved in
parallel computing. Section III introduces the concepts of the CNN, the system diagram
and its functionality, and the system design methodology, which uses OpenCL. Section IV
concludes the chapter and outlines future work.
