convolution calculations for every 3 rows, we can send the result to the video output. In the
CNN, we have to keep those pixel values in a buffer for the next n iterations. The value of n
depends on the required accuracy and on the DDA approximation. For this work we used a
32-bit MicroBlaze CPU to combine hardware and software; the hardware convolution
modules must connect to the memory and the data path according to the architecture in
Figure 7-1. All parameters and data can be transmitted between the modules and memory;
this transfer is based on FIFO stream buffers and the Processor Local Bus (PLB) technique;
see Refs [127, 128].
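
The row-based scheme for the direct case can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware module of this design: the function name `stream_convolve_3x3`, the three-row `deque` used as a line buffer, and the choice to skip border pixels are all hypothetical. It shows that an output row can be emitted as soon as three input rows are buffered, which is the point at which the result could be sent to the video output.

```python
from collections import deque


def stream_convolve_3x3(rows, kernel):
    """Yield one output row each time three input rows are available.

    Hypothetical software model of a 3-row line buffer; border pixels
    are skipped (no padding), so each output row is 2 pixels narrower.
    """
    window = deque(maxlen=3)  # line buffer: the 3 most recent rows
    for row in rows:
        window.append(list(row))
        if len(window) < 3:
            continue  # not enough rows buffered yet
        width = len(row)
        out = []
        for x in range(1, width - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    # The kernel is flipped in both axes: true convolution.
                    acc += kernel[2 - ky][2 - kx] * window[ky][x + kx - 1]
            out.append(acc)
        yield out  # this row is complete and can leave the pipeline


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # unnormalised uniform filter
result = list(stream_convolve_3x3(image, box))
```

Only three rows are ever held in memory here. A CNN-style iteration would, as described above, have to retain the pixel values for n further passes, which this sketch does not model.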
7.10 Direct convolution vs. CNN – performance comparison
We have found by experience that, in our design, image smoothing operators, whether
linear or non-linear (such as the uniform, median and Gaussian filters), which are easy to
model by direct convolution, run faster than the equivalent CNN-based processing.
Computing the first-order and second-order derivatives for edge detection was also faster
with direct convolution than with the CNN implementation. The direct convolution method
needed only two clock pulses for each pixel, while the CNN needed 10 clocks (the exact
figure depends on the number of DDA iterations).
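
To put the per-pixel figures in perspective, the following sketch scales them to a whole frame. The 640 x 480 frame size and the helper name `clocks_per_frame` are assumptions made for illustration only; the text gives just the two per-pixel counts, and the CNN count varies with the number of DDA iterations.

```python
def clocks_per_frame(width, height, clocks_per_pixel):
    """Total clock count for one frame at a fixed per-pixel cost."""
    return width * height * clocks_per_pixel


WIDTH, HEIGHT = 640, 480  # assumed frame size, not taken from the text

direct_clocks = clocks_per_frame(WIDTH, HEIGHT, 2)   # direct convolution
cnn_clocks = clocks_per_frame(WIDTH, HEIGHT, 10)     # CNN, 10 clocks/pixel
ratio = cnn_clocks / direct_clocks                   # independent of frame size

print(direct_clocks, cnn_clocks, ratio)
```

Under these assumptions the CNN needs five times as many clocks per frame as direct convolution, and the ratio grows if more DDA iterations are required.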