Figure  7-4: Convolution and stream processing diagram




Date: 16.05.2024
Alireza Fasih

Figure 7-4: Convolution and stream processing diagram
These two processes run concurrently. The first process reads the data and splits it into three 
separate rows, and the second process applies a convolution to these data. In each clock 
cycle, the columns module reads a pixel value from the live video stream source and 
writes the pixel values to the output stream channels. These pixel values are then 
used by the convolution process. 
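The two-process pipeline described above can be sketched in software with Python generators. This is only an illustrative model, not the CoDeveloper implementation: the function names `columns` and `convolution`, the zero-initialized line buffers, and the raster-ordered input are all assumptions. The sketch also computes a correlation (the kernel is not flipped) and does not treat image borders specially.

```python
from collections import deque

def columns(pixel_stream, width):
    """First process (hypothetical model): turn a raster-ordered pixel stream
    into three row channels. For each incoming pixel it yields the pixels at
    the same column in the row two lines back, one line back, and the
    current line."""
    row1 = deque([0] * width, maxlen=width)  # delay line: two rows back
    row2 = deque([0] * width, maxlen=width)  # delay line: one row back
    for p in pixel_stream:
        top, mid = row1[0], row2[0]
        row1.append(mid)  # the oldest entry drops out automatically
        row2.append(p)
        yield (top, mid, p)

def convolution(column_stream, kernel):
    """Second process (hypothetical model): keep the three most recent column
    triples as a 3x3 window and emit one weighted sum per input column."""
    window = deque([(0, 0, 0)] * 3, maxlen=3)
    for col in column_stream:
        window.append(col)
        acc = 0
        for c in range(3):      # window column, oldest to newest
            for r in range(3):  # row channel, top to bottom
                acc += kernel[r][c] * window[c][r]
        yield acc

# Usage: a 4-pixel-wide image streamed pixel by pixel through both stages.
pixels = list(range(1, 17))
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = list(convolution(columns(pixels, 4), box))
```

Chaining the generators mimics the streaming behavior: the second stage consumes one column triple for each pixel the first stage reads, so one result is produced per input pixel after the initial buffer fill.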
7.7 Modeling cellular neural network by DDA 
A Cellular Neural Network (CNN) is a paradigm for parallel processing that is similar to a 
neural network, with the difference that the connectivity between the cells is local 
rather than global as in a neural network. The main parts of a CNN module are the 
convolution module and the integrator [50]. To design an integrator module we need a 
memory for keeping both data and cell values. According to Equation 7-1, which describes 
the CNN cell dynamics, we need three templates for each cell: a feedback template $T_A$, a 
control template $T_B$, and a bias template $I$.
$$\dot{x}_{ij} = -x_{ij} + T_A \ast y_{ij} + T_B \ast u_{ij} + I_{ij} \tag{7-1}$$
where $T_A$ denotes the feedback template (3x3) and $T_B$ denotes the control template 
(3x3); $u$, $x$, $y$ and $I$ are the input, the state, the output and the bias term, respectively. 
The output signal is given by Equation 7-2. 
[Figure 7-4 diagram labels: Stream IN, Columns, Row 1, Row 2, Row 3, Convolution, Stream OUT]
$$y_{ij}(t) = f(x_{ij}(t)) \tag{7-2}$$
In Equation 7-2, the function $f(x)$ is a nonlinear activation function defined in 
Equation 7-3. 
$$f(x) = \frac{1}{2}\left(|x + 1| - |x - 1|\right) \tag{7-3}$$
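As a quick check of the activation function in Equation 7-3, the short sketch below evaluates the standard piecewise-linear form f(x) = (|x + 1| - |x - 1|) / 2 and compares it with a plain clamp to the interval [-1, 1]. The function names `f` and `clamp` are chosen here for illustration only.

```python
def f(x):
    """Piecewise-linear CNN output function: f(x) = (|x + 1| - |x - 1|) / 2."""
    return 0.5 * (abs(x + 1) - abs(x - 1))

def clamp(x):
    """Equivalent formulation: saturate x to the interval [-1, 1]."""
    return max(-1.0, min(1.0, x))

# Both formulations agree inside the linear region and in saturation.
samples = [-3.0, -1.0, -0.25, 0.0, 0.5, 1.0, 2.0]
agree = all(abs(f(x) - clamp(x)) < 1e-12 for x in samples)
```

The clamp form is usually the cheaper one to realize in hardware, since it needs only two comparisons instead of two absolute values and a subtraction.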
We can model the solution of Equation 7-1 on an FPGA by implementing a simple 
approximation technique such as the well-known digital differential analyzer (DDA). A DDA, 
sometimes also called a “digital integrating computer”, is a digital implementation of the 
differential analyzer. The integrators in a DDA are implemented as accumulators, whereby 
the numeric results are converted back to a pulse rate by the overflow of the accumulator. 
The main advantage of the digital integrator over an analog integrator is its scalable 
precision. In addition, a DDA-based digital integrator has no drift errors and no noise 
caused by imperfect electronic components. By accumulating values in a register over 
time, we can calculate the integral of the input signals. The basic digital 
integrator is expressed by Equation 7-4. 
$$X_{n+1} = X_n + K \cdot S \tag{7-4}$$
In Equation 7-4, $X_{n+1}$ denotes the next state of the accumulator used for calculating the 
integral. The coefficient $K$ is a constant factor less than 1; it is used for time-scaling. 
$S$ denotes the input signal to be integrated. This technique maps onto an FPGA very 
easily as behavioral code: after each rising clock edge, the equation updates the integral 
value. In this integrator, rounding or truncation errors are due only to the limited register 
width, so increasing the register size gives us a way to control and reduce this error. The 
error is cumulative; with low-precision registers, a loss of accuracy will therefore be 
observed after a long run time. The only way to overcome this problem is to choose proper 
register sizes. 
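The effect of register size on the accumulated error can be illustrated with a small software model of the accumulator update X[n+1] = X[n] + K*S. The sketch below is an assumption-laden illustration rather than the FPGA code: the function name `dda_integrate`, the choice of K as a power of two (so that the multiplication becomes a right shift), and the parameter `frac_bits` for the number of fractional bits are all hypothetical. Register overflow is not modeled.

```python
def dda_integrate(samples, k_shift, frac_bits):
    """Fixed-point model of X[n+1] = X[n] + K*S[n] with K = 2**-k_shift.
    Values are stored as integers scaled by 2**frac_bits."""
    scale = 1 << frac_bits
    acc = 0  # integer accumulator; represented value is acc / scale
    for s in samples:
        s_fixed = int(round(s * scale))  # quantize the input sample
        acc += s_fixed >> k_shift        # K*S via shift; low bits are truncated
    return acc / scale

# Integrate a constant input S = 0.3 for 1000 clock cycles with K = 1/16.
exact = 1000 * 0.3 / 16
errors = {bits: abs(dda_integrate([0.3] * 1000, 4, bits) - exact)
          for bits in (8, 16, 24)}
```

In this model the truncation in each step always discards the same low-order bits, so the error grows linearly with the number of cycles and shrinks as `frac_bits` increases, which matches the cumulative behavior described above.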
In this way, we can directly compute the solution of differential equations. This 
simulation of analog computing is a fully parallel method for solving differential systems 
such as nonlinear equations, and also for realizing the integrator module within the CNN. 
The integrator modules must have access to memory for storing the values and the cell 
outputs. The only critical term in the CNN equation is the “integrator”, which we 
realize through the DDA model. After approximating the basic CNN cell, we 
must cascade the cells together. All these steps are implemented in CoDeveloper 
using a fixed-point method. The result for each set of three rows is stored in memory. 
CoDeveloper can handle access to the external memory through the multi-port 
memory controller (MPMC). This is a full-featured memory controller that is 
compatible with standard DDR2 memory devices. It must be configured for at 
least one read port and one write port. For the many high-end video processing 
applications, there is no implicit limit on the number of read or write ports in the MPMC. 
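To make the combination of the CNN cell equation and the DDA accumulator concrete, the sketch below applies the update X[n+1] = X[n] + K*S to every cell, with S taken as the right-hand side of the cell equation (negative state, plus feedback and control template sums, plus bias). It is a floating-point illustration only, whereas the text describes a fixed-point implementation. The names `conv3x3` and `cnn_step`, the scalar bias, and the zero padding at the grid border are assumptions made for this example.

```python
def f(x):
    """Piecewise-linear output function of the cell."""
    return 0.5 * (abs(x + 1) - abs(x - 1))

def conv3x3(template, grid, i, j):
    """Weighted sum over the 3x3 neighbourhood of cell (i, j).
    Cells outside the grid contribute 0 (assumed boundary condition)."""
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                total += template[di + 1][dj + 1] * grid[r][c]
    return total

def cnn_step(x, u, t_a, t_b, bias, k):
    """One accumulator update for all cells: x_new = x + k * S, where
    S = -x + T_A applied to y + T_B applied to u + bias."""
    y = [[f(v) for v in row] for row in x]
    new_x = []
    for i in range(len(x)):
        new_row = []
        for j in range(len(x[0])):
            s = (-x[i][j] + conv3x3(t_a, y, i, j)
                 + conv3x3(t_b, u, i, j) + bias)
            new_row.append(x[i][j] + k * s)
        new_x.append(new_row)
    return new_x

# Usage: with no feedback and a pass-through control template,
# the state of each cell settles toward its input value.
zero = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
passthrough = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
u = [[0.5, -0.5], [1.0, 0.0]]
x = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    x = cnn_step(x, u, zero, passthrough, 0.0, 0.1)
```

All cells are updated from the previous state, which mirrors the fully parallel character of the method: in hardware every cell could perform its accumulation in the same clock cycle.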
