traditional parallel processing models many processors have access to a shared memory.
The Message Passing Interface (MPI) and directive-based interfaces are two important approaches among shared memory techniques. Figure 3-1 illustrates the essentials of this model.
Figure 3-1: Shared-memory parallel processing model
The main advantage of this processing model/architecture is that very complex algorithms can be computed on a stream of data held in shared memory. Its major drawbacks, however, are memory latency and memory bottlenecks, as well as the complexity of task scheduling. Nevertheless, many problems of classical parallel image processing can be overcome by arranging simple processing units in a 2-dimensional grid, where each element has a simple distributed memory as its state variable and is coupled to its neighbors by a nonlinear operation through local connections. Figure 3-2 shows this type of architecture.
[Figure labels: Shared Memory, Task Scheduler, Processor array]
Figure 3-2: General idea for distributed processing
It is worth mentioning that the CNN provides a similar model for processing data and images. By changing the templates, we can define a new model for different operations/functions. Coupling more than one CNN layer enables the designer to model very complex image processing operators [65, 66]. Due to its robustness, modeling image processing by PDEs is becoming more popular [67, 68]. Some equations come from minimizing an energy function, while others are designed using geometrical arguments such as mean curvature motion [69]. There are many applications based on PDEs, such as inpainting for recovering corrupted regions in an image [70], image segmentation [71], and noise reduction with edge preservation [72]. All of these examples and similar techniques are essential for video processing in ADAS. PDEs are solved on a CNN by transforming the PDE into a set of ODEs: a spatially continuous PDE is transformed into an array of discrete, interacting systems (the ODEs), which can then be mapped onto CNN cells. This works because the CNN is a natural and flexible paradigm for modeling simple, locally interconnected, grid-based dynamical systems.
[Figure labels: Memory Block, Local Processor, Memory Links, Main Processor, Global Memory]
The CNN architecture is very close to PDEs, and even a direct mapping of PDEs onto a CNN processor matrix is possible [73]. In the case of linear PDEs, we can map each independent variable, together with its related partial derivative, to a CNN layer. If there is more than one independent variable, we have to couple several CNN layers together to obtain the solution.
By defining the right templates, one can model the behavior of PDEs with a CNN processor system that generates the solution. T. Roska et al. have shown in [73] how to simulate a space-invariant nonlinear PDE with a CNN. They described the dynamics of three different systems (the 2D heat equation, Burgers' equation, and the Navier-Stokes equation) by sets of equations. The mapping of the two-dimensional heat equation, which is modeled by the Laplace operator, is solved in [73]. Equation 3-1 shows this heat model.
$$\frac{\partial u(x, y, t)}{\partial t} = c \nabla^2 u(x, y, t) \tag{3-1}$$
In this equation, $\nabla^2$ is the Laplace operator, which is applied to the intensity $u(x, y, t)$. After spatial discretization of this equation, the PDE is transformed into a system of ODEs. If we discretize the equation in space with steps $\Delta x = \Delta y = h$, we can map $u(x, y, t)$ onto a CNN layer. Before that, we need a numerical approximation of the equation based on the Taylor series. Equation 3-2 shows this approximation.
$$\frac{\partial^2 u}{\partial x^2} \approx \frac{1}{h^2}\left[u(x+h, y) - u(x, y) - \left(u(x, y) - u(x-h, y)\right)\right] = \frac{1}{h^2}\left[u_{i+1,j} - 2u_{i,j} + u_{i-1,j}\right] \tag{3-2}$$
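Equation 3-2 covers only the x direction. As a sketch, assuming the grid indexing $x = ih$, $y = jh$ and writing $u_{i,j}(t)$ for the value at that grid point (this notation goes slightly beyond Equation 3-2 and is our assumption), the same second difference in y can be added to obtain the five-point Laplacian, which turns Equation 3-1 into one ODE per grid point:

```latex
\nabla^2 u \approx \frac{1}{h^2}\left[u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}\right]

\frac{d u_{i,j}}{dt} = \frac{c}{h^2}\left[u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}\right]
```

Each cell is thus coupled only to its four nearest neighbors, which is exactly the kind of local connectivity a 3x3 template can express.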
Using this approximation, it is easy to map this equation onto the CNN template.
$$A = \begin{bmatrix} 0 & \frac{1}{h^2} & 0 \\ \frac{1}{h^2} & -\frac{4}{h^2} & \frac{1}{h^2} \\ 0 & \frac{1}{h^2} & 0 \end{bmatrix}, \quad B = 0, \quad I = 0 \tag{3-3}$$
The time evolution of the CNN processors using this template therefore gives the solution of the heat equation.
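The time evolution can be illustrated with a minimal sketch. It assumes forward-Euler time stepping, zero-valued boundary cells, the diffusion constant c folded into the template as c/h², and cells that stay in the linear region so that the cell state equals u. The function names `heat_template`, `apply_template` and `simulate_heat` are hypothetical and are not taken from [73].

```python
import numpy as np


def heat_template(c: float, h: float) -> np.ndarray:
    """3x3 feedback template: five-point Laplacian scaled by c / h**2 (assumed form)."""
    return (c / h**2) * np.array([[0.0, 1.0, 0.0],
                                  [1.0, -4.0, 1.0],
                                  [0.0, 1.0, 0.0]])


def apply_template(u: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Weighted sum over each cell's 3x3 neighborhood, with zero boundary cells."""
    padded = np.pad(u, 1, mode="constant")
    rows, cols = u.shape
    out = np.zeros_like(u)
    for di in range(3):
        for dj in range(3):
            out += A[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out


def simulate_heat(u0, c=1.0, h=1.0, dt=0.1, steps=100) -> np.ndarray:
    """Integrate du/dt = A * u with forward Euler (stable for dt * c / h**2 <= 0.25)."""
    u = np.asarray(u0, dtype=float).copy()
    A = heat_template(c, h)
    for _ in range(steps):
        u = u + dt * apply_template(u, A)
    return u


if __name__ == "__main__":
    # A single hot pixel diffuses into its neighborhood over time.
    u0 = np.zeros((32, 32))
    u0[16, 16] = 1.0
    u = simulate_heat(u0)
    print("peak:", u.max(), "total:", u.sum())
```

Every cell applies the same 3x3 weights to its own neighborhood, so the update is identical and independent per cell, which is what makes the scheme suitable for a processor array.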
For image processing, there is a discrete-time version of the CNN that is easy to implement on digital platforms such as CPUs and FPGAs. Our first trial/exercise consisted of two simple cells connected to each other for solving a 2nd-order ODE. Each CNN cell has an internal integrator, which is suited to solving 1st-order differential equations. To solve higher-order equations (2nd, 3rd, and so on), we couple cells so that the equation is expressed as a system of simple 1st-order differential equations. In chapter 6 we show how to solve the nonlinear Rössler equation with this technique. Later on, it is possible to model a CNN by directly coupling cells and integrators through local connections.
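The coupling of two first-order cells can be sketched as follows. The text does not state which 2nd-order ODE was used, so this example assumes the harmonic oscillator x'' = -omega² x, rewritten as x1' = x2 and x2' = -omega² x1, with each cell acting as a forward-Euler integrator. The function name `simulate_two_cells` and all parameters are hypothetical.

```python
import math


def simulate_two_cells(x0, v0, omega=1.0, dt=1e-3, t_end=math.pi):
    """Two coupled first-order cells solving x'' = -omega**2 * x (assumed example).

    Cell 1 integrates x1' = x2; cell 2 integrates x2' = -omega**2 * x1.
    Returns the final states (x1, x2), i.e. position and velocity.
    """
    x1, x2 = float(x0), float(v0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dx1 = x2                  # input to cell 1 comes from cell 2
        dx2 = -omega**2 * x1      # input to cell 2 comes from cell 1
        # Both cells update simultaneously from the previous states.
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2


if __name__ == "__main__":
    # Starting at x = 1 with zero velocity, half a period later x is close to -1.
    print(simulate_two_cells(1.0, 0.0))
```

The same pattern extends to a 3rd-order equation by adding a third cell. Forward Euler slowly inflates the oscillation amplitude, so a small step size is assumed here.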