It is apparent that separable convolution coupled with spatial dropout (in the
convolutional layers) helped the model converge faster and generalize better. This is
because a separable convolution first performs a depthwise spatial convolution (which
acts on each input channel separately), followed by a pointwise convolution that mixes
the resulting output channels. In effect, separable convolutions factorize a kernel into
two smaller kernels, which requires fewer computations and thereby helps the model
converge faster.
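The saving from this factorization can be illustrated with a small weight count. The sketch below is only an illustration under stated assumptions: the kernel size and channel counts are hypothetical and not taken from our model, biases are ignored, and a depth multiplier of 1 is assumed.

```python
def standard_conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a standard 2D convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a depthwise separable 2D convolution (biases ignored).

    Assumes a depth multiplier of 1: one k x k filter per input channel,
    followed by a 1 x 1 pointwise convolution that mixes the channels.
    """
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 32, 64

standard = standard_conv_weights(k, c_in, c_out)
separable = separable_conv_weights(k, c_in, c_out)
reduction = standard / separable

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {reduction:.2f}x")
```

For these assumed sizes the separable layer needs 2,336 weights instead of 18,432, roughly an eightfold reduction. The number of multiply-accumulate operations per output position shrinks by the same factor.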
We then experimented with other arguments associated with the best performing layer,
namely the type of weight initialization and the weight constraints, which determine the
final weights of our model and hence its performance. Table 6 summarizes the results of
this experiment.
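As a reminder of how the two initialization schemes compared here differ, the sketch below computes the sampling range of each. It uses the standard definitions of Xavier (Glorot) uniform and He uniform initialization, which Keras exposes as `glorot_uniform` and `he_uniform`. The fan-in and fan-out values are hypothetical and not taken from our model.

```python
import math
import random


def glorot_uniform_limit(fan_in, fan_out):
    """Xavier (Glorot) uniform: weights drawn from U(-limit, limit)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def he_uniform_limit(fan_in):
    """He uniform: weights drawn from U(-limit, limit)."""
    return math.sqrt(6.0 / fan_in)


def sample_uniform(limit, n, seed=0):
    """Draw n weights from U(-limit, limit) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-limit, limit) for _ in range(n)]


# Hypothetical 1 x 1 pointwise kernel with 32 input and 64 output channels,
# so fan_in = 32 and fan_out = 64.
fan_in, fan_out = 32, 64

glorot_limit = glorot_uniform_limit(fan_in, fan_out)
he_limit = he_uniform_limit(fan_in)

glorot_weights = sample_uniform(glorot_limit, 5)
he_weights = sample_uniform(he_limit, 5)

print(f"Xavier uniform limit: {glorot_limit:.4f}")
print(f"He uniform limit:     {he_limit:.4f}")
```

He uniform ignores the fan-out and therefore always samples from a range at least as wide as Xavier uniform, so the two schemes start training from differently scaled weights.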
Table 6. Comparison of models based on the arguments of the best performing layer

Layer configuration | Age Estimation (MAE) | Age Classification (Accuracy) | Gender Classification (Accuracy)
Separable Conv2D + Spatial dropout + Xavier uniform initialization | 6.08 | 78.279 | 91.269
Separable Conv2D + Spatial dropout + He uniform initialization