Network Architecture. The tasks tackled with this transfer learning approach are age estimation and gender classification. The following describes the network architectures we trained on top of the extracted features. For gender classification we named the models, for convenience, VGG_f_gender, ResNet50_f_gender, and SENet50_f_gender; their designs are as follows.
VGG_f_gender comprises two blocks, each containing, in order, batch normalization, spatial dropout with a drop probability of 0.5, and a separable convolution layer with 512 filters of size 3x3 and 'same' padding (to reduce loss of information during the convolution operations), followed by max pooling with a 2x2 kernel. The fully connected system consists of a batch norm layer, followed by alpha dropout, a layer of 128 neurons with ReLU activation and He uniform initialization, another batch norm layer, and finally an output layer with a single sigmoid-activated neuron. The batch size was 64.
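For concreteness, the sketch below shows how this head could be assembled in Keras. It is a minimal sketch, not our exact training code: the (7, 7, 512) input shape (the last VGG-Face convolutional feature map for 224x224 inputs), the Flatten before the dense layers, the alpha dropout rate of 0.5, and the ReLU activation inside the convolution blocks are all assumptions not fixed by the description above.

```python
from tensorflow.keras import layers, models

def build_vgg_f_gender(input_shape=(7, 7, 512)):  # input shape is an assumption
    inputs = layers.Input(shape=input_shape)
    x = inputs
    # Two identical convolution blocks: BN -> spatial dropout -> separable conv -> max pool.
    for _ in range(2):
        x = layers.BatchNormalization()(x)
        x = layers.SpatialDropout2D(0.5)(x)
        x = layers.SeparableConv2D(512, (3, 3), padding="same",
                                   activation="relu",  # activation not stated in the text; assumed
                                   depthwise_initializer="he_uniform",
                                   pointwise_initializer="he_uniform")(x)
        x = layers.MaxPooling2D((2, 2))(x)
    # Fully connected system: BN -> alpha dropout -> 128 ReLU units -> BN -> sigmoid output.
    x = layers.Flatten()(x)  # flatten step assumed
    x = layers.BatchNormalization()(x)
    x = layers.AlphaDropout(0.5)(x)  # rate not stated in the text; 0.5 assumed
    x = layers.Dense(128, activation="relu", kernel_initializer="he_uniform")(x)
    x = layers.BatchNormalization()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inputs, outputs, name="VGG_f_gender")

model = build_vgg_f_gender()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(features, labels, batch_size=64, ...)
```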
ResNet50_f_gender comprises only the fully connected system: batch norm, dropout with a drop probability of 0.5, followed by 128 units with exponential linear unit (ELU) activation, He uniform initialization, and a max-norm weight constraint of magnitude 3. The output layer has a single neuron with sigmoid activation. Here we chose a batch size of 128. For SENet50_f_gender we kept the same model as for ResNet50_f_gender.
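Since this head is a plain fully connected stack, it is compact enough to sketch directly. The 2048-dimensional input (the usual pooled ResNet50 feature size) and the optimizer are assumptions.

```python
from tensorflow.keras import layers, models, constraints

def build_resnet50_f_gender(feature_dim=2048):  # feature dimension is an assumption
    inputs = layers.Input(shape=(feature_dim,))
    x = layers.BatchNormalization()(inputs)
    x = layers.Dropout(0.5)(x)
    # 128 ELU units with He uniform init and a max-norm weight constraint of 3.
    x = layers.Dense(128, activation="elu",
                     kernel_initializer="he_uniform",
                     kernel_constraint=constraints.MaxNorm(3))(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inputs, outputs, name="ResNet50_f_gender")

# SENet50_f_gender reuses the identical head on SENet50 features.
model = build_resnet50_f_gender()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(features, labels, batch_size=128, ...)
```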
For age estimation the models are named VGG_f_age, ResNet50_f_age, and SENet50_f_age. VGG_f_age consists of two convolution blocks, each containing, in order, a batch norm layer, spatial dropout with keep probabilities of 0.8 and 0.6 respectively, and a separable convolution layer with 512 filters of size 3x3, 'same' padding (so that the spatial dimensions do not change and information loss is curtailed), ReLU activation, and He initialization. Each convolution block is followed by max pooling with a 2x2 kernel. The fully connected system consists of 3 layers with 1024, 512, and 128 neurons respectively, with dropout keep probabilities of 0.2, 0.2, and 1. Each layer has ELU activation with He uniform initialization. The output layer has one unit, ReLU activation with He uniform initialization, and batch normalization. A batch size of 128 was chosen.
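Note that Keras dropout layers take a drop rate rather than a keep probability, so the keep probabilities above must be converted (e.g. keep 0.8 becomes rate 0.2). A minimal sketch follows, again assuming a (7, 7, 512) input, a Flatten before the dense layers, and dropout placed after each dense layer (the description does not fix its position).

```python
from tensorflow.keras import layers, models

def build_vgg_f_age(input_shape=(7, 7, 512)):  # input shape is an assumption
    inputs = layers.Input(shape=input_shape)
    x = inputs
    # Two convolution blocks with keep probabilities 0.8 and 0.6 (drop rates 0.2, 0.4).
    for keep_prob in (0.8, 0.6):
        x = layers.BatchNormalization()(x)
        x = layers.SpatialDropout2D(1.0 - keep_prob)(x)
        x = layers.SeparableConv2D(512, (3, 3), padding="same",
                                   activation="relu",
                                   depthwise_initializer="he_uniform",
                                   pointwise_initializer="he_uniform")(x)
        x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)  # flatten step assumed
    # Fully connected system: keep probabilities 0.2, 0.2, 1 -> drop rates 0.8, 0.8, 0.
    for units, keep_prob in zip((1024, 512, 128), (0.2, 0.2, 1.0)):
        x = layers.Dense(units, activation="elu", kernel_initializer="he_uniform")(x)
        if keep_prob < 1.0:
            x = layers.Dropout(1.0 - keep_prob)(x)
    # Single-unit regression output; batch norm placed after it, as we read the text.
    x = layers.Dense(1, activation="relu", kernel_initializer="he_uniform")(x)
    outputs = layers.BatchNormalization()(x)
    return models.Model(inputs, outputs, name="VGG_f_age")

model = build_vgg_f_age()
model.compile(optimizer="adam", loss="mae")
# model.fit(features, ages, batch_size=128, ...)
```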
ResNet50_f_age consists of a fully connected system of 5 layers with 512, 512, 512, 256, and 128 units and dropout keep probabilities of 0.5, 0.3, 0.3, 0.3, and 0.5 respectively. Each layer contains batch normalization and uses the scaled exponential linear unit (SELU) activation function. As before, for SENet50_f_age we kept the same model as for ResNet50_f_age.
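A sketch of this head follows. The 2048-dimensional input, the layer ordering (Dense, then batch norm, then SELU, then dropout), and the single linear output unit are assumptions, since the description above does not specify an output layer.

```python
from tensorflow.keras import layers, models

def build_resnet50_f_age(feature_dim=2048):  # feature dimension is an assumption
    inputs = layers.Input(shape=(feature_dim,))
    x = inputs
    # Five dense layers; keep probabilities 0.5, 0.3, 0.3, 0.3, 0.5 become
    # Keras drop rates 0.5, 0.7, 0.7, 0.7, 0.5.
    for units, keep_prob in zip((512, 512, 512, 256, 128),
                                (0.5, 0.3, 0.3, 0.3, 0.5)):
        x = layers.Dense(units)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("selu")(x)
        x = layers.Dropout(1.0 - keep_prob)(x)
    # Output layer not specified in the text; one linear unit assumed for regression.
    outputs = layers.Dense(1)(x)
    return models.Model(inputs, outputs, name="ResNet50_f_age")

# SENet50_f_age reuses the identical head on SENet50 features.
model = build_resnet50_f_age()
model.compile(optimizer="adam", loss="mae")
```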