dataset. No validation set is used because these models have not been tuned; they are
trained with the default hyperparameters of the scikit-learn and XGBoost libraries.
Table 10. Untuned baseline for Gender Classification
Method                          Train accuracy (%)   Test accuracy (%)
Decision Tree                   99.86                59.26
Linear SVC                      96.13                91.44
Logistic Regression             97.38                92.11
Gradient Boosted Trees          95.15                93.38
XGBoost                         95.03                93.80
Linear Discriminant Analysis    95.30                94.39
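
An untuned baseline of this kind can be sketched as follows. This is a minimal illustration, not the code used to produce Table 10: the synthetic data from `make_classification`, the 80/20 split, the fixed `random_state`, and the helper name `untuned_baselines` are all assumptions introduced for the example. Every classifier is constructed with its library defaults, and XGBoost is included only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

try:
    # Optional dependency: skipped if xgboost is not installed.
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None


def untuned_baselines(X_train, y_train, X_test, y_test):
    """Fit each classifier with default hyperparameters.

    Hypothetical helper; returns {name: (train_acc_percent, test_acc_percent)}.
    """
    models = {
        "Decision Tree": DecisionTreeClassifier(),
        "Linear SVC": LinearSVC(),
        "Logistic Regression": LogisticRegression(),
        "Gradient Boosted Trees": GradientBoostingClassifier(),
        "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    }
    if XGBClassifier is not None:
        models["XGBoost"] = XGBClassifier()

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        train_acc = 100.0 * accuracy_score(y_train, model.predict(X_train))
        test_acc = 100.0 * accuracy_score(y_test, model.predict(X_test))
        results[name] = (train_acc, test_acc)
    return results


# Synthetic stand-in data (assumption): the paper's dataset is not reproduced here.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = untuned_baselines(X_train, y_train, X_test, y_test)
for name, (train_acc, test_acc) in results.items():
    print(f"{name:30s} {train_acc:6.2f} {test_acc:6.2f}")
```

Because no hyperparameters are searched, the data is split only into train and test portions, which matches the absence of a validation set described above. The numbers printed on synthetic data are not comparable to those in Table 10.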