attribute $X_j$ is obtained by first centering the data for class $D_i$ via $Z_i = D_i - 1 \cdot \mu_i^T$. The variance is then given as

$$\hat{\sigma}^2_{ij} = \frac{1}{n_i} Z_{ij}^T Z_{ij},$$

where $Z_{ij}$ denotes the $j$-th column (attribute $X_j$) of the centered matrix $Z_i$ and $n_i$ is the number of points in class $c_i$.
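As a quick illustration of this estimate, here is a minimal NumPy sketch; the matrix D_i below is made-up data and all variable names are our own, not from the text:

    import numpy as np

    # Hypothetical data matrix for class i (n_i = 4 points, d = 2 attributes)
    D_i = np.array([[6.1, 2.8],
                    [5.9, 3.0],
                    [6.3, 2.9],
                    [6.0, 3.1]])
    n_i = D_i.shape[0]
    mu_i = D_i.mean(axis=0)                 # class mean vector
    Z_i = D_i - mu_i                        # centering: Z_i = D_i - 1 mu_i^T
    var_i = (Z_i * Z_i).sum(axis=0) / n_i   # sigma^2_ij = (1/n_i) Z_ij^T Z_ij
    assert np.allclose(var_i, D_i.var(axis=0))  # matches NumPy's biased variance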
Training the naive Bayes classifier is very fast, with $O(nd)$ computational complexity. For testing, given a test point $x$, it simply returns the class with the maximum posterior probability, obtained as the product of the likelihood for each dimension and the class prior probability.
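To make this counting pass concrete, the following Python sketch trains and applies such a model on nominal data. The function names (train, predict) and data layout are our own illustration, not from the text, and the estimates are plain maximum-likelihood counts with no smoothing:

    from collections import Counter, defaultdict

    def train(rows, labels):
        """One O(nd) pass: class frequencies and per-attribute value counts."""
        prior = Counter(labels)            # n_y for each class y
        counts = defaultdict(Counter)      # (y, j) -> counts of attribute j's values
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                counts[(y, j)][v] += 1
        return prior, counts

    def predict(prior, counts, x):
        """Return the class maximizing prior * product of per-dimension likelihoods."""
        n = sum(prior.values())
        scores = {}
        for y, n_y in prior.items():
            p = n_y / n                    # class prior P(y)
            for j, v in enumerate(x):
                p *= counts[(y, j)][v] / n_y   # estimate of P(x_j | y)
            scores[y] = p
        return max(scores, key=scores.get)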
Example 1. We take the dataset shown in Table 1 as an example and build the model.

Table 1
Weather nominal dataset

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No
Table 2 is the frequency/probability table built by the naïve Bayes model on the "weather nominal" dataset.

Table 2
Frequency/probability table

                       Frequency        Probability
                       Yes    No        Yes    No
Outlook     Sunny       2      3        2/9    3/5
            Overcast    4      0        4/9    0/5
            Rainy       3      2        3/9    2/5
Temperature Hot         2      2        2/9    2/5
            Mild        4      2        4/9    2/5
            Cool        3      1        3/9    1/5
Humidity    High        3      4        3/9    4/5
            Normal      6      1        6/9    1/5
Windy       False       6      2        6/9    2/5
            True        3      3        3/9    3/5
Play                    9      5        9/14   5/14
Once we build the model, we classify a new day:

Outlook = Sunny, Temperature = Hot, Humidity = High, Windy = False, Play = ?

Likelihoods (each multiplied by the class prior):
P("Yes") ∝ 2/9 × 2/9 × 3/9 × 6/9 × 9/14 ≈ 0.007
P("No") ∝ 3/5 × 2/5 × 4/5 × 2/5 × 5/14 ≈ 0.027

(Normalized) probabilities:
P("Yes") = 0.007 / (0.007 + 0.027) ≈ 20.5%
P("No") = 0.027 / (0.007 + 0.027) ≈ 79.5%

Hence Play = "No".
What about this day:

Outlook = Overcast, Temperature = Hot, Humidity = High, Windy = False, Play = ?

Likelihoods:
P("Yes") ∝ 4/9 × 2/9 × 3/9 × 6/9 × 9/14 ≈ 0.014
P("No") ∝ 0/5 × 2/5 × 4/5 × 2/5 × 5/14 = 0

(Normalized) probabilities:
P("Yes") = 0.014 / (0.014 + 0.0) = 100%
P("No") = 0.0 / (0.014 + 0.0) = 0%

Hence Play = "Yes".
Does this make sense?
‒ One attribute ("Outlook = Overcast" never occurs with "No") overrules all the others;
‒ we can handle this with the Laplace estimate.

Laplace estimate:
‒ Add 1 to each frequency count;
‒ recompute the probabilities.

Now we rebuild the frequency/probability table using the Laplace estimate, as sketched below.
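A minimal sketch of the add-1 correction, assuming an attribute with k distinct values (the function name and layout are illustrative, not from the text):

    def laplace(count, n_class, k):
        """Add-1 (Laplace) smoothed estimate of P(value | class).

        count   -- raw frequency of this value within the class
        n_class -- number of training points in the class
        k       -- number of distinct values the attribute can take
        """
        return (count + 1) / (n_class + k)

    # P(Overcast | No): the raw estimate is 0/5 = 0; Outlook takes k = 3 values,
    # so the smoothed estimate is (0 + 1) / (5 + 3) = 0.125, no longer zero.
    print(laplace(0, 5, 3))   # 0.125

With every count smoothed this way, no single attribute value can zero out the whole product, which fixes the "Overcast" problem in the second example above.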