EECS 391/491: PROBLEM SET #10 SOLUTIONS
Prof. Michael S. Branicky

Problem 10.1

The CPTs can be computed from the data by simply counting:
   P(L) = (54+7+3+4)/100 = 0.6800
   
   P(B) = (54+1+7+27+3+2)/100 = 0.9400
   
   P(G|B)  = (54+1+7+27)/94 = 0.9468
   P(G|~B) = 0/6            = 0.0000
   
   P(M|B,L)   = (54+3)/(54+7+3) = 0.8906
   P(M|B,~L)  = (1)/(1+27+2)    = 0.0333
   P(M|~B,L)  = 0/4             = 0.0000
   P(M|~B,~L) = 0/2             = 0.0000

Problem 10.2

  1. The naive Bayes classifier looks as follows:
               PlayTennis_
              /  |   \    \_
             /   |    \     \_
            /    |     \      \
        Outlook Temp Humidity  Wind
    
  2. Again, the CPTs can be computed from the data by simply counting:
       P(Yes) = 9/14
       P(No)  = 5/14
       
       P(Sunny | Yes)    = 2/9
       P(Rain | Yes)     = 3/9
       P(Overcast | Yes) = 4/9
       P(Sunny | No)     = 3/5
       P(Rain | No)      = 2/5
       P(Overcast | No)  = 0
       
       P(Hot | Yes)  = 2/9
       P(Cool | Yes) = 3/9
       P(Mild | Yes) = 4/9
       P(Hot | No)   = 2/5
       P(Cool | No)  = 1/5
       P(Mild | No)  = 2/5   [ = 1 - (2/5) - (1/5)]
       
       P(High | Yes)   = 3/9
       P(Normal | Yes) = 6/9  [ = (9-3)/9 ]
       P(High | No)    = 4/5
       P(Normal | No)  = 1/5
       
       P(Strong | Yes) = 3/9
       P(Weak | Yes)   = 6/9
       P(Strong | No)  = 3/5
       P(Weak | No)    = 2/5
    
    Because results in each section above conditioned on the same events add to one, it is also conventional to drop one of each of these.

  3. Below, k is a constant of proportionality:
    P(Yes|Sunny,Cool,High,Strong) = k P(Yes) P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)
                                  = k (9/14) (2/9) (3/9) (3/9) (3/9)
                                  = k/(27*7) = k 0.00529
    
    P(No|Sunny,Cool,High,Strong) = k P(No) P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)
                                 = k (5/14) (3/5) (1/5) (4/5) (3/5)
                                 = k 18/(7*125) = k 0.02057
    
    Since the latter is larger, the naive Bayes classifier predicts PlayTennis = No.

  4. This is an easy computation given the answer to (c) above:
    P(No|Sunny,Cool,High,Strong) = 0.02057/(0.00529+0.02057) = 0.7954
    

Problem 10.3

                                  1 
    x1 x2  | x1 XOR x2        x1------(1.5)
   -------------------          \    /    \
     0  0  |     0              1\  /      \-1
     0  1  |     1                \/        \
     1  0  |     1                /\         (0.5)--->
     1  1  |     0              1/  \       /
                                /    \     /1
                              x2------(0.5)
                                  1
Many other answers are possible.

Problem 10.4

The positive examples are shown as filled circles, the negative as open cirlces. The separating hypersurface is shown as a bold, dotted line in both spaces, with the negative side shaded. The margin in the support vector space is 2.

Problem 10.5

  1. The augmented data is
       x0 x1 x2 x3   desired_output
      -------------  --------------
       -1  1  0  0          1
       -1  0  1  1          0
       -1  1  1  0          1
       -1  1  1  1          0
       -1  0  0  1          0
       -1  1  0  1          1
    
    The weight update equation is
       W := W + (desired_output-net_output) X ;  net_output = ( W.X > 0)?
    
    The algorithm proceeds as follows:
       W = [  0  0  0  0 ]    // Initially
       
       epoch=1:
       --------
       W = [  0  0  0  0 ] + (1-0) [ -1  1  0  0 ] = [ -1  1  0  0 ] 
       W = [ -1  1  0  0 ] + (0-1) [ -1  0  1  1 ] = [  0  1 -1 -1 ] 
       W = [  0  1 -1 -1 ] + (1-0) [ -1  1  1  0 ] = [ -1  2  0 -1 ]
       W = [ -1  2  0 -1 ] + (0-1) [ -1  1  1  1 ] = [  0  1 -1 -2 ]
       W = [  0  1 -1 -2 ] + (0-0) [ -1  0  0  1 ] = [  0  1 -1 -2 ]
       W = [  0  1 -1 -2 ] + (1-0) [ -1  1  0  1 ] = [ -1  2 -1 -1 ]
       
       epoch = 2:
       ----------
       W = [ -1  2 -1 -1 ] + (1-1) [ -1  1  0  0 ] = [ -1  2 -1 -1 ]
       W = [ -1  2 -1 -1 ] + (0-0) [ -1  0  1  1 ] = [ -1  2 -1 -1 ]
       W = [ -1  2 -1 -1 ] + (1-1) [ -1  1  1  0 ] = [ -1  2 -1 -1 ]
       W = [ -1  2 -1 -1 ] + (0-1) [ -1  1  1  1 ] = [  0  1 -2 -2 ]
       W = [  0  1 -2 -2 ] + (0-0) [ -1  0  0  1 ] = [  0  1 -2 -2 ]
       W = [  0  1 -2 -2 ] + (1-0) [ -1  1  0  1 ] = [ -1  2 -2 -1 ]
       
       epoch = 3:
       ----------
       W = [ -1  2 -2 -1 ] + (1-1) [ -1  1  0  0 ] = [ -1  2 -2 -1 ]
       W = [ -1  2 -2 -1 ] + (0-0) [ -1  0  1  1 ] = [ -1  2 -2 -1 ]
       W = [ -1  2 -2 -1 ] + (1-1) [ -1  1  1  0 ] = [ -1  2 -2 -1 ]
       W = [ -1  2 -2 -1 ] + (0-0) [ -1  1  1  1 ] = [ -1  2 -2 -1 ]
       W = [ -1  2 -2 -1 ] + (0-0) [ -1  0  0  1 ] = [ -1  2 -2 -1 ]
       W = [ -1  2 -2 -1 ] + (1-1) [ -1  1  0  1 ] = [ -1  2 -2 -1 ]
       
       There was no change in the weight vector over the entire third epoch.
       Thus, the procedure has converged and correclty learned the function.
    
  2. The positive examples are shown as filled circles, the negative as open cirlces. The separating hyperplane's intersection with the boolean cube is shown shaded. I found it by noting that the plane's equation is
         2 x1 - 2 x2 - x3 = -1
    
    and noting that the following points (and all in between) are solutions:
        (0, 0, 1),  (1, 1, 1),  (0, 1/2, 0),  (1/2, 1, 0)
    

Problem 10.6

Student solutions should contain either code or a discussion and/or listing of the equations used. I used the following Matlab code:
h3data=[0 1 1 1 1 0 1 1 1 0 1 0 0 1 0 0 1 0 0 0 0 1 1 1 0];
h4data=[1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0];

d=h3data; % process h3data

% a priori statistics
pcherry=[1 .75 .5 .25 0];
ph=[.1 .2 .4 .2 .1];
plime=sum((1-pcherry).*ph);  % .* is element-by-element multiplication

N=length(d);
all=[ph plime];
for i=1:N,
  Np=sum(d(1:i));
  pdgivenh=(pcherry.^Np).*((1-pcherry).^(i-Np));  % .^ is elt-by-elt raise-to-power
  phgivend=pdgivenh.*ph;
  phgivend=phgivend/sum(phgivend);  % normalize
  plimegivend=sum((1-pcherry).*phgivend);
  all=[all; phgivend plimegivend];
end;

% list all the results
[[-1 d]' all]

The output for h3data was

  Cherry=1,
   Lime=0    P(h1|d)   P(h2|d)   P(h3|d)   P(h4|d)   P(h5|d)   P(lime|d)
------------------------------------------------------------------------
   -1.0000    0.1000    0.2000    0.4000    0.2000    0.1000    0.5000
         0         0    0.1000    0.4000    0.3000    0.2000    0.6500
    1.0000         0    0.2143    0.5714    0.2143         0    0.5000
    1.0000         0    0.3214    0.5714    0.1071         0    0.4464
    1.0000         0    0.4355    0.5161    0.0484         0    0.4032
    1.0000         0    0.5473    0.4324    0.0203         0    0.3682
         0         0    0.3716    0.5872    0.0413         0    0.4174
    1.0000         0    0.4783    0.5039    0.0177         0    0.3848
    1.0000         0    0.5832    0.4096    0.0072         0    0.3560
    1.0000         0    0.6792    0.3180    0.0028         0    0.3309
         0         0    0.5131    0.4805    0.0063         0    0.3733
    1.0000         0    0.6141    0.3834    0.0025         0    0.3471
         0         0    0.4423    0.5522    0.0055         0    0.3908
         0         0    0.2829    0.7066    0.0105         0    0.4319
    1.0000         0    0.3735    0.6219    0.0046         0    0.4078
         0         0    0.2290    0.7625    0.0085         0    0.4449
         0         0    0.1287    0.8570    0.0143         0    0.4714
    1.0000         0    0.1826    0.8106    0.0068         0    0.4560
         0         0    0.1001    0.8888    0.0111         0    0.4778
         0         0    0.0524    0.9302    0.0175         0    0.4913
         0         0    0.0267    0.9467    0.0267         0    0.5000
         0         0    0.0133    0.9467    0.0400         0    0.5067
    1.0000         0    0.0203    0.9595    0.0203         0    0.5000
    1.0000         0    0.0304    0.9595    0.0101         0    0.4949
    1.0000         0    0.0451    0.9499    0.0050         0    0.4900
         0         0    0.0230    0.9693    0.0077         0    0.4962

The output for h4data was

   -1.0000    0.1000    0.2000    0.4000    0.2000    0.1000    0.5000
    1.0000    0.2000    0.3000    0.4000    0.1000         0    0.3500
         0         0    0.2143    0.5714    0.2143         0    0.5000
    1.0000         0    0.3214    0.5714    0.1071         0    0.4464
         0         0    0.1800    0.6400    0.1800         0    0.5000
         0         0    0.0900    0.6400    0.2700         0    0.5450
         0         0    0.0413    0.5872    0.3716         0    0.5826
         0         0    0.0177    0.5039    0.4783         0    0.6152
         0         0    0.0072    0.4096    0.5832         0    0.6440
         0         0    0.0028    0.3180    0.6792         0    0.6691
         0         0    0.0010    0.2376    0.7613         0    0.6901
    1.0000         0    0.0025    0.3834    0.6141         0    0.6529
         0         0    0.0010    0.2936    0.7054         0    0.6761
         0         0    0.0004    0.2171    0.7825         0    0.6955
         0         0    0.0001    0.1561    0.8438         0    0.7109
         0         0    0.0000    0.1098    0.8902         0    0.7225
         0         0    0.0000    0.0760    0.9240         0    0.7310
         0         0    0.0000    0.0520    0.9480         0    0.7370
         0         0    0.0000    0.0353    0.9647         0    0.7412
         0         0    0.0000    0.0238    0.9762         0    0.7441
    1.0000         0    0.0000    0.0465    0.9535         0    0.7384
         0         0    0.0000    0.0315    0.9685         0    0.7421
    1.0000         0    0.0000    0.0610    0.9390         0    0.7348
         0         0    0.0000    0.0415    0.9585         0    0.7396
    1.0000         0    0.0000    0.0797    0.9203         0    0.7301
         0         0    0.0000    0.0546    0.9454         0    0.7364
In both cases, the true hypothesis eventually dominates and the prediction of the probability that the next candy is lime or not closes in on that which would be made by the MAP hypothesis (0.5 and 0.75, respectively).

Problem 10.7

I wrote a few lines of Matlab code to compute this (code suppressed). My output was as follows:
W = 0     0     0     0

epoch = 1

W = -1     1     0     0
W =  0     1    -1    -1
W = -1     2     0    -1
W =  0     1    -1    -2
W =  0     1    -1    -2
W = -1     2    -1    -1

epoch =2

W = -1     2    -1    -1
W = -1     2    -1    -1
W = -1     2    -1    -1
W =  0     1    -2    -2
W =  0     1    -2    -2
W = -1     2    -2    -1

epoch = 3

W = -1     2    -2    -1
W = -1     2    -2    -1
W = -1     2    -2    -1
W = -1     2    -2    -1
W = -1     2    -2    -1
W = -1     2    -2    -1
This matches with that computed by hand in Problem 10.5(a) above.
Created: 2007-04-10. Modified: 2007-04-30.