Machine Learning - Spring 2004
==============================
Lab experiments 9
-----------------
Program: bn.pl
Data: alarm.pl, loandata.pl
---------------------------
The belief network representation is explained in the data
file alarm.pl. Read it.
The BN reasoning algorithm is used through p(Var,Obs,Dist), where (see the representation sketch after this list):
- Var is a variable (as defined in variables([...]));
- Obs is a list of observations (variable=value pairs);
- Dist is the resulting distribution P(Var|Obs);
- Var must not appear in the observations Obs.
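For reference, here is a minimal sketch of that representation, using a
hypothetical two-node network a -> b (the actual definitions are in
alarm.pl; Section II below builds such a network step by step):
variables([a,b]).          % the nodes of the network
values(a,[t,f]).           % the domain of each variable
values(b,[t,f]).
parents(a,[]).             % a is a root node (no parents)
parents(b,[a]).            % a directed edge from a to b
pr(a,[],[0.3,0.7]).        % prior for a root, ordered as in values(a,[t,f])
pr(b,[a=t],[0.9,0.1]).     % one CPT row per combination of parent values
pr(b,[a=f],[0.2,0.8]).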
I. Reasoning with the Alarm belief network
==========================================
?- ['c:/prolog/bn.pl']. % load program
?- ['c:/prolog/alarm.pl']. % load data set
1. Diagnostic reasoning: from effect to cause (the effect is given as evidence)
-------------------------------------------------------------------------------
What is the probability distribution of Burglary (b) given the evidence
that John calls (j=t)?
?- p(b,[j=t],P).
P = [0.0162837, 0.983716]
The probability distribution [0.0162837, 0.983716] corresponds to the
values of the variable b as defined in values(b,[t,f]). Thus, the
probability of b=t is 0.0162837 and the probability of b=f is 0.983716.
Without this evidence we have
?- p(b,[],P).
P = [0.001, 0.999]
This is the prior distribution of b as it has no parents.
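As a sanity check, the diagnostic posterior can be reproduced with
Bayes' rule, using only p/3 to obtain the prior and the likelihoods
(a sketch; the resulting P should match the 0.0162837 above):
?- p(b,[],[PB,_]),                    % prior P(b=t)
   p(j,[b=t],[JB,_]),                 % likelihood P(j=t|b=t)
   p(j,[b=f],[JF,_]),                 % likelihood P(j=t|b=f)
   P is JB*PB / (JB*PB + JF*(1-PB)).  % posterior P(b=t|j=t)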
Let's add more evidence: John calls and Mary calls too.
?- p(b,[j=t,m=t],P).
P = [0.284172, 0.715828]
The probability of burglary increases: if they both call, it becomes
more likely that there is a burglary. Still, the alarm itself has not
been observed, so the probability of b=t is not very high. Adding the
evidence that the alarm went off (a=t) increases this probability further.
?- p(b,[j=t,m=t,a=t],P).
P = [0.373551, 0.626449]
Because there is another possible cause for the alarm (earthquake),
the probability of b can be increased further by adding the evidence
that there is no earthquake (e=f).
?- p(b,[j=t,m=t,a=t,e=f],P).
P = [0.484786, 0.515214]
This last example, however, is not diagnostic reasoning, because
b (burglary) and e (earthquake) are not connected by a causal relation.
Inference between two causes of a common effect is known as intercausal
reasoning (explaining away): ruling out the earthquake makes the
burglary a more likely explanation of the alarm.
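The intercausal effect can be observed directly by querying one cause
given the other (numeric output omitted here, as it depends on the CPTs
in alarm.pl):
?- p(e,[a=t],P1).      % P(earthquake | alarm went off)
?- p(e,[a=t,b=t],P2).  % P(earthquake | alarm and a confirmed burglary)
Confirming the burglary "explains away" the alarm, so the probability
of e=t in P2 is lower than in P1.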
2. Predictive reasoning: from cause to effect
---------------------------------------------
What is the probability distribution of John calls (j), given that
there is a burglary (b=t)?
?- p(j,[b=t],P).
P = [0.849017, 0.150983]
So, it is very likely that John calls in this situation (the first
value in the distribution corresponds to j=t). Similarly, we can get
the probability that Mary calls given the same evidence. It is lower
than John's (why?).
?- p(m,[b=t],P).
P = [0.658614, 0.341386]
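A hint for the "why?": assuming alarm.pl stores its CPTs as pr/3 facts
in the same format used in Section II below, the relevant rows can be
inspected directly:
?- pr(j,[a=t],PJ), pr(m,[a=t],PM).  % compare P(j|a=t) with P(m|a=t)
Comparing these rows (and the corresponding a=f rows) shows how likely
each neighbor is to call when the alarm goes off.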
II. Creating a belief network for the loandata (adding additional data to loandata.pl)
======================================================================================
1. The variables are the class and the attributes:
variables([class, emp, buy, sex, married]).
2. The structure of the graph represents the causal relationship between
the attributes and the class. The class value determines (is a cause for) the
attribute values (the effects). So, we have the class node as a parent
of all attributes. In Prolog this is:
parents(emp,[class]).
parents(buy,[class]).
parents(sex,[class]).
parents(married,[class]).
parents(class,[]).
3. Attribute values -> values for variables
values(emp,[yes,no]).
values(buy,[comp,car]).
values(sex,[m,f]).
values(married,[yes,no]).
values(class,[approve,reject]).
4. Conditional probabilities: use bayes.pl to compute them.
Conditional independence assumption: Attr_i is conditionally independent
of Attr_j given Class.
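Under this assumption the attribute distribution factorizes given the
class, which is what justifies estimating a separate CPT per attribute:
P(emp,buy,sex,married|class) =
    P(emp|class) * P(buy|class) * P(sex|class) * P(married|class)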
CPT for emp:
------------
?- cond_prob([emp=yes],approve,P).
P=[(emp = yes) / 1.000]
?- cond_prob([emp=no],approve,P).
P=[(emp = no) / 0.000]
?- cond_prob([emp=yes],reject,P).
P=[(emp = yes) / 0.250]
?- cond_prob([emp=no],reject,P).
P=[(emp = no) / 0.750]
=> pr(emp,[class=approve],[1.000,0.000]).
pr(emp,[class=reject],[0.250,0.750]).
CPT for buy:
------------
?- cond_prob([buy=comp],approve,P).
P=[(buy = comp) / 0.875] (so [buy=car] is 0.125, since the two values sum to 1)
?- cond_prob([buy=comp],reject,P).
P=[(buy = comp) / 0.500] (so [buy=car] is also 0.500)
=> pr(buy,[class=approve],[0.875,0.125]).
pr(buy,[class=reject],[0.500,0.500]).
and so on ...
-------------
pr(sex,[class=approve],[0.500,0.500]).
pr(sex,[class=reject],[0.250,0.750]).
pr(married,[class=approve],[0.500,0.500]).
pr(married,[class=reject],[0.750,0.250]).
CPT for class:
--------------
?- class_prob(approve,P).
P=0.667
?- class_prob(reject,P).
P=0.333
=> pr(class,[],[0.667,0.333]).
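Putting steps 1-4 together, the complete set of facts to add to
loandata.pl is:
variables([class, emp, buy, sex, married]).
parents(class,[]).
parents(emp,[class]).
parents(buy,[class]).
parents(sex,[class]).
parents(married,[class]).
values(class,[approve,reject]).
values(emp,[yes,no]).
values(buy,[comp,car]).
values(sex,[m,f]).
values(married,[yes,no]).
pr(class,[],[0.667,0.333]).
pr(emp,[class=approve],[1.000,0.000]).
pr(emp,[class=reject],[0.250,0.750]).
pr(buy,[class=approve],[0.875,0.125]).
pr(buy,[class=reject],[0.500,0.500]).
pr(sex,[class=approve],[0.500,0.500]).
pr(sex,[class=reject],[0.250,0.750]).
pr(married,[class=approve],[0.500,0.500]).
pr(married,[class=reject],[0.750,0.250]).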
III. Classification of new examples
===================================
?- ['c:/prolog/loandata.pl']. % the BN is included in the data file
The example is supplied as evidence:
?- p(class,[emp=yes,buy=car,sex=m,married=no],P).
P = [0.889037, 0.110963]
Because the probability of class=approve is higher (0.889037), we
classify this example as approve.
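Note that the evidence list may be any subset of the attributes, so the
BN can also classify examples with missing attribute values, e.g.
(output omitted):
?- p(class,[emp=yes,married=no],P).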
Essentially the same results are obtained by naive Bayes:
?- ['c:/prolog/bayes.pl']. % load the program only, the examples are already loaded with loandata.pl
?- probs([emp=yes,buy=car,sex=m,married=no],[C1/L1,C2/L2]), P1 is L1/(L1+L2),P2 is L2/(L1+L2).
C1 = approve
L1 = 0.0208333
C2 = reject
L2 = 0.00260417
P1 = 0.888889
P2 = 0.111111
We need a longer query here because naive Bayes computes likelihoods
(L1 and L2) rather than probabilities. To get the probabilities (P1 and
P2) we apply normalization: Pi = Li/(L1+L2).
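These likelihoods can be reproduced by hand from the CPTs of Section II
(a sketch in plain arithmetic; the tiny discrepancies come from rounding
2/3 to 0.667 and 1/3 to 0.333):
?- L1 is 0.667 * 1.000 * 0.125 * 0.500 * 0.500,  % P(approve)*P(emp=yes|ap)*P(buy=car|ap)*P(sex=m|ap)*P(married=no|ap)
   L2 is 0.333 * 0.250 * 0.500 * 0.250 * 0.250,  % P(reject)*P(emp=yes|rej)*P(buy=car|rej)*P(sex=m|rej)*P(married=no|rej)
   P1 is L1/(L1+L2), P2 is L2/(L1+L2).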