11/17 - Defaults and Diagnosis

Today:
  Kinds of default reasoning
  Methods of default reasoning
    - Negation as failure
    - Minimal entailment
    - Default rules

1. Kinds of default reasoning

Logic is monotonic ==> nothing you learn can ever cause you to withdraw a
conclusion!  Little knowledge outside of pure mathematics has that
property.  Instead: we reach the best conclusions we can on the basis of
our current knowledge, but learning something new may lead us to new
conclusions.

Defaults play many different roles in commonsense reasoning:

1. Generic or typical information:

    Birds fly.
    Cars run on highways.
    People have two legs.

There are exceptions to essentially all such assertions.  You can try to
account for the exceptions:

    A x . bird(x) & ~penguin(x) => flies(x)

Problem: you can always think of another exception:

    A x . bird(x) & ~penguin(x) & ~clippedWings(x) => flies(x)

What about trying:

    A x . bird(x) & ~abnormal(x) => flies(x)

This works, but only by reducing the problem to inferring ~abnormal(x).

What about using probability theory instead of logic?  Suppose you assert:

    Prob( fly(x) | bird(x) ) = .9
    Prob( fly(x) | bird(x) & penguin(x) ) = .1

Suppose your knowledge base is

    KB = { bird(tweety) & penguin(tweety) }

Then by the rules of probability you can conclude:

    Prob( fly(tweety) | KB ) = .1

But there is a fly in the ointment (bad pun).  Suppose your KB is

    KB = { bird(tweety) & penguin(tweety) & yellow(tweety) }

It is no longer the case that probability theory alone lets you conclude:

    Prob( fly(tweety) | KB ) = .1

To get such an inference to go through, you would actually have to add the
following assertion (or other assumptions that entail it):

    Prob( fly(x) | bird(x) & penguin(x) ) =
        Prob( fly(x) | bird(x) & penguin(x) & yellow(x) )

This is an example of an "independence assumption", which says that
whether or not a penguin flies is independent of its color.  In the
Mathematical AI course you'll learn about what are called "graphical
models", which can be viewed as a language for compactly asserting
independence assumptions.

In any case: the point is that using probability theory does not eliminate
the need for default reasoning, because it also requires certain kinds of
defaults.

*******************

2. Another kind of default is a "closed world" or "domain closure"
assertion.  For example, suppose you are told that the list of students
enrolled in CS 244 is

    { Garret, Bill, Jacob }

This might be represented logically as

    enrolled(garret,cs244)
    enrolled(bill,cs244)
    enrolled(jacob,cs244)

Suppose you wanted to know if all the students in 244 are male:

    ? A x . enrolled(x,cs244) => male(x)

Even if you knew

    male(garret)
    male(bill)
    male(jacob)

the query would fail, because nothing in the KB rules out other, unknown
students.  You would need to add a domain closure assertion:

    A x . enrolled(x,cs244) => (x=garret v x=bill v x=jacob)

The problem with adding the domain closure assertion manually is that you
would have to find and modify it whenever someone new enrolls in the
course:

    enrolled(diane,cs244)

It would be more convenient to simply be able to assert a default rule to
the effect that "enrolled" is completely specified at any point in time.

A closed world assumption generalizes this idea to the case where you want
to say that "any instances of a predicate or formula not asserted to be
true are false".  For example, in the blocks world planning domain, we
might specify an initial state as

    on(A,B)
    on(B,C)

and not want to have to explicitly assert ~on(A,C).
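Here is a minimal sketch of this in Prolog, whose database semantics
builds in the closed world assumption.  The block/1 domain predicate and
not_on/2 are names invented for this illustration, and the blocks are
written in lowercase because capitalized names are Prolog variables; \+
is Prolog's negation-as-failure operator, discussed in more detail below.

    % Positive facts only: where the blocks actually are.
    on(a, b).
    on(b, c).

    % The domain of blocks (an assumption of this sketch).
    block(a).
    block(b).
    block(c).

    % Under the closed world assumption, any on/2 fact that cannot
    % be proved is taken to be false:
    not_on(X, Y) :- block(X), block(Y), \+ on(X, Y).

Now ?- not_on(a, c). succeeds even though ~on(a,c) was never asserted
anywhere.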
Having to explicitly assert all the negative facts in a domain is an
enormous burden, because there are so many more NEGATIVE facts than there
are POSITIVE facts!  For example: suppose you have N blocks, and each
block can be on exactly one other block or on the table.  What is the
maximum number of positive facts?  Of negative facts?

A special case of closed world reasoning that comes up when we are
reasoning about a world that can change over time is "inertia".  That is,
we often assume when reasoning about a problem domain that "things don't
change unless we are told or can infer that they change".  Of course,
sometimes we are wrong:

    1. Mary picked up a loaded gun.
    2. She pointed it straight at John and fired.

    Default conclusion: John is shot.

    1. Mary picked up a loaded gun.
    1a. She carefully unloaded it.
    2. She pointed it straight at John and fired.

    Default conclusion: John is not shot.

    1. Mary picked up a loaded gun.
    1a. She carefully unloaded it.
    1b. She then loaded it again.
    2. She pointed it straight at John and fired.

There are actually TWO kinds of defaults being used here: one is inertia,
the other is about the proper way to tell a story -- you shouldn't leave
out crucial information!  This brings us to the 3rd kind of default:

*******************

3. Conventions of Communication

One of these was just used -- "include all the relevant information!"
Here is another: consider the following dialog:

    Student to teacher: Who is getting an A in CS 244?
    Teacher: Mark.  He voted for John McCain.

What do you conclude about the REASON that Mark got an A?  What is the
default CONVENTION of speech that leads you to this conclusion?

There is a whole part of linguistics called "Speech Act theory" that
deals with the kinds of communication defaults used in natural language.

*******************************************************

So now you are convinced we need defaults.  We'll now talk about several
different ways of formally representing defaults:

    - Negation as failure
    - Minimal entailment
    - Default rules

*******************************************************

1. Negation as failure: this is a kind of closed world reasoning that is
built into Prolog.  In Prolog, there is no explicit negation operator.
There is an operator "\+" which succeeds if the goal that follows it
CANNOT be proved.  So, for example, you can write:

    fly(X) :- bird(X), \+ penguin(X).
    bird(tweety).
    bird(opus).
    penguin(opus).

    ?- fly(tweety).
    yes
    ?- fly(opus).
    no

Note that the "no" does not really mean ~fly(opus), only that you cannot
PROVE that opus flies.  You can add:

    fly(X) :- hasAirplane(X).
    hasAirplane(opus).

    ?- fly(opus).
    yes

There are two problems with negation as failure.  The first is that it is
easy to write programs using it that never terminate:

    republican(X) :- voter(X), \+ democrat(X).
    democrat(X) :- voter(X), \+ republican(X).

This kind of circular reasoning can be outlawed by restricting the use of
the \+ operator to STRATIFIED logic programs.  In a stratified program,
the defined predicates can be put into a total order p1, p2, ..., pk such
that it is never the case that the definition of some pi applies \+ to a
pj with j >= i; that is, negation as failure may only be applied to
predicates that are completely defined at an earlier stratum.  An example
of a stratified program is given below.
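Here is such a program -- a minimal sketch, with predicate names (edge/2,
node/1, reachable/2, unreachable/2) chosen just for this illustration:

    % Stratum 1: base facts, no negation.
    edge(a, b).
    edge(b, c).

    node(a).
    node(b).
    node(c).

    % Stratum 2: reachable/2 is defined from edge/2 without negation.
    reachable(X, Y) :- edge(X, Y).
    reachable(X, Y) :- edge(X, Z), reachable(Z, Y).

    % Stratum 3: unreachable/2 applies \+ only to reachable/2, which
    % is completely defined at a lower stratum, so the program is
    % stratified.
    unreachable(X, Y) :- node(X), node(Y), \+ reachable(X, Y).

A query such as ?- unreachable(c, a). terminates and succeeds, because
the goal inside \+ never depends, directly or indirectly, on
unreachable/2 itself.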
The second problem with negation as failure is that it is tied to
Prolog's restricted representation: it is not clear how to generalize it
to full first-order logic, where a KB may contain disjunctions and other
formulas that are not Horn clauses.

*******************************************************

2. Minimal entailment: this is an approach to defaults that works for
full FOL.  Consider the KB:

    bird(x) & ~abnormal(x) => flies(x)
    penguin(x) => abnormal(x)
    bird(tweety)
    bird(bert)
    bird(sam)
    penguin(bert) v penguin(sam)

Note that we CANNOT prove flies(tweety) by ordinary entailment, that is,
it is not the case that flies(tweety) holds in ALL models of the KB.
What we're going to do is to define a SUBSET of the models of the KB.

Given two models M1 and M2, define

    M1 <=[abnormal] M2

iff the EXTENSION of abnormal in M1 does not contain any individual NOT
in the extension of abnormal in M2.  M is a MINIMAL MODEL of F with
respect to abnormal if M is a model of F, and there is no model M' of F
such that M' <[abnormal] M (strictly smaller extension).  Then we say F
minimally entails G (with respect to abnormal),

    F |=[abnormal] G

if G holds in all models of F that are minimal with respect to abnormal.

In our example, I claim the following:

    KB |=[abnormal] flies(tweety)
    NOT the case that: KB |=[abnormal] flies(bert)
    NOT the case that: KB |=[abnormal] flies(sam)
    KB |=[abnormal] flies(sam) v flies(bert)

Let's do the NIXON DIAMOND: Republican, Quaker, Pacifist.

*******************************************************************

3. Default logic: this is an alternative approach to the problem of
generalizing negation as failure to full FOL.  The idea is to generalize
our ordinary logical proof rules

    ASSUMPTIONS
    -----------
    CONCLUSION

to ones of the form

    ASSUMPTIONS : CONSISTENCY_CONDITION
    -----------------------------------
    CONCLUSION

The idea is that the rule can be applied ONLY if the ASSUMPTIONS are
proved, and the CONSISTENCY_CONDITION is consistent with the new KB that
is reached by applying ALL the rules that can possibly be applied.

Define EXTENSION.
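As a tiny illustration of how such rules behave, here is a sketch in
Prolog of applying NORMAL defaults, rules of the form P : C / C.  It
makes strong simplifying assumptions: consistency checking is reduced to
looking for an explicitly asserted negation n(C), and there is no
fixpoint iteration; the predicates default/2, fact/1, n/1 and conclude/1
are invented for this example.

    % default(P, C) encodes the normal default rule   P : C / C
    default(bird(tweety), flies(tweety)).
    default(bird(opus),   flies(opus)).

    % Ground facts; n(F) is our encoding of "F is known to be false".
    fact(bird(tweety)).
    fact(bird(opus)).
    fact(n(flies(opus))).        % opus is known not to fly

    % A default applies when its prerequisite is a known fact and its
    % conclusion is consistent with the facts -- approximated here by
    % "the explicit negation of the conclusion is not asserted".
    % Note that the consistency check itself uses negation as failure!
    conclude(C) :- default(P, C), fact(P), \+ fact(n(C)).

With this, ?- conclude(flies(tweety)). succeeds and
?- conclude(flies(opus)). fails, mirroring the bird examples above.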
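Finally, here is a brute-force sketch of the minimal entailment example
above, representing each predicate by its extension as a list and
enumerating all models over the three individuals.  The encoding and the
predicate names (sub/2, model/3, minimal/3, min_entails_flies/1) are our
own, and the code assumes a Prolog providing member/2 and forall/2, such
as SWI-Prolog.

    inds([tweety, bert, sam]).

    % sub(Set, Sub): Sub is a subset of Set.  Order is preserved, so
    % equal sets always get the same list representation.
    sub([], []).
    sub([X|Xs], [X|Ys]) :- sub(Xs, Ys).
    sub([_|Xs], Ys) :- sub(Xs, Ys).

    % model(Pen, Ab, Fly): extensions of penguin, abnormal and flies
    % in a model of the KB (every individual is a bird).
    model(Pen, Ab, Fly) :-
        inds(Is),
        sub(Is, Pen), sub(Is, Ab), sub(Is, Fly),
        ( member(bert, Pen) ; member(sam, Pen) ),    % penguin(bert) v penguin(sam)
        forall(member(X, Pen), member(X, Ab)),       % penguin(x) => abnormal(x)
        forall(( member(X, Is), \+ member(X, Ab) ),  % bird(x) & ~abnormal(x)
               member(X, Fly)).                      %     => flies(x)

    subset_of(Xs, Ys) :- forall(member(X, Xs), member(X, Ys)).

    % Minimal: no model has a strictly smaller extension of abnormal.
    minimal(Pen, Ab, Fly) :-
        model(Pen, Ab, Fly),
        \+ ( model(_, Ab2, _), Ab2 \= Ab, subset_of(Ab2, Ab) ).

    % X flies in every minimal model, i.e. KB |=[abnormal] flies(X).
    min_entails_flies(X) :- forall(minimal(_, _, Fly), member(X, Fly)).

Here ?- min_entails_flies(tweety). succeeds, while
?- min_entails_flies(bert). and ?- min_entails_flies(sam). fail, matching
the claims above; the disjunctive claim can be checked with
?- forall(minimal(_,_,F), (member(bert,F) ; member(sam,F))).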