Overview

Objectives

Learning in Document Processing: some data

Document processing requires a large amount of knowledge

Hand-coding knowledge?

Typical machine learning applications

Comparing development times (Michie, 1989)

Overview

What is learning?

History

Neural modeling (1955-1965) (1986-…)

Decision-Theoretic Techniques (1955-1965)

Symbolic Concept-oriented Techniques (1962-1980)

Knowledge-Intensive Learning Systems and Multistrategy Learning (1980-today)

The general model of a Learning System

Learning Systems

Example: A handwriting recognition learning problem

Basic Questions

What do Machines learn?

Subsymbolic and Symbolic Learning

Both learning and performance rely on the ability to represent knowledge

Representing experience

Representing the knowledge

Levels of Concept Descriptions

The task

The degree of supervision

How do Machines Learn?

Inferences

The Inductive Paradigm

Empirical Learning (inductive learning from large amounts of data)

Example

A small training set

How many hypotheses?
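
As a rough illustration of why the number of hypotheses matters, the sketch below assumes instances described by n Boolean attributes and an unbiased learner whose hypothesis space contains every Boolean function over them, so |H| = 2^(2^n). Brute-force enumeration, which is only feasible for tiny n, shows that m distinct training examples leave 2^(2^n - m) consistent hypotheses. The training set is made up for the example.

```python
from itertools import product


def count_consistent(n, examples):
    """Count Boolean functions over n Boolean attributes that agree with
    every labelled example (brute force, so only for very small n)."""
    instances = list(product((0, 1), repeat=n))  # all 2**n instances
    count = 0
    # An unrestricted hypothesis is any labelling of all 2**n instances.
    for labels in product((0, 1), repeat=len(instances)):
        h = dict(zip(instances, labels))
        if all(h[x] == y for x, y in examples.items()):
            count += 1
    return count


n = 3
train = {(0, 0, 1): 1, (1, 0, 1): 1, (1, 1, 0): 0}  # hypothetical training set
print(2 ** (2 ** n))               # 256 hypotheses in the unrestricted space
print(count_consistent(n, train))  # 32 == 2 ** (2 ** 3 - 3)
```

Every unseen instance is labelled positive by exactly half of the surviving hypotheses, so such a learner has no basis for classifying anything it has not already seen, which is the usual motivation for introducing a bias.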

BIAS

How many examples do we need?

The deductive paradigm (explanation-based learning)

A multicriteria classification of machine learning methods

Overview

Statistical learning methods

Trainable classifiers

The basic model for a trainable pattern classifier

Discriminant analysis (Fisher 1936)
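
A minimal two-class sketch of Fisher's linear discriminant, assuming 2-D feature vectors and an invertible pooled within-class scatter matrix S_W: the projection direction is w = S_W^-1 (m_A - m_B), and a new point is assigned by comparing its projection with the midpoint of the projected class means. The midpoint threshold (a common choice when priors and costs are equal) and the toy data are assumptions for illustration. Plain Python keeps the 2x2 inverse explicit.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the scatter matrix sum of (x - m)(x - m)^T."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def dot(w, x):
    return w[0] * x[0] + w[1] * x[1]


def fisher(class_a, class_b):
    """Return w = S_W^-1 (m_a - m_b) and the midpoint threshold on w . x."""
    ma, mb = mean(class_a), mean(class_b)
    # pooled within-class scatter S_W = [[a, b], [b, d]]
    a, b, d = (u + v for u, v in zip(scatter(class_a, ma), scatter(class_b, mb)))
    det = a * d - b * b  # assumed non-zero
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det
    w = ((d * dx - b * dy) / det, (a * dy - b * dx) / det)
    threshold = (dot(w, ma) + dot(w, mb)) / 2
    return w, threshold


def classify(x, w, threshold):
    # w . m_a > w . m_b whenever S_W is positive definite, so class A lies above
    return "A" if dot(w, x) > threshold else "B"


# Hypothetical toy data.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 2.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 5.0)]
w, threshold = fisher(class_a, class_b)
print(w, threshold)
print(classify((2.5, 2.5), w, threshold), classify((7.5, 6.5), w, threshold))  # A B
```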

Fisher classification functions

Overview

Decision tree learning
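
The attribute-selection step of ID3-style decision tree learning can be sketched with entropy and information gain. The example uses the PlayTennis toy data familiar from textbook treatments (an assumption; it is not taken from these slides). It also simplifies ID3 by creating branches only for attribute values that occur in the current subset.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = """Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No"""
DATA = [dict(zip(ATTRS + ["Play"], line.split())) for line in ROWS.splitlines()]


def entropy(examples, target="Play"):
    counts = Counter(e[target] for e in examples)
    n = len(examples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def gain(examples, attr, target="Play"):
    """Expected reduction in entropy from splitting on attr."""
    n = len(examples)
    remainder = 0.0
    for v in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == v]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(examples, target) - remainder


def id3(examples, attrs, target="Play"):
    """Return a class label (leaf) or an (attribute, {value: subtree}) pair."""
    labels = Counter(e[target] for e in examples)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # pure node, or majority class
    best = max(attrs, key=lambda a: gain(examples, a, target))
    branches = {}
    for v in sorted({e[best] for e in examples}):
        subset = [e for e in examples if e[best] == v]
        branches[v] = id3(subset, [a for a in attrs if a != best], target)
    return (best, branches)


tree = id3(DATA, ATTRS)
print(tree)
```

On this data Outlook has the highest gain (about 0.247) and becomes the root; the Sunny branch then splits on Humidity and the Rain branch on Wind.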

From Decision Trees to Decision Rules
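
Converting a tree to rules amounts to emitting one rule per root-to-leaf path, with the tests along the path as a conjunction of preconditions. A minimal sketch, assuming a tree stored as nested (attribute, {value: subtree}) tuples with class labels at the leaves; the tree itself is a hypothetical PlayTennis-style example.

```python
# Hypothetical tree in (attribute, {value: subtree_or_label}) form.
TREE = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})


def tree_to_rules(tree, conditions=()):
    """Return one (preconditions, label) rule per root-to-leaf path."""
    if not isinstance(tree, tuple):  # a leaf holds a class label
        return [(list(conditions), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attr, value),))
    return rules


for conds, label in tree_to_rules(TREE):
    body = " AND ".join(f"{a} = {v}" for a, v in conds)
    print(f"IF {body} THEN PlayTennis = {label}")
```

One motivation often given for the conversion is that each rule can then be simplified independently, which is not possible for a shared node inside the tree.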

Overview

Learning Rules Directly

Concept Learning

A Concept Learning Task

Representing Hypotheses

A Formalization

What information is available?

The inductive learning hypothesis

Concept Learning as Search

An example

Semantically distinct hypotheses
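
For a conjunctive hypothesis language in which each attribute is constrained to a specific value, to '?' (any value) or to 'Ø' (no value), the syntactic and semantic counts can be compared by enumeration. The sketch assumes six attributes with 3, 2, 2, 2, 2 and 2 values, in the style of the EnjoySport task used in textbook treatments; whether the slides use the same example is an assumption. Every hypothesis containing 'Ø' classifies all instances as negative, so those collapse into one semantic class, giving 1 + ∏(k_i + 1) semantically distinct hypotheses when every attribute has at least two values.

```python
from itertools import product
from math import prod

ANY, EMPTY = "?", "Ø"

# Assumed attribute domains (EnjoySport-style, hypothetical here).
DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"],  # Sky
    ["Warm", "Cold"],              # AirTemp
    ["Normal", "High"],            # Humidity
    ["Strong", "Weak"],            # Wind
    ["Warm", "Cool"],              # Water
    ["Same", "Change"],            # Forecast
]


def covers(h, x):
    # EMPTY never matches, ANY always matches, a value matches only itself
    return all(c == ANY or c == v for c, v in zip(h, x))


def count_by_enumeration(domains):
    instances = list(product(*domains))
    hyps = list(product(*[d + [ANY, EMPTY] for d in domains]))
    # two hypotheses are semantically equal iff they cover the same instances
    extensions = {frozenset(i for i, x in enumerate(instances) if covers(h, x))
                  for h in hyps}
    return len(instances), len(hyps), len(extensions)


def count_closed_form(domains):
    syntactic = prod(len(d) + 2 for d in domains)
    semantic = 1 + prod(len(d) + 1 for d in domains)
    return syntactic, semantic


print(count_by_enumeration(DOMAINS))  # (96, 5120, 973)
print(count_closed_form(DOMAINS))     # (5120, 973)
```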

Efficient search: how?

General-to-specific ordering
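
In a conjunctive hypothesis language, the more-general-than-or-equal-to relation (h_j ≥_g h_k iff every instance that satisfies h_k also satisfies h_j) can be tested syntactically, constraint by constraint. The sketch assumes the '?'/'Ø' encoding and attribute domains with at least two values each. It then checks the syntactic test against the semantic definition on a small hypothetical domain.

```python
from itertools import product

ANY, EMPTY = "?", "Ø"


def covers(h, x):
    """True if conjunctive hypothesis h classifies instance x as positive."""
    return all(c == ANY or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    """Syntactic test for hj >=_g hk on conjunctive hypotheses."""
    if EMPTY in hk:
        return True  # hk covers nothing, so the relation holds vacuously
    return all(cj == ANY or cj == ck for cj, ck in zip(hj, hk))


def semantically_more_general_or_equal(hj, hk, instances):
    """Definition: every instance that satisfies hk also satisfies hj."""
    return all(covers(hj, x) for x in instances if covers(hk, x))


# Small hypothetical domain: two attributes with two values each.
domains = [["Sunny", "Rainy"], ["Warm", "Cold"]]
instances = list(product(*domains))
hypotheses = list(product(*[d + [ANY, EMPTY] for d in domains]))
agree = all(
    more_general_or_equal(hj, hk)
    == semantically_more_general_or_equal(hj, hk, instances)
    for hj in hypotheses for hk in hypotheses
)
print(agree)                                                     # True
print(more_general_or_equal(("Sunny", ANY), ("Sunny", "Warm")))  # True
print(more_general_or_equal(("Sunny", "Warm"), ("Sunny", ANY)))  # False
```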

Terminology

Taking advantage of the general-to-specific ordering

Example

FIND-S Algorithm
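
A minimal FIND-S sketch for conjunctive hypotheses. It assumes the '?'/'Ø' encoding and a four-example EnjoySport-style training set (hypothetical here). Starting from the most specific hypothesis, each positive example generalizes only the constraints it violates.

```python
ANY, EMPTY = "?", "Ø"

# Hypothetical training data; attributes are
# Sky, AirTemp, Humidity, Wind, Water, Forecast.
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def find_s(examples, n_attrs):
    h = [EMPTY] * n_attrs  # most specific hypothesis
    for x, positive in examples:
        if not positive:
            continue  # FIND-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] == EMPTY:
                h[i] = value  # first positive example: copy its value
            elif h[i] != ANY and h[i] != value:
                h[i] = ANY  # minimal generalization of this constraint
    return tuple(h)


print(find_s(EXAMPLES, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The negative example is skipped. As long as the target concept is in H and the data are noise-free, the current hypothesis is the most specific one consistent with the positives seen so far, so it cannot cover a negative example.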

No revision in case of a negative example: Why?

Limitations of Find-S

Limitations of Find-S (cont.)

Version Space

The List-Then-Eliminate algorithm
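
List-Then-Eliminate can be written directly as a filter over an explicitly enumerated hypothesis space. The sketch assumes a conjunctive '?'/'Ø' hypothesis language over six EnjoySport-style attributes and four training examples, all hypothetical here. Under these assumptions H has 5120 syntactic members, of which only six survive.

```python
from itertools import product

ANY, EMPTY = "?", "Ø"

# Hypothetical domains (Sky, AirTemp, Humidity, Wind, Water, Forecast) and data.
DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"],
]
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def covers(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))


def list_then_eliminate(domains, examples):
    # VersionSpace <- every hypothesis in H
    version_space = product(*[d + [ANY, EMPTY] for d in domains])
    # keep only hypotheses h with h(x) = c(x) for every training example
    return [h for h in version_space
            if all(covers(h, x) == label for x, label in examples)]


vs = list_then_eliminate(DOMAINS, EXAMPLES)
for h in vs:
    print(h)
print(len(vs))  # 6
```

Enumerating H explicitly is what makes the method impractical beyond toy hypothesis spaces.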

Pros and cons

Version Space: A compact representation

General boundary

Specific boundary

A Version Space

Candidate-Elimination algorithm
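
A compact sketch of Candidate-Elimination for conjunctive hypotheses. It assumes the '?'/'Ø' encoding, attribute domains of at least two values, and a hypothetical four-example EnjoySport-style training set. Positive examples minimally generalize S and prune G. Negative examples minimally specialize G and prune S. Each boundary is kept free of redundant members.

```python
ANY, EMPTY = "?", "Ø"

# Hypothetical domains (Sky, AirTemp, Humidity, Wind, Water, Forecast) and data.
DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"],
]
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def covers(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    if EMPTY in hk:
        return True
    return all(cj == ANY or cj == ck for cj, ck in zip(hj, hk))


def min_generalization(s, x):
    """Unique minimal generalization of conjunctive s that covers x."""
    return tuple(v if c in (EMPTY, v) else ANY for c, v in zip(s, x))


def min_specializations(g, x, domains):
    """Minimal specializations of g excluding x: fix one '?' to a value != x[i]."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == ANY
            for v in domains[i] if v != x[i]]


def candidate_elimination(domains, examples):
    n = len(domains)
    S, G = {(EMPTY,) * n}, {(ANY,) * n}
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            new_S = set()
            for s in S:
                h = s if covers(s, x) else min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.add(h)
            # remove any member of S that is more general than another member
            S = {s for s in new_S
                 if not any(t != s and more_general_or_equal(s, t) for t in new_S)}
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)
                    continue
                for h in min_specializations(g, x, domains):
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.add(h)
            # remove any member of G that is less general than another member
            G = {g for g in new_G
                 if not any(t != g and more_general_or_equal(t, g) for t in new_G)}
    return S, G


S, G = candidate_elimination(DOMAINS, EXAMPLES)
print("S:", S)
print("G:", G)
```

On this data S ends as {('Sunny', 'Warm', '?', 'Strong', '?', '?')} and G as {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}, which together delimit a version space of six hypotheses.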

Candidate-Elimination algorithm (cont.)

What does the Candidate-Elimination algorithm converge to?

Empty Version Space

Other characteristics

How can partially learned concepts be used?
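
One way to use a partially learned concept is to let every hypothesis in the version space vote. The sketch hard-codes the six conjunctive hypotheses that are consistent with a small EnjoySport-style training set (assumed, not taken from the slides). It then reports the fraction voting positive for four hypothetical new instances.

```python
ANY = "?"

# Assumed version space over (Sky, AirTemp, Humidity, Wind, Water, Forecast).
VERSION_SPACE = [
    ("Sunny", "Warm", ANY, "Strong", ANY, ANY),
    ("Sunny", ANY, ANY, "Strong", ANY, ANY),
    ("Sunny", "Warm", ANY, ANY, ANY, ANY),
    (ANY, "Warm", ANY, "Strong", ANY, ANY),
    ("Sunny", ANY, ANY, ANY, ANY, ANY),
    (ANY, "Warm", ANY, ANY, ANY, ANY),
]


def covers(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))


def vote(x, version_space=VERSION_SPACE):
    """Fraction of version-space members that classify x as positive."""
    return sum(covers(h, x) for h in version_space) / len(version_space)


new_cases = {  # hypothetical unseen instances
    "A": ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"),
    "B": ("Rainy", "Cold", "Normal", "Weak", "Warm", "Same"),
    "C": ("Sunny", "Warm", "Normal", "Weak", "Warm", "Same"),
    "D": ("Sunny", "Cold", "Normal", "Strong", "Warm", "Same"),
}
for name, x in new_cases.items():
    print(name, vote(x))
```

A is classified positive by all six hypotheses, B by none, C by three and D by two. Unanimous votes can be detected from the boundaries alone: an instance is positive for every member iff every member of S covers it, and negative for every member iff no member of G covers it.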

An interactive learning algorithm

Dealing with noisy training instances

Dealing with noisy training instances (cont.)

What if the concept is not contained in the hypothesis space?

A hypothesis space that includes every possible hypothesis?

A fundamental property of inductive inference

Linear Regression

A formal definition of inductive bias
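
The following uses the formulation common in textbook treatments; the slide's own notation may differ. Consider a learner L, an instance space X, a target concept c and training data D_c = {⟨x, c(x)⟩}. Write L(x_i, D_c) for the classification L assigns to x_i after training on D_c, and ⊢ for deductive entailment. The inductive bias of L is then any minimal set of assertions B such that the learner's classification of every instance follows deductively:

```latex
(\forall x_i \in X)\;\big[\,(B \wedge D_c \wedge x_i) \vdash L(x_i, D_c)\,\big]
```

For the Candidate-Elimination algorithm, when it classifies only on unanimous votes of the version space, B = {c ∈ H}.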

Modeling inductive systems by equivalent deductive systems

Bias of the Candidate-Elimination algorithm

Comparing the inductive bias of learning algorithms

Related work

Learning disjunctive concepts: How?

Sequential Covering algorithms

Sequential Covering Algorithm
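
A minimal sketch of sequential covering for one target class over attribute-value data. LEARN-ONE-RULE is simplified here to a greedy general-to-specific search. It adds the attribute test with the highest precision (ties broken by coverage) until no negative example is covered. The PlayTennis toy data, the precision heuristic and the absence of a performance threshold are assumptions, not details from the slides. The data must be consistent for every rule to end up negative-free.

```python
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = """Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No"""
DATA = [dict(zip(ATTRS + ["Play"], line.split())) for line in ROWS.splitlines()]


def covers(rule, ex):
    """A rule is a dict of attribute = value tests read as a conjunction."""
    return all(ex[a] == v for a, v in rule.items())


def precision(test, cov_pos, cov_neg):
    """(precision, positive coverage) of the current rule extended with test."""
    a, v = test
    p = sum(e[a] == v for e in cov_pos)
    n = sum(e[a] == v for e in cov_neg)
    return p / (p + n), p


def learn_one_rule(pos, neg, attrs):
    """Greedy general-to-specific search (a simplified LEARN-ONE-RULE)."""
    rule = {}
    while any(covers(rule, e) for e in neg) and len(rule) < len(attrs):
        cov_pos = [e for e in pos if covers(rule, e)]
        cov_neg = [e for e in neg if covers(rule, e)]
        # candidate tests: unused attributes, values seen in covered positives
        tests = {(a, e[a]) for e in cov_pos for a in attrs if a not in rule}
        a, v = max(sorted(tests), key=lambda t: precision(t, cov_pos, cov_neg))
        rule[a] = v
    return rule


def sequential_covering(data, target, label, attrs):
    pos = [e for e in data if e[target] == label]
    neg = [e for e in data if e[target] != label]
    rules = []
    while pos:  # some positive examples are still uncovered
        rule = learn_one_rule(pos, neg, attrs)
        rules.append(rule)
        pos = [e for e in pos if not covers(rule, e)]  # remove covered positives
    return rules


RULES = sequential_covering(DATA, "Play", "Yes", ATTRS)
for r in RULES:
    print("IF", " AND ".join(f"{a} = {v}" for a, v in r.items()), "THEN Play = Yes")
```

Each learned rule covers at least one remaining positive example, so the outer loop always terminates.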

LEARN-ONE-RULE

Sequential Covering + Candidate Elimination

General-to-specific search

The search space for rule preconditions

Beam search
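
A sketch of LEARN-ONE-RULE as a general-to-specific beam search of width k. It assumes attribute-value examples, the negative entropy of the covered examples as the performance measure, and a rule that predicts the majority class among the examples it covers. The data set is the PlayTennis toy data (an assumption). Only the k best specializations are kept at each level, while the best hypothesis seen so far is tracked separately.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = """Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No"""
DATA = [dict(zip(ATTRS + ["Play"], line.split())) for line in ROWS.splitlines()]


def covers(h, ex):
    return all(ex[a] == v for a, v in h.items())


def performance(h, examples, target):
    """Negative entropy of the target values of the examples h covers."""
    labels = [e[target] for e in examples if covers(h, e)]
    n = len(labels)
    return sum(c / n * log2(c / n) for c in Counter(labels).values())


def learn_one_rule(examples, attrs, target, k=3):
    tests = sorted({(a, e[a]) for e in examples for a in attrs})
    best = {}  # most general hypothesis: no preconditions
    best_perf = performance(best, examples, target)
    frontier = [best]
    while frontier:
        seen, candidates = set(), []
        for h in frontier:
            for a, v in tests:
                if a in h:
                    continue  # at most one test per attribute
                s = {**h, a: v}
                key = frozenset(s.items())
                # skip duplicates and hypotheses that cover no example
                if key not in seen and any(covers(s, e) for e in examples):
                    seen.add(key)
                    candidates.append(s)
        scored = [(performance(s, examples, target), s) for s in candidates]
        for perf, s in scored:
            if perf > best_perf:
                best, best_perf = s, perf
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
        frontier = [s for _, s in scored[:k]]  # keep the k best
    covered = [e[target] for e in examples if covers(best, e)]
    return best, Counter(covered).most_common(1)[0][0]


rule, prediction = learn_one_rule(DATA, ATTRS, "Play", k=3)
print(rule, "->", prediction)  # {'Outlook': 'Overcast'} -> Yes
```

The search returns Outlook = Overcast with prediction Yes, because that single test already covers only positive examples (entropy 0), so no later specialization can score strictly better.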

Simultaneous vs. sequential covering algorithm

Computational complexity

Induce rules directly or convert a decision tree to a set of rules?

Replication problem

Single-concept rule learning

Alternatively ...

Changes to Sequential Covering algorithm

Classification of new cases

Default rule
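
Assuming an ordered rule list in which the first matching rule fires, the default rule is simply the class returned when no other rule matches. The rules and the default class below are hypothetical.

```python
# Hypothetical ordered rule list: (preconditions, class), tried top to bottom.
RULES = [
    ({"Outlook": "Overcast"}, "Yes"),
    ({"Humidity": "Normal", "Wind": "Weak"}, "Yes"),
    ({"Outlook": "Rain", "Wind": "Weak"}, "Yes"),
]
DEFAULT_CLASS = "No"  # hypothetical default rule: IF true THEN No


def classify(case, rules=RULES, default=DEFAULT_CLASS):
    """Return the class of the first rule whose preconditions all hold."""
    for preconditions, label in rules:
        if all(case.get(a) == v for a, v in preconditions.items()):
            return label
    return default  # no rule fired, so the default rule applies


print(classify({"Outlook": "Overcast", "Humidity": "High", "Wind": "Strong"}))  # Yes
print(classify({"Outlook": "Sunny", "Humidity": "High", "Wind": "Strong"}))     # No
```

With an unordered rule set the same default would cover cases that no rule matches, but cases matched by several rules would need a separate conflict-resolution strategy.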

Learning multiple concepts

Multiple-concept learning

Multiple classification

Learning multiple independent concepts

Learning multiple dependent concepts

Learning multiple dependent concepts (cont.)

Related work

Propositional rules

Overview

First-order rules

First-order rules and labeled graphs

Examples

Why do we need first-order representations?

Problems raised by attribute-value representations

A first-order representation for examples

First order rules as Prolog clauses

First order rules as SQL queries
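
To illustrate the correspondence, a Prolog-style clause such as grandparent(X, Z) :- parent(X, Y), parent(Y, Z) can be evaluated as a self-join. Each body literal becomes a table alias, and the shared variable Y becomes a join condition. The sketch uses Python's built-in sqlite3 module with a hypothetical parent relation; the slides' own example is not reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (parent TEXT, child TEXT)")
conn.executemany("INSERT INTO parent VALUES (?, ?)",  # hypothetical facts
                 [("tom", "bob"), ("bob", "ann"), ("bob", "pat"), ("pat", "jim")])

# grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
# One alias per body literal; the shared variable Y is the join condition.
query = """
    SELECT DISTINCT p1.parent AS x, p2.child AS z
    FROM parent AS p1
    JOIN parent AS p2 ON p1.child = p2.parent
"""
grandparents = set(conn.execute(query).fetchall())
print(sorted(grandparents))  # [('bob', 'jim'), ('tom', 'ann'), ('tom', 'pat')]
```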

When to apply first-order learning algorithms?

Differences between propositional learning and first-order learning

Analogy between propositional and first-order learning systems

Terminology

Terminology (cont.)

Learning sets of first-order rules: FOIL

The Basic FOIL algorithm
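
FOIL chooses the next literal to add to a clause using an information-based gain computed on variable bindings. The sketch follows the form commonly given in textbook presentations, Gain(L, R) = t (log2(p1/(p1+n1)) - log2(p0/(p0+n0))); the slides may state it differently, and the binding counts in the example are made up.

```python
from math import log2


def foil_gain(p0, n0, p1, n1, t):
    """Gain of adding literal L to rule R.
    p0, n0: positive / negative bindings of R
    p1, n1: positive / negative bindings of R extended with L
    t: positive bindings of R still covered after adding L"""
    if p1 == 0:
        return float("-inf")  # the extended rule covers no positive binding
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))


print(foil_gain(p0=10, n0=10, p1=6, n1=2, t=6))  # about 3.51
```

Here the literal raises the proportion of positive bindings from 1/2 to 3/4 while keeping six positive bindings, for a gain of about 3.51 bits.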

A FOIL example

Further details on FOIL

Limitations of FOIL

TILDE: Main characteristics

TILDE (cont.)

First-order decision tree

TILDE Method

PROGOL: Main characteristics

PROGOL: an example

INDUBI/CSL: Main characteristics

ATRE: Main characteristics

ATRE: Search strategy

Related work

Overview

Applying machine learning to document analysis & recognition: Where & How?

General comments

Stages of a Machine Learning application

Development stages

Peculiarities of applications to document processing

Machine learning for intelligent document processing: the case of WISDOM++

Overview

Document processing steps in WISDOM++

Learning in WISDOM++

WISDOM++: Blocks classification

WISDOM++: Document classification

WISDOM++: Document understanding

Related Work

Conclusions