Learning in Document Analysis and Understanding
Overview
Objectives
Learning in Document Processing: some data
Document processing requires a large amount of knowledge
Hand-coding knowledge?
Typical machine learning applications
Comparing development times (Michie, 1989)
Overview
What is learning?
History
Neural modeling (1955-1965) (1986-…)
Decision-Theoretic Techniques (1955-1965)
Symbolic Concept-oriented Techniques (1962-1980)
Knowledge Intensive Learning Systems and Multistrategy Learning (1980 - today)
The general model of a Learning System
Learning Systems
Example: A handwriting recognition learning problem
Basic Questions
What do Machines learn?
Subsymbolic and Symbolic Learning
Both learning and performance rely on the ability to represent knowledge
Representing experience
Representing the knowledge
Levels of Concept Descriptions
The task
The degree of supervision
How do Machines Learn?
Inferences
The Inductive Paradigm
Empirical Learning (learning inductively from large amounts of data)
Example
A small training set
How many hypotheses?
BIAS
How many examples do we need?
The deductive paradigm (explanation-based learning)
A multicriteria classification of machine learning methods
Overview
Statistical learning methods
Trainable classifiers
The basic model for a trainable pattern classifier
Discriminant analysis (Fisher 1936)
Fisher classification functions
Overview
Decision tree learning
From Decision Trees to Decision Rules
Overview
Learning Rules Directly
Concept Learning
A Concept Learning Task
Representing Hypotheses
A Formalization
What information is available?
The inductive learning hypothesis
Concept Learning as Search
An example
Semantically distinct hypotheses
Efficient search: how?
General-to-specific ordering
Terminology
Taking advantage of the general-to-specific ordering
Example
FIND-S Algorithm
No revision in case of negative example: Why?
Limitations of Find-S
Limitations of Find-S (cont.)
Version Space
The List-Then-Eliminate algorithm
Pros and cons
Version Space: A compact representation
General boundary
Specific boundary
A Version Space
Candidate-Elimination algorithm
Candidate-Elimination algorithm (cont.)
What does the Candidate-Elimination algorithm converge to?
Empty Version Space
Other characteristics
How can partially learned concepts be used?
An interactive learning algorithm
Dealing with noisy training instances
Dealing with noisy training instances (cont.)
What if the concept is not contained in the hypothesis space?
A hypothesis space that includes every possible hypothesis?
A fundamental property of inductive inference
Linear Regression
A formal definition of inductive bias
Modeling inductive systems by equivalent deductive systems
Bias of the Candidate-Elimination algorithm
Comparing the inductive bias of learning algorithms
Related work
Learning disjunctive concepts: How?
Sequential Covering algorithms
Sequential Covering Algorithm
LEARN-ONE-RULE
Sequential Covering + Candidate Elimination
General-to-specific search
The search space for rule preconditions
Beam search
Simultaneous vs. sequential covering algorithm
Computational complexity
Induce rules directly or convert a decision tree to a set of rules?
Replication problem
Single-concept rule learning
Alternatively ...
Changes to Sequential Covering algorithm
Classification of new cases
Default rule
Learning multiple concepts
Multiple-concept learning
Multiple classification
Learning multiple independent concepts
Learning multiple dependent concepts
Learning multiple dependent concepts (cont.)
Related work
Propositional rules
Overview
First-order rules
First-order rules and labeled graphs
Examples
Why do we need first-order representations?
Problems raised by attribute-value representations
A first-order representation for examples
First order rules as Prolog clauses
First order rules as SQL queries
When to apply first-order learning algorithms?
Differences between propositional learning and first-order learning
Analogy between propositional and first-order learning systems
Terminology
Terminology (cont.)
Learning sets of first-order rules: FOIL
The Basic FOIL algorithm
A FOIL example
Further details on FOIL
Limitations of FOIL
TILDE: Main characteristics
TILDE (cont.)
First-order decision tree
TILDE Method
PROGOL: Main characteristics
PROGOL: an example
INDUBI/CSL: Main characteristics
ATRE: Main characteristics
ATRE: Search strategy
Related work
Overview
Applying machine learning to document analysis & recognition: Where & How?
General comments
Stages of a Machine Learning application
Development stages
Peculiarities of applications to document processing
Machine learning for intelligent document processing: the case of WISDOM++
Overview
Document processing steps in WISDOM++
Learning in WISDOM++
WISDOM++: Blocks classification
WISDOM++: Document classification
WISDOM++: Document understanding
Related Work
Conclusions