Allen 1995: Natural Language Understanding
Chapter 4: Features and Augmented Grammars



 

4.1 Feature Systems and Augmented Grammars
4.2 Some Basic Feature Systems for English
4.3 Morphological Analysis and the Lexicon
4.4 A Simple Grammar Using Features
4.5 Parsing with Features
o 4.6 Augmented Transition Networks
o 4.7 Definite Clause Grammars
o 4.8 Generalized Feature Systems and Unification Grammars
Summary
Related Work and Further Readings
Exercises for Chapter 4








Context-free grammars provide the basis for most of the computational parsing mechanisms developed to date, but as they have been described so far, they would be very inconvenient for capturing natural languages. This chapter describes an extension to the basic context-free mechanism that defines constituents by a set of features. This extension allows aspects of natural language such as agreement and subcategorization to be handled in an intuitive and concise way.

Section 4.1 introduces the notion of feature systems and the generalization of context-free grammars to allow features. Section 4.2 then describes some useful feature systems for English that are typical of those in use in various grammars. Section 4.3 explores some issues in defining the lexicon and shows how using features makes the task considerably simpler. Section 4.4 describes a sample context-free grammar using features and introduces some conventions that simplify the process. Section 4.5 describes how to extend a chart parser to handle a grammar with features. The remaining sections, which are optional, describe how features are used in other grammatical formalisms and explore some more advanced material. Section 4.6 introduces augmented transition networks, which are a generalization of recursive transition networks with features, and Section 4.7 describes definite clause grammars based on PROLOG. Section 4.8 describes generalized feature systems and unification grammars.








4.1 Feature Systems and Augmented Grammars



In natural languages there are often agreement restrictions between words and phrases. For example, the NP "a men" is not correct English because the article a indicates a single object while the noun "men" indicates a plural object; the noun phrase does not satisfy the number agreement restriction of English. There are many other forms of agreement, including subject-verb agreement, gender agreement for pronouns, restrictions between the head of a phrase and the form of its complement, and so on. To handle such phenomena conveniently, the grammatical formalism is extended to allow constituents to have features. For example, we might define a feature NUMBER that may take a value of either s (for singular) or p (for plural), and we then might write an augmented CFG rule such as

NP -> ART N only when NUMBER1 agrees with NUMBER2

This rule says that a legal noun phrase consists of an article followed by a noun, but only when the number feature of the first word agrees with the number feature of the second. This one rule is equivalent to two CFG rules that would use different nonterminal symbols for encoding singular and plural forms of all noun phrases, such as

NP-SING -> ART-SING N-SING

NP-PLURAL -> ART-PLURAL N-PLURAL

While the two approaches seem similar in ease of use in this one example, consider that all rules in the grammar that use an NP on the right-hand side would now need to be duplicated to include a rule for NP-SING and a rule for NP-PLURAL, effectively doubling the size of the grammar. And handling additional features, such as person agreement, would double the size of the grammar again and again. Using features, the size of the augmented grammar remains the same as the original one yet accounts for agreement constraints.

To accomplish this, a constituent is defined as a feature structure - a mapping from features to values that defines the relevant properties of the constituent. In the examples in this book, feature names in formulas will be written in boldface. For example, a feature structure for a constituent ART1 that represents a particular use of the word a might be written as follows:



ART1: (CAT ART
       ROOT a
       NUMBER s)



This says it is a constituent in the category ART that has as its root the word a and is singular. Usually an abbreviation is used that gives the CAT value more prominence and provides an intuitive tie back to simple context-free grammars. In this abbreviated form, constituent ART1 would be written as



ART1: (ART ROOT a NUMBER s)



Feature structures can be used to represent larger constituents as well. To do this, feature structures themselves can occur as values. Special features based on the integers - 1, 2, 3, and so on - will stand for the first subconstituent, second subconstituent, and so on, as needed. With this, the representation of the NP constituent for the phrase "a fish" could be

NP1: (NP NUMBER s
      1 (ART ROOT a
         NUMBER s)
      2 (N ROOT fish
         NUMBER s))

Note that this can also be viewed as a representation of a parse tree shown in Figure 4.1, where the subconstituent features 1 and 2 correspond to the subconstituent links in the tree.

The rules in an augmented grammar are stated in terms of feature structures rather than simple categories. Variables are allowed as feature values so that a rule can apply to a wide range of situations. For example, a rule for simple noun phrases would be as follows:

(NP NUMBER ?n) -> (ART NUMBER ?n) (N NUMBER ?n)

This says that an NP constituent can consist of two subconstituents, the first being an ART and the second being an N, in which the NUMBER feature in all three constituents is identical. According to this rule, constituent NP1 given previously is a legal constituent. On the other hand, the constituent



[Allen 1995 : Chapter 4 - Features and Augmented Grammars 85]






Figure 4.1 Viewing a feature structure as an extended parse tree



* (NP 1 (ART NUMBER s)
      2 (N NUMBER s))

is not allowed by this rule because there is no NUMBER feature in the NP, and the constituent

* (NP NUMBER s
     1 (ART NUMBER s)
     2 (N NUMBER p))

is not allowed because the NUMBER feature of the N constituent is not identical to the other two NUMBER features.

Variables are also useful in specifying ambiguity in a constituent. For instance, the word fish is ambiguous between a singular and a plural reading. Thus the word might have two entries in the lexicon that differ only by the value of the NUMBER feature. Alternatively, we could define a single entry that uses a variable as the value of the NUMBER feature, that is,

(N ROOT fish NUMBER ?n)

This works because any value of the NUMBER feature is allowed for the word fish. In many cases, however, not just any value would work; rather, a range of values is possible. To handle these cases, we introduce constrained variables, which are variables that can only take a value out of a specified list. For example, the variable ?n{s p} would be a variable that can take the value s or the value p. Typically, when we write such variables, we will drop the variable name altogether and just list the possible values. Given this, the word fish might be represented by the constituent

(N ROOT fish NUMBER ?n{s p})

or more simply as

(N ROOT fish NUMBER {s p})
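To make the mechanics concrete, here is a minimal Python sketch of the idea (the dict-and-set representation is an illustrative assumption, not the book's notation): a feature structure is a mapping from features to values, and a set of values plays the role of a constrained variable.

def match_number(art, n):
    # Check the rule (NP NUMBER ?n) -> (ART NUMBER ?n) (N NUMBER ?n):
    # the NUMBER values must share at least one possibility.
    shared = art["NUMBER"] & n["NUMBER"]
    if not shared:
        return None                      # agreement fails; no NP is built
    return {"CAT": "NP", "NUMBER": shared,
            "1": art, "2": n}            # features 1 and 2 name the subconstituents

art_a  = {"CAT": "ART", "ROOT": "a",    "NUMBER": {"s"}}
n_fish = {"CAT": "N",   "ROOT": "fish", "NUMBER": {"s", "p"}}
n_men  = {"CAT": "N",   "ROOT": "men",  "NUMBER": {"p"}}

print(match_number(art_a, n_fish))   # an NP with NUMBER {'s'}: "a fish"
print(match_number(art_a, n_men))    # None: "*a men" fails agreement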






BOX 4.1




There is an interesting issue of whether an augmented context-free grammar can describe languages that cannot be described by a simple context-free grammar. The answer depends on the constraints on what can be a feature value. If the set of feature values is finite, then it would always be possible to create new constituent categories for every combination of features. Thus it is expressively equivalent to a context-free grammar. If the set of feature values is unconstrained, however, then such grammars have arbitrary computational power. In practice, even when the set of values is not explicitly restricted, this power is not used, and the standard parsing algorithms can be used on grammars that include features.






4.2 Some Basic Feature Systems for English



This section describes some basic feature systems that are commonly used in grammars of English and develops the particular set of features used throughout this book. Specifically, it considers number and person agreement, verb form features, and features required to handle subcategorization constraints. You should read this to become familiar with the features in general and then refer back to it later when you need a detailed specification of a particular feature.






Person and Number Features

In the previous section, you saw the number system in English: Words may be classified as to whether they can describe a single object or multiple objects. While number agreement restrictions occur in several different places in English, they are most importantly found in subject-verb agreement. But subjects and verbs must also agree on another dimension, namely with respect to the person. The possible values of this dimension are

First Person (1): The noun phrase refers to the speaker, or a group of people including the speaker (for example, I, we, you and I).

Second Person (2): The noun phrase refers to the listener, or a group including the listener but not including the speaker (for example, you, all of you).

Third Person (3): The noun phrase refers to one or more objects, not including the speaker or hearer.

Since number and person features always co-occur, it is convenient to combine the two into a single feature, AGR, that has six possible values: first person singular (1s), second person singular (2s), third person singular (3s), and first, second, and third person plural (1p, 2p, and 3p, respectively). For example, an instance of the word is can agree only with a third person singular subject, so its AGR feature would be 3s. An instance of the word are, however, may agree with second person singular or any of the plural forms, so its AGR feature would be a variable ranging over the values {2s 1p 2p 3p}.



Verb-Form Features and Verb Subcategorization

Another very important feature system in English involves the form of the verb. This feature is used in many situations, such as the analysis of auxiliaries and generally in the subcategorization restrictions of many head words. As described in Chapter 2, there are five basic forms of verbs. The feature system for verb forms will be slightly more complicated in order to conveniently capture certain phenomena. In particular, we will use the following feature values for the feature VFORM:

base - base form (for example, go, be, say, decide)

pres - simple present tense (for example, go, goes, am, is, say, says, decide)

past - simple past tense (for example, went, was, said, decided)

fin - finite (that is, a tensed form, equivalent to {pres past})

ing - present participle (for example, going, being, saying, deciding)

pastprt - past participle (for example, gone, been, said, decided)

inf - a special feature value that is used for infinitive forms with the word to









Figure 4.2 The SUBCAT values for NP/VP combinations



To handle the interactions between words and their complements, an additional feature, SUBCAT, is used. Chapter 2 described some common verb subcategorization possibilities. Each one will correspond to a different value of the SUBCAT feature. Figure 4.2 shows some SUBCAT values for complements consisting of combinations of NPs and VPs. To help you remember the meaning of the feature values, they are formed by listing the main category of each part of the complement. If the category is restricted by a feature value, then the feature value follows the constituent separated by a colon. Thus the value _np_vp:inf will be used to indicate a complement that consists of an NP followed by a VP with VFORM value inf. Of course, this naming is just a convention to help the reader; you could give these values any arbitrary name, since their significance is determined solely by the grammar rules that involve the feature. For instance, the rule for verbs with a SUBCAT value of _np_vp:inf would be

(VP) -> (V SUBCAT _np_vp:inf)

(NP)

(VP VFORM inf)



This says that a VP can consist of a V with SUBCAT value _np_vp:inf, followed by an NP, followed by a VP with VFORM value inf. Clearly, this rule could be rewritten using any other unique symbol instead of _np_vp:inf, as long as the lexicon is changed to use this new value.

Many verbs have complement structures that require a prepositional phrase with a particular preposition, or one that plays a particular role. For example, the verb give allows a complement consisting of an NP followed by a PP using the preposition to, as in "Jack gave the money to the bank". Other verbs, such as "put", require a prepositional phrase that describes a location, using prepositions such as "in", "inside", "on", and "by". To express this within the feature system, we introduce a feature PFORM on prepositional phrases. A prepositional phrase with a PFORM value such as TO must have the preposition to as its head, and so on. A prepositional phrase with a PFORM value LOC must describe a location. Another useful PFORM value is MOT, used with verbs such as walk, which may take a prepositional phrase that describes some aspect of a path, as in We walked to the store. Prepositions that can create such phrases include to, from, and along. The LOC and MOT values might seem hard to distinguish, as certain prepositions might describe either a location or a path, but they are distinct. For example, while Jack put the box {in on by} the corner is fine, *Jack put the box {to from along} the corner is ill-formed. Figure 4.3 summarizes the PFORM feature.

Figure 4.3 Some values of the PFORM feature for prepositional phrases

Figure 4.4 Additional SUBCAT values

This feature can be used to restrict the complement forms for various verbs. Using the naming convention discussed previously, the SUBCAT value of a verb such as put would be _np_pp:loc, and the appropriate rule in the grammar would be



(VP) -> (V SUBCAT _np_pp:loc)

(NP)

(PP PFORM LOC)



For embedded sentences, a complementizer is often needed and must be subcategorized for. Thus a COMP feature with possible values for, that, and no-comp will be useful. For example, the verb tell can subcategorize for an S that has the complementizer that. Thus one SUBCAT value of tell will be _s:that. Similarly, the verb wish subcategorizes for an S with the complementizer for, as in We wished for the rain to stop. Thus one value of the SUBCAT feature for wish is _s:for. Figure 4.4 lists some of these additional SUBCAT values and examples for a variety of verbs. In this section, all the examples with the SUBCAT feature have involved verbs, but nouns, prepositions, and adjectives may also use the SUBCAT feature and subcategorize for their complements in the same way.



Binary Features

Certain features are binary in that a constituent either has or doesn't have the feature. In our formalization a binary feature is simply a feature whose value is restricted to be either + or -. For example, the INV feature is a binary feature that indicates whether or not an S structure has an inverted subject (as in a yes/no question). The S structure for the sentence Jack laughed will have an INV value -, whereas the S structure for the sentence Did Jack laugh? will have the INV value +. Often, the value is used as a prefix, and we would say that a structure has the feature +INV or -INV. Other binary features will be introduced as necessary throughout the development of the grammars.



The Default Value for Features

It will be useful on many occasions to allow a default value for features. Anytime a constituent is constructed that could have a feature, but a value is not specified, the feature takes the default value of -. This is especially useful for binary features but is used for nonbinary features as well; this usually ensures that any later agreement check on the feature will fail. The default value is inserted when the constituent is first constructed.






4.3 Morphological Analysis and the Lexicon



Before you can specify a grammar, you must define the lexicon. This section explores some issues in lexicon design and the need for a morphological analysis component.

The lexicon must contain information about all the different words that can be used, including all the relevant feature value restrictions. When a word is ambiguous, it may be described by multiple entries in the lexicon, one for each different use.

Because words tend to follow regular morphological patterns, however, many forms of words need not be explicitly included in the lexicon. Most English verbs, for example, use the same set of suffixes to indicate different forms: -s is added for third person singular present tense, -ed for past tense, -ing for the present participle, and so on. Without any morphological analysis, the lexicon would have to contain every one of these forms. For the verb want this would require six entries, for want (both in base and present form), wants, wanting, and wanted (both in past and past participle form).

In contrast, by using the methods described in Section 3.7 to strip suffixes, there needs to be only one entry for want. The idea is to store the base form of the verb in the lexicon and use context-free rules to combine verbs with suffixes to derive the other entries. Consider the following rule for present tense verbs:

(V ROOT ?r SUBCAT ?s VFORM pres AGR 3s) ->

(V ROOT ?r SUBCAT ?s VFORM base) (+S)

where +S is a new lexical category that contains only the suffix morpheme -s. This rule, coupled with the lexicon entry

want:
  (V ROOT want
     SUBCAT {_np _vp:inf _np_vp:inf}
     VFORM base)

would produce the following constituent given the input string want +s:

(V ROOT want
   SUBCAT {_np _vp:inf _np_vp:inf}
   VFORM pres
   AGR 3s)

Another rule would generate the constituents for the present tense form not in third person singular, which for most verbs is identical to the root form:

(V ROOT ?r SUBCAT ?s VFORM pres AGR {1s 2s 1p 2p 3p})
   -> (V ROOT ?r SUBCAT ?s VFORM base)

But this rule needs to be modified in order to avoid generating erroneous interpretations. Currently, it can transform any base form verb into a present tense form, which is clearly wrong for some irregular verbs. For instance, the base form be cannot be used as a present form (for example, *We be at the store). To cover these cases, a feature is introduced to identify irregular forms. Specifically, verbs with the binary feature +IRREG-PRES have irregular present tense forms. Now the rule above can be stated correctly:

(V ROOT ?r SUBCAT ?s VFORM pres AGR {1s 2s 1p 2p 3p})
   -> (V ROOT ?r SUBCAT ?s VFORM base IRREG-PRES -)



Because of the default mechanism, the IRREG-PRES feature need only be specified on the irregular verbs. The regular verbs default to -, as desired. Similar binary features would be needed to flag irregular past forms (IRREG-PAST, such as saw), and to distinguish -en past participles from -ed past participles (EN-PASTPRT). These features restrict the application of the standard lexical rules, and the irregular forms are added explicitly to the lexicon. Grammar 4.5 gives a set of rules for deriving different verb and noun forms using these features.
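As an illustration of how such a lexical rule can be applied, here is a small Python sketch (the representation and the helper name are assumptions for illustration) of deriving the third person singular present entry from a base entry, with the +IRREG-PRES flag blocking the derivation for irregular verbs:

def derive_pres_3s(base_entry):
    # Rule: (V ROOT ?r SUBCAT ?s VFORM pres AGR 3s) from (V ... VFORM base) +S
    if base_entry.get("VFORM") != "base":
        return None
    if base_entry.get("IRREG-PRES", "-") == "+":   # the default value is -
        return None                                # irregular present: blocked
    derived = dict(base_entry)
    derived["VFORM"] = "pres"
    derived["AGR"] = "3s"
    return derived

want = {"CAT": "V", "ROOT": "want", "VFORM": "base",
        "SUBCAT": "{_np _vp:inf _np_vp:inf}"}
be = {"CAT": "V", "ROOT": "be", "VFORM": "base", "IRREG-PRES": "+"}

print(derive_pres_3s(want))   # the entry for "wants"
print(derive_pres_3s(be))     # None: "*We be at the store" stays out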

Given a large set of features, the task of writing lexical entries appears very difficult. Most frameworks allow some mechanisms that help alleviate these problems. The first technique - allowing default values for features - has already been mentioned. With this capability, if an entry takes a default value for a given feature, then it need not be explicitly stated. Another commonly used technique is to allow the lexicon writer to define clusters of features, and then indicate a cluster with a single symbol rather than listing them all. Later, additional techniques will be discussed that allow the inheritance of features in a feature hierarchy.

Grammar 4.5 Some lexical rules for common suffixes on verbs and nouns

Figure 4.6 contains a small lexicon. It contains many of the words to be used in the examples that follow. It contains three entries for the word "saw" - as a noun, as a regular verb, and as the irregular past tense form of the verb "see" - as illustrated in the sentences

The saw was broken.

Jack wanted me to saw the board in half.

I saw Jack eat the pizza.

With an algorithm for stripping the suffixes and regularizing the spelling, as described in Section 3.7, the derived entries can be generated using any of the basic parsing algorithms on Grammar 4.5. With the lexicon in Figure 4.6 and Grammar 4.5, correct constituents for the following words can be derived: been, being, cries, cried, crying, dogs, saws (two interpretations), sawed, sawing, seen, seeing, seeds, wants, wanting, and wanted. For example, the word cries would be transformed into the sequence cry +s, and then rule 1 would produce the present tense entry from the base form in the lexicon.









Figure 4.6 A lexicon



Often a word will have multiple interpretations that use different entries and different lexical rules. The word saws, for instance, transformed into the sequence saw +s, can be a plural noun (via rule 7 and the first entry for saw), or the third person present form of the verb saw (via rule 1 and the second entry for saw). Note that rule 1 cannot apply to the third entry, as its VFORM is not base.






The success of this approach depends on being able to prohibit erroneous derivations, such as analyzing seed as the past tense of the verb "see". This analysis will never be considered if the FST that strips suffixes is correctly designed. Specifically, the word see will not allow a transition to the states that allow the -ed suffix. But even if this were produced for some reason, the IRREG-PAST value + in the entry for see would prohibit rule 3 from applying.






4.4 A Simple Grammar Using Features



This section presents a simple grammar using the feature systems and lexicon developed in the earlier sections. It will handle sentences such as the following:

The man cries.

The men cry.

The man saw the dogs.

He wants the dog.

He wants to be happy.

He wants the man to see the dog.

He is happy to be a dog.

It does not find the following acceptable:

* The men cries.

* The man cry.

* The man saw to be happy.

* He wants.

* He wants the man saw the dog.

Before developing the grammar, some additional conventions are introduced that will be very useful throughout the book. It is very cumbersome to write grammatical rules that include all the necessary features. But there are certain regularities in the use of features that can be exploited to simplify the process of writing rules. For instance, many feature values are unique to a feature (for example, the value inf can only appear in the VFORM feature, and _np_vp:inf can only appear in the SUBCAT feature). Because of this, we can omit the feature name without introducing any ambiguity. Unique feature values will be listed using square brackets. Thus (VP VFORM inf) will be abbreviated as VP[inf]. Since binary features do not have unique values, a special convention is introduced for them. For a binary feature B, the constituent C[+B] indicates the constituent (C B +).

Many features are constrained so that the value on the mother must be identical to the value on its head subconstituent. These are called head features. For instance, in all VP rules the VFORM and AGR values are the same in the VP and the head verb, as in the rule

(VP VFORM ?v AGR ?a) ->
   (V VFORM ?v AGR ?a SUBCAT _np_vp:inf)
   (NP)
   (VP VFORM inf)

If the head features can be declared separately from the rules, the system can automatically add these features to the rules as needed. With VFORM and AGR declared as head features, the previous VP rule can be abbreviated as

VP -> (V SUBCAT _np_vp:inf) NP (VP VFORM inf)

The head constituent in a rule will be indicated in italics. Combining all the abbreviation conventions, the rule could be further simplified to

VP -> V[_np_vp:inf] NP VP[inf]

BOX 4.2 Systemic Grammar

An important influence on the development of computational feature-based systems was systemic grammar (Halliday, 1985). This theory emphasizes the functional role of linguistic constructs as they affect communication. The grammar is organized as a set of choices about discourse function that determine the structure of the sentence. The choices are organized into hierarchical structures called systems. For example, the mood system would capture all the choices that affect the mood of the sentence. Part of this structure looks as follows:

This structure indicates that once certain choices are made, others become relevant. For instance, if you decide that a sentence is in the declarative mood, then the choice between bound and relative becomes relevant. The choice between yes/no and wh, on the other hand, is not relevant to a declarative sentence.

Systemic grammar was used in Winograd (1973), and Winograd (1983) contains a good discussion of the formalism. In recent years it has mainly been used in natural language generation systems because it provides a good formalism for organizing the choices that need to be made while planning a sentence (for example, see Mann and Matthiessen (1985) and Patten (1988)).

Grammar 4.7 A simple grammar in the abbreviated form

Grammar 4.8 The expanded grammar showing all features

A simple grammar using these conventions is shown as Grammar 4.7. Except for rules 1 and 2, which must enforce number agreement, all the rest of the feature constraints can be captured using the conventions that have been introduced. The head features for each category are declared at the bottom of the figure. This grammar is an abbreviation of Grammar 4.8. Consider how rule 1 in Grammar 4.7 abbreviates rule 1 in Grammar 4.8. The abbreviated rule is









Figure 4.9 Two sample parse trees with feature values



S[-inv] -> (NP AGR ?a) (VP[{pres past}] AGR ?a)

The unique values can be expanded in the obvious way: the value [-inv] becomes (INV -) and the value [{pres past}] becomes (VFORM ?v{pres past}). The head features for S are AGR and VFORM, so these features must be added to the S and VP head. The resulting rule is

(S INV - VFORM ?v{pres past} AGR ?a) ->
   (NP AGR ?a)
   (VP VFORM ?v{pres past} AGR ?a)

as shown in Grammar 4.8.
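The expansion of abbreviated rules is mechanical enough to sketch in a few lines of Python. In this sketch (the representation, the variable names ?h0 and ?h1, and the helper name expand are all assumptions for illustration), declared head features are added to the mother and the head daughter with a shared variable:

HEAD_FEATURES = {"VP": ["VFORM", "AGR"], "S": ["VFORM", "AGR"]}

def expand(mother, daughters, head_index):
    # Add one shared variable per declared head feature of the mother,
    # leaving any explicitly given value (such as VFORM inf) untouched.
    for i, feat in enumerate(HEAD_FEATURES.get(mother["CAT"], [])):
        var = "?h%d" % i
        mother.setdefault(feat, var)
        daughters[head_index].setdefault(feat, var)
    return mother, daughters

# VP -> V[_np_vp:inf] NP VP[inf], with the V as the head:
mother = {"CAT": "VP"}
daughters = [{"CAT": "V", "SUBCAT": "_np_vp:inf"},
             {"CAT": "NP"},
             {"CAT": "VP", "VFORM": "inf"}]
print(expand(mother, daughters, head_index=0))
# -> (VP VFORM ?h0 AGR ?h1) ->
#      (V VFORM ?h0 AGR ?h1 SUBCAT _np_vp:inf) (NP) (VP VFORM inf)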

The abbreviated form is also very useful for summarizing parse trees. For instance, Figure 4.9 shows the parse trees for two of the previous sample sentences, demonstrating that each is an acceptable sentence.






Consider why each of the ill-formed sentences introduced at the beginning of this section is not accepted by Grammar 4.7. Both *"The men cries" and *"The man cry" are not acceptable because the number agreement restriction on rule 1 is not satisfied: The NP constituent for the men has the AGR value 3p, while the VP cries has the AGR value 3s. Thus rule 1 cannot apply. Similarly, *"the man cry" is not accepted by the grammar since the man has AGR 3s and the VP cry has as its AGR value a variable ranging over {1s 2s 1p 2p 3p}. The phrase *"the man saw to be happy" is not accepted because the verb saw has a SUBCAT value _np. Thus only rule 5 could be used to build a VP. But rule 5 requires an NP complement, and it is not possible for the words "to be happy" to be a legal NP.

The phrase "He wants" is not accepted since the verb wants has a SUBCAT value ranging over {_np_vp:inf _np_vp:inf}, and thus only rules 5, 6, and 7 could apply to build a VP. But all these rules require a nonempty complement of some kind. The phrase *"He wants the man saw the dog" is not accepted for similar reasons, but this requires a little more analysis. Again, rules 5, 6, and 7 are possible with the verb wants. Rules 5 and 6 will not work, but rule 7 looks close, as it requires an NP and a VP[inf]. The phrase "the man" gives us the required NP, but "saw the dog" fails to be a VP[inf]. In particular, "saw the man" is a legal VP, but its VFORM feature will be past, not inf.








4.5 Parsing with Features



The parsing algorithms developed in Chapter 3 for context-free grammars can be extended to handle augmented context-free grammars. This involves generalizing the algorithm for matching rules to constituents. For instance, the chart-parsing algorithms developed in Chapter 3 all used an operation for extending active arcs with a new constituent. A constituent X could extend an arc of the form

C -> C1 ... Ci o X ... Cn

to produce a new arc of the form

C -> C1 ... Ci X o ... Cn

A similar operation can be used for grammars with features, but the parser may have to instantiate variables in the original arc before it can be extended by X. The key to defining this matching operation precisely is to remember the definition of grammar rules with features. A rule such as

1. (NP AGR ?a) -> o (ART AGR ?a) (N AGR ?a)

says that an NP can be constructed out of an ART and an N if all three agree on the AGR feature. It does not place any restrictions on any other features that the NP, ART, or N may have. Thus, when matching constituents against this rule, the only thing that matters is the AGR feature. All other features in the constituent can be ignored. For instance, consider extending arc 1 with the constituent

2. (ART ROOT A AGR 3s)






To make arc 1 applicable, the variable ?a must be instantiated to 3s, producing

3. (NP AGR 3s) -> o (ART AGR 3s) (N AGR 3s)

This arc can now be extended because every feature in the rule is in constituent 2:

4. (NP AGR 3s) -> (ART AGR 3s) o (N AGR 3s)

Now, consider extending this arc with the constituent for the word dog:

5. (N ROOT DOG1 AGR 3s)

This can be done because the AGR features agree. This completes the arc

6. (NP AGR 3s) -> (ART AGR 3s) (N AGR 3s)

This means the parser has found a constituent of the form (NP AGR 3s).

This algorithm can be specified more precisely as follows: Given an arc A, where the constituent following the dot is called NEXT, and a new constituent X, which is being used to extend the arc,

a. Find an instantiation of the variables such that all the features specified in NEXT are found in X.

b. Create a new arc A', which is a copy of A except for the instantiations of the variables determined in step (a).

c. Update A' as usual in a chart parser.

For instance, let A be arc 1, and X be the ART constituent 2. Then NEXT will be (ART AGR ?a). In step a, NEXT is matched against X, and you find that ?a must be instantiated to 3s. In step b, a new copy of A is made, which is shown as arc 3. In step c, the arc is updated to produce the new arc shown as arc 4.

When constrained variables, such as ?a{3s 3p}, are involved, the matching proceeds in the same manner, but the variable binding must be one of the listed values. If a variable is used in a constituent, then one of its possible values must match the requirement in the rule. If both the rule and the constituent contain variables, the result is a variable ranging over the intersection of their allowed values. For instance, consider extending arc 1 with the constituent (ART ROOT the AGR ?v{3s 3p}), that is, the word "the". To apply, the variable ?a would have to be instantiated to ?v{ 3s 3p}, producing the rule

(NP AGR ?v{3s 3p}) -> (ART AGR ?v{3s 3p}) o (N AGR ?v{3s 3p})

This arc could be extended by (N ROOT dog AGR 3s), because ?v{3s 3p} could be instantiated by the value 3s. The resulting arc would be identical to arc 6. The entry in the chart for the is not changed by this operation. It still has the value ?v{3s 3p}. The AGR feature is restricted to 3s only in the arc.
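The following Python sketch shows one way step (a) of this matching could be implemented (the representation is an assumption for illustration: a constituent's feature values are sets of possible atoms, and a rule value is a (variable, allowed-set) pair, with None for an absent part):

def match(next_cat, x, bindings):
    # Instantiate variables so that every feature specified in NEXT
    # is found in the new constituent X; return the extended bindings,
    # or None if the arc cannot be extended.
    new = dict(bindings)
    for feat, (var, allowed) in next_cat.items():
        actual = x.get(feat)
        if actual is None:
            return None                  # a required feature is missing
        if var is not None and var in new:
            allowed = new[var]           # variable already instantiated
        if allowed is not None:
            actual = actual & allowed    # intersect the possible values
        if not actual:
            return None                  # contradictory values
        if var is not None:
            new[var] = actual            # e.g. ?a is bound to {'3s'}
    return new

# Extending (NP AGR ?a) -> o (ART AGR ?a) (N AGR ?a) with "the", then "dog":
art_next = {"CAT": (None, {"ART"}), "AGR": ("?a", None)}
n_next = {"CAT": (None, {"N"}), "AGR": ("?a", None)}
the = {"CAT": {"ART"}, "AGR": {"3s", "3p"}}
dog = {"CAT": {"N"}, "AGR": {"3s"}}

b = match(art_next, the, {})
print(b)                        # {'?a': {'3s', '3p'}}
print(match(n_next, dog, b))    # {'?a': {'3s'}}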

Another extension is useful for recording the structure of the parse. Subconstituent features (1, 2, and so on, depending on which subconstituent is being added) are automatically inserted by the parser each time an arc is extended. The values of these features name subconstituents already in the chart.









Figure 4.10 The chart for "He wants to cry".



With this treatment, and assuming that the chart already contains two constituents, ART1 and N1, for the words the and dog, the constituent added to the chart for the phrase the dog would be

(NP AGR 3s
   1 ART1
   2 N1)

where ART1 = (ART ROOT the AGR {3s 3p}) and N1 = (N ROOT dog AGR {3s}). Note that the AGR feature of ART1 was not changed. Thus it could be used with other interpretations that require the value 3p if they are possible. Any of the chart-parsing algorithms described in Chapter 3 can now be used with an augmented grammar by using these extensions to extend arcs and build constituents. Consider an example. Figure 4.10 contains the final chart produced from parsing the sentence He wants to cry using Grammar 4.8. The rest of this section considers how some of the nonterminal symbols were constructed for the chart.

Constituent NP1 was constructed by rule 3, repeated here for convenience:






3. (NP AGR ?a) -> (PRO AGR ?a)

To match the constituent PRO1, the variable ?a must be instantiated to 3s. Thus the new constituent built is

NP1: (CAT NP

AGR 3s

1 PRO1)

Next consider constructing constituent VP1 using rule 4, namely

4. (VP AGR ?a VFORM ?v) -> (V SUBCAT _none AGR ?a VFORM ?v)

For the right-hand side to match constituent V2, the variable ?v must be instantiated to base. The AGR feature of V2 is not defined, so it defaults to -. The new constituent is

VP1: (CAT VP

AGR -

VFORM base

1 V2)

Generally, default values are not shown in the chart. In a similar way, constituent VP2 is built from TO1 and VP1 using rule 9, VP3 is built from V1 and VP2 using rule 6, and S1 is built from NP1 and VP3 using rule 1.






o 4.6 Augmented Transition Networks

Features can also be added to a recursive transition network to produce an augmented transition network (ATN). Features in an ATN are traditionally called registers. Constituent structures are created by allowing each network to have a set of registers. Each time a new network is pushed, a new set of registers is created. As the network is traversed, these registers are set to values by actions associated with each arc. When the network is popped, the registers are assembled to form a constituent structure, with the CAT slot being the network name.

Grammar 4.11 is a simple NP network. The actions are listed in the table below the network. ATNs use a special mechanism to extract the result of following an arc. When a lexical arc, such as arc 1, is followed, the constituent built from the word in the input is put into a special variable named "*". The action

DET := *

then assigns this constituent to the DET register. The second action on this arc,

AGR := AGR*

assigns the AGR register of the network to the value of the AGR register of the new word (the constituent in "*").

Agreement checks are specified in the tests. A test is an expression that succeeds if it returns a nonempty value and fails if it returns the empty set or nil.









Grammar 4.11 A simple NP network






Grammar 4.12 A simple S network



If a test fails, its arc is not traversed. The test on arc 2 indicates that the arc can be followed only if the AGR feature of the network has a non-null intersection with the AGR register of the new word (the noun constituent in "*").

Features on push arcs are treated similarly. The constituent built by traversing the NP network is returned as the value "*". Thus in Grammar 4.12, the action on the arc from S to S1,

SUBJ := *

would assign the constituent returned by the NP network to the register SUBJ. The test on arc 2 will succeed only if the AGR register of the constituent in the SUBJ register has a non-null intersection with the AGR register of the new constituent (the verb). This test enforces subject-verb agreement.
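A small Python sketch may make the register mechanism concrete; the data representation and the specific actions (MAIN-V := * and the narrowing of the AGR register) are assumptions for illustration, since the actual arc actions appear in the figures:

def follow_verb_arc(registers, star):
    # Following the verb arc: "*" holds the new constituent. The test
    # requires a non-null intersection between the subject's AGR register
    # and the AGR of the verb; the actions then update the registers.
    subj_agr = registers["SUBJ"]["AGR"]
    if not (subj_agr & star["AGR"]):
        return None                      # test fails; arc is not traversed
    registers["MAIN-V"] = star           # an action such as MAIN-V := *
    registers["AGR"] = subj_agr & star["AGR"]
    return registers

regs = {"SUBJ": {"CAT": "NP", "AGR": {"3s"}}}
saw = {"CAT": "V", "ROOT": "saw",
       "AGR": {"1s", "2s", "3s", "1p", "2p", "3p"}}   # past tense: any AGR
print(follow_verb_arc(regs, saw))        # succeeds; AGR narrows to {'3s'}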









Figure 4.13 Trace of the tests and actions used with "1 The 2 dogs 3 saw 4 Jack 5"



With the lexicon in Section 4.3, the ATN accepts the following sentences:

The dog cried.

The dogs saw Jack.

Jack saw the dogs.

Consider an example. A trace of a parse of the sentence "The dogs saw Jack" is shown in Figure 4.13. It indicates the current node in the network, the current word position, the arc that is followed from the node, and the register manipulations that are performed for the successful parse. It starts in the S network but moves immediately to the NP network from the call on arc 4. The NP network checks for number agreement as it accepts the word sequence The dogs. It constructs a noun phrase with the AGR feature plural. When the pop arc is followed, it completes arc 4 in the S network. The NP is assigned to the SUBJ register and then checked for agreement with the verb when arc 3 is followed. The NP "Jack" is accepted in another call to the NP network.

Grammar 4.14 An S network for assertions



An ATN Grammar for Simple Declarative Sentences

Here is a more comprehensive example of the use of an ATN to describe some declarative sentences. The allowed sentence structure is an initial NP followed by a main verb, which may then be followed by a maximum of two NPs and many PPs, depending on the verb. Using the feature system extensively, you can create a grammar that accepts any of the preceding complement forms, leaving the actual verb-complement agreement to the feature restrictions. Grammar 4.14 shows the S network. Arcs are numbered using the conventions discussed in Chapter 3. For instance, the arc S3/1 is the arc labeled 1 leaving node S3. The NP network in Grammar 4.15 allows simple names, bare plural nouns, pronouns, and a simple sequence of a determiner followed by an adjective and a head noun.









Grammar 4.15 The NP network



Allowable noun complements include an optional number of prepositional phrases. The prepositional phrase network in Grammar 4.16 is straightforward. Examples of parsing sentences with this grammar are left for the exercises.



Presetting Registers

One further extension to the feature-manipulation facilities in ATNs involves the ability to preset registers in a network as that network is being called, much like parameter passing in a programming language. This facility, called the SENDR action in the original ATN systems, is useful to pass information to the network that aids in analyzing the new constituent.

Consider the class of verbs, including want and pray, that accept complements using the infinitive forms of verbs, which are introduced by the word to. According to the classification in Section 4.2, this includes the following:









Grammar 4.16 The PP network



 

_vp:inf       Mary wants to have a party.

_np_vp:inf    Mary wants John to have a party.

In the context-free grammar developed earlier, such complements were treated as VPs with the VFORM value inf. To capture this same analysis in an ATN, you would need to be able to call a network corresponding to VPs but preset the VFORM register in that network to inf. Another common analysis of these constructs is to view the complements as a special form of sentence with an understood subject. In the first case it is Mary who would be the understood subject (that is, the host), while in the other case it is John. To capture this analysis, many ATN grammars preset the SUBJ register in the new S network when it is called.






o 4.7 Definite Clause Grammars



You can augment a logic grammar by adding extra arguments to each predicate to encode features. As a very simple example, you could modify the PROLOG rules to enforce number agreement by adding an extra argument for the number on every predicate for which the number feature is relevant. Thus you would have rules such as those shown in Grammar 4.17.

Consider parsing the noun phrase "the dog", which would be captured by the assertions

word(the, 1, 2).

word(dog, 2, 3).

With these axioms, when the word the is parsed by rule 2 in Grammar 4.17, the number feature for the word is returned. You can see this in the following trace of the simple proof of

np(1, Number, 3)

Using rule 1, you have the following subgoals:









Grammar 4.17






Grammar 4.18



art(1, Number, P2)

n(P2, Number, 3)

The first subgoal succeeds by using rule 2 and proving

word(the, 1, 2)

isart(the, 3s)

which binds the variables in rule 2 as follows:

Number <- 3s

P2 <- 2

Thus, the second subgoal now is

n(2, 3s, 3)

Using rule 6, this reduces to the subgoals word(Word, 2, 3) and isnoun(Word, 3s), which are established by the input and rule 7, respectively, with Word bound to dog. Thus the parse succeeds and the number agreement was enforced.

The grammar can also be extended to record the structure of the parse by adding another argument to the rules. For example, to construct a parse tree, you could use the rules shown in Grammar 4.18. These rules would allow you to prove the following on the sentence "The dog cried":









Figure 4.19 A tree representation of the structure



s(1, 3s, s(np(art(the), n(dog)), vp(v(cried))), 4)

In other words, between positions 1 and 4 there is a sentence with number feature 3s and the structure

s(np(art(the), n(dog)), vp(v(cried)))

which is a representation of the parse tree shown in Figure 4.19.

For specifying grammars, most logic-based grammar systems provide a more convenient format that is automatically converted into PROLOG clauses like those in Grammar 4.18. Since the word position arguments are on every predicate, they can be omitted and inserted by the system. Similarly, since all the predicates representing terminal symbols (for example, art, n, v) are defined systematically, the system can generate these rules automatically. These abbreviations are encoded in a format called definite clause grammars (DCGs).

For example, you could write Grammar 4.18 by specifying just three DCG rules. (The symbol "-->" is traditionally used to signal DCG rules.)

s(Number, s(Np, Vp)) --> np(Number, Np), vp(Number, Vp)

np(Number, np(Art, N)) --> art(Number, Art), n(Number, N)

vp(Number, vp(Verb)) --> v(Number, Verb)

The result is a formalism similar to the augmented context-free grammars described in Section 4.3. This similarity can be made even closer if a feature/value representation is used to represent structure. For instance, let us replace all the argument positions with a single argument that is a feature structure. The DCG equivalent to the first five rules in Grammar 4.7 would be as shown in Grammar 4.20, where square brackets indicate lists in PROLOG. Notice that rule 1 will not succeed unless the S, NP, and VP all agree on the agr feature, because they have the same variable as a value.

Grammar 4.20 A DCG version of Grammar 4.7

Of course, this encoding will only work if every constituent specifies its feature values in the same order so that the feature lists unify. For example, if the S features consist of "inv", "agr", and "vform", every rule using an S must specify all these features in the same order. In a realistic grammar this would be awkward, because there will be many possible features for each constituent, and each one must be listed in every occurrence. One way to solve this problem is to extend the program that converts DCG rules to PROLOG rules so that it adds in any unspecified features with variable values. This way the user need only specify the features that are important.






o 4.8 Generalized Feature Systems and Unification Grammars



You have seen that feature structures are very useful for generalizing the notion of context-free grammars and transition networks. In fact, feature structures can be generalized to the extent that they make the context-free grammar unnecessary. The entire grammar can be specified as a set of constraints between feature structures. Such systems are often called unification grammars. This section provides an introduction to the basic issues underlying such formalisms.

The key concept of a unification grammar is the extension relation between two feature structures. A feature structure F1 extends, or is more specific than, a feature structure F2 if every feature value specified in F2 is also specified in F1. For example, the feature structure

(CAT V

ROOT cry)

extends the feature structure (CAT V), as its CAT value is V as required, and the ROOT feature is unconstrained in the latter feature structure. On the other hand, neither of the feature structures

 

(CAT V
 ROOT cry)

(CAT V
 VFORM pres)

extends the other, because both lack information required by the other. In particular, the first lacks the VFORM feature required by the second, and the second lacks the ROOT feature required by the first.



[Allen 1995 : Chapter 4 - Features and Augmented Grammars 110]



Two feature structures unify if there is a feature structure that is an extension of both. The most general unifier is the minimal feature structure that is an extension of both. The most general unifier of the above two feature structures is

(CAT V

ROOT cry

VFORM pres)

Note that this is an extension of both original feature structures, and there is no smaller feature structure that is an extension of both. In contrast, the structures

 

(CAT V
 AGR 3s)

(CAT V
 AGR 3p)

do not unify. There can be no feature structure that is an extension of both because they specify contradictory AGR feature values.

This formalism can be extended to allow simple disjunctive values (for example, {3s 3p}) in the natural way. For example, (AGR 3s) extends (AGR {3s 3p}).
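For flat feature structures like these, unification is easy to sketch in Python (the representation, with every value a set of allowed atoms, is an assumption for illustration):

def unify(f1, f2):
    # Return the most general unifier of two flat feature structures,
    # or None if no extension of both exists.
    result = dict(f1)
    for feat, v2 in f2.items():
        if feat not in result:
            result[feat] = v2            # unconstrained in f1: just add it
        else:
            shared = result[feat] & v2   # disjunctive values intersect
            if not shared:
                return None              # contradictory values: no unifier
            result[feat] = shared
    return result

v1 = {"CAT": {"V"}, "ROOT": {"cry"}}
v2 = {"CAT": {"V"}, "VFORM": {"pres"}}
print(unify(v1, v2))   # (CAT V ROOT cry VFORM pres)
print(unify({"CAT": {"V"}, "AGR": {"3s"}},
            {"CAT": {"V"}, "AGR": {"3p"}}))   # None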

The notion of unification is all that is needed to specify a grammar, as all feature agreement checks and manipulations can be specified in terms of unification relationships. A rule such as S -> NP VP in Grammar 4.7 could be expressed in a unification grammar as

X0 -> X1 X2
   CAT0 = S
   CAT1 = NP
   CAT2 = VP
   AGR0 = AGR1 = AGR2
   VFORM0 = VFORM2



This says that a constituent X0 can be constructed out of a sequence of constituents X1 and X2 if the CAT of X0 is S, the CAT of X1 is NP, the CAT of X2 is VP, the AGR values of all three constituents are identical, and the VFORM values of constituents X0 and X2 are identical. If the CAT value is always specified, such rules can be abbreviated as follows:

 

S -> NP VP
   AGR = AGR1 = AGR2
   VFORM = VFORM2

where the CAT values are used in the rule. Also, the 0 subscript is omitted. Using these abbreviations, a subpart of Grammar 4.7 is rewritten as the unification grammar in Grammar 4.21. Since such grammars retain the structure of context-free rules, the standard parsing algorithms can be used on unification grammars. The next section shows how to interpret the feature equations to build the constituents.







Grammar 4.21 A unification grammar






Figure 4.22 Two noun phrase DAGs



Formal Development: Feature Structures as DAGs

The unification-based formalism can be defined precisely by representing feature structures as directed acyclic graphs (DAGs). Each constituent and value is represented as a node, and the features are represented as labeled arcs. Representations of the following two constituents are shown in Figure 4.22:

 

N1: (CAT N
     ROOT fish
     AGR 3s)

N2: (CAT N
     AGR {3s 3p})

 

 

The sources of a DAG are the nodes that have no incoming edges. Feature structure DAGs have a unique source node, called the root node. The DAG is said to be rooted by this node. The sinks of a DAG are nodes with no outgoing edges. The sinks of feature structures are labeled with an atomic value or a set of values (for example, {3s 3p}).

The unification of two feature structures is defined in terms of a graph-matching algorithm. This takes two rooted graphs and returns a new graph that is the unification of the two. The algorithm is shown in Figure 4.23.

To unify a DAG rooted at node Ni with a DAG rooted at node Nj:

1. If Ni equals Nj, then return Ni and succeed.

2. If both Ni and Nj are sink nodes, then if their labels have a non-null intersection, return a new node with the intersection as its label. Otherwise, the DAGs do not unify.

3. If Ni and Nj are not sinks, then create a new node N. For each arc labeled F leaving Ni to node NFi,

3a. If there is an arc labeled F leaving Nj to node NFj, then recursively unify NFi and NFj. Build an arc labeled F from N to the result of the recursive call.

3b. If there is no arc labeled F from Nj, build an arc labeled F from N to NFi.

3c. For each arc labeled F from Nj to node NFj, where there is no F arc leaving Ni, create a new arc labeled F from N to NFj.

Figure 4.23 The graph unification algorithm

The result of applying this algorithm to nodes N1 and N2 in Figure 4.22 is the new constituent

N3: (CAT N
     ROOT fish
     AGR 3s)

You should trace the algorithm on this example by hand to see how it works. This is a very simple case, as there is only one level of recursion. The initial call with the nodes N1 and N2 is handled in step 3, and each recursive call simply involves matching of sink nodes in step 2.
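Here is a runnable Python sketch of the algorithm in Figure 4.23, under the simplifying assumptions that non-sink nodes are dicts, sink labels are frozensets of atoms, and the result is copied rather than shared:

def unify_dag(ni, nj):
    if ni is nj:
        return ni                            # step 1: the same node
    sink_i = not isinstance(ni, dict)
    sink_j = not isinstance(nj, dict)
    if sink_i and sink_j:
        shared = ni & nj                     # step 2: intersect the labels
        return shared if shared else None
    if sink_i or sink_j:
        return None                          # added guard: sink vs. non-sink
    n = {}                                   # step 3: build a new node N
    for f, nfi in ni.items():
        if f in nj:
            sub = unify_dag(nfi, nj[f])      # step 3a: recursive unification
            if sub is None:
                return None
            n[f] = sub
        else:
            n[f] = nfi                       # step 3b: arc only leaves Ni
    for f, nfj in nj.items():
        if f not in ni:
            n[f] = nfj                       # step 3c: arc only leaves Nj
    return n

n1 = {"CAT": frozenset({"N"}), "ROOT": frozenset({"fish"}),
      "AGR": frozenset({"3s"})}
n2 = {"CAT": frozenset({"N"}), "AGR": frozenset({"3s", "3p"})}
print(unify_dag(n1, n2))    # N3: (CAT N ROOT fish AGR 3s)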

With this algorithm in hand, the algorithm for constructing a new constituent using the graph unification equations can be described. Once this is developed, you can build a parser using any standard parsing algorithm. The algorithm to build a new constituent of category C using a rule with feature equations of form Fi = V, where Fi indicates the F feature of the i'th subconstituent, is shown as Figure 4.24.

Consider an example. Assume the following two constituents shown in Figure 4.25 are defined already. In LISP notation, they are

 

ART1: (CAT ART
       ROOT the
       AGR {3s 3p})

N1: (CAT N
    ROOT fish
    AGR {3s 3p})



The new NP will be built using rule 2 in Grammar 4.21. The equations are

CAT0 = NP

CAT1 = ART

CAT2 = N

AGR0 = AGR1 = AGR2

The algorithm produces the constituent represented by the DAG in Figure 4.26.









Figure 4.24 An algorithm to build a new constituent using feature equations






Figure 4.25 Lexical entries for the and fish






Figure 4.26 The graph for the NP the fish









Figure 4.27 The analysis of the VP is happy



Continuing the example, assume that the VP is happy is analyzed similarly and is represented as in Figure 4.27. To simplify the graph, the CAT arcs are not shown. Figure 4.28 shows the analysis of The fish is happy constructed from rule 1 in Grammar 4.21. The value of the AGR slot is now the same node for S1, NP1, ART1, N1, VP1, and V1. Thus the value of the AGR feature of ART1, for instance, changed when the AGR features of NP1 and VP1 were unified.

So far, you have seen unification grammars used only to mimic the behavior of the augmented context-free grammars developed earlier. But the unification framework is considerably richer because there is no requirement that rules be based on syntactic categories. For example, there is a class of phrases in English called predicative phrases, which can occur in sentences of the form

NP be _

This includes prepositional phrases (He is in the house), noun phrases (He is a traitor), and adjective phrases (He is happy). Grammars often include a binary feature, say PRED, that is true of phrases that can be used in this way. In a standard CFG you would need to specify a different rule for each category, as in

VP -> (V ROOT be)(NP PRED +)

VP -> (V ROOT be) (PP PRED +)

VP -> (V ROOT be) (ADJP PRED +)

With the unification grammar framework, one rule handles all the categories, namely

 

X0 -> X1 X2
   CAT0 = VP
   CAT1 = V
   ROOT1 = be
   PRED2 = +

in which any constituent X2 with the +PRED feature is allowed.







Figure 4.28 An analysis of The fish is happy.



Of course, if the categories of constituents are not specified, it is not so clear how to adapt the standard CFG parsing algorithms to unification grammar. It can be shown that as long as there is a finite subset of features such that at least one of the features is specified in every constituent, the unification grammar can be converted into a context-free grammar. This set of features is sometimes called the context-free backbone of the grammar.

Another powerful feature of unification grammars is that they allow much more information to be encoded in the lexicon. In fact, almost the entire grammar can be encoded in the lexicon, leaving very few rules in the grammar. Consider all the rules that were introduced earlier to deal with different verb subcategorizations. These could be condensed to a few rules: one for verbs that subcategorize for one subconstituent, another for verbs that subcategorize for two, and so on. The actual category restrictions on the complement structure would be encoded in the lexicon entry for the verb. For example, the verb put might have a lexicon entry as follows:

put: (CAT V SUBCAT (FIRST (CAT NP) SECOND (CAT PP LOC +)))






The general rule for verbs that subcategorize for two constituents would be

 

VP -> V X2 X3
   2 = FIRST SUBCAT1
   3 = SECOND SUBCAT1

This rule would accept any sequence V X2 X3, in which X2 unifies with the FIRST feature of the SUBCAT feature of X1 and X3 unifies with the SECOND feature of the SUBCAT feature of X1. If V is the verb put, this rule would require X2 to unify with (CAT NP) and X3 to unify with (CAT PP LOC +), as desired. Of course, this same rule would put completely different constraints on X2 and X3 for a verb like want, which would require X3 to unify with (CAT VP VFORM inf).
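A Python sketch of this general rule follows (the flat unifier and the function name are assumptions for illustration, in the spirit of the unification sketch given earlier):

def unify_flat(f1, f2):
    # A simplified flat unifier: atomic values must match exactly.
    result = dict(f1)
    for feat, v in f2.items():
        if feat in result and result[feat] != v:
            return None
        result[feat] = v
    return result

def vp_two_complements(v, x2, x3):
    # VP -> V X2 X3, where 2 = FIRST SUBCAT1 and 3 = SECOND SUBCAT1:
    # the complement restrictions come from the verb's lexical entry.
    subcat = v["SUBCAT"]
    if unify_flat(x2, subcat["FIRST"]) is None:
        return None
    if unify_flat(x3, subcat["SECOND"]) is None:
        return None
    return {"CAT": "VP", "1": v, "2": x2, "3": x3}

put = {"CAT": "V", "ROOT": "put",
       "SUBCAT": {"FIRST": {"CAT": "NP"},
                  "SECOND": {"CAT": "PP", "LOC": "+"}}}
np1 = {"CAT": "NP"}
pp1 = {"CAT": "PP", "LOC": "+", "PFORM": "LOC"}
print(vp_two_complements(put, np1, pp1))   # a VP; fails if the PP is not +LOC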

Such techniques can dramatically reduce the size of the grammar. For instance, a reasonable grammar would require at least 40 rules to handle verb subcategorizations. But no rule involves more than three subconstituents in the verb complement. Thus these 40 rules could be reduced to four in the unification grammar, including a rule for null complements. In addition, these same rules could be reused to handle all noun and adjective subcategorizations, if you generalize the category restrictions on X1.






Summary



This chapter has extended the grammatical formalisms introduced in Chapter 3 by adding features to each constituent and augmenting the grammatical rules. Features are essential to enable the construction of wide-coverage grammars of natural language. A useful set of features for English was defined, and morphological analysis techniques for mapping words to feature structures (that is, constituents) were developed. Several different forms of augmentation were examined. The first allowed context-free rules to specify feature agreement restrictions in addition to the basic category, producing augmented context-free grammars. The standard parsing algorithms described in Chapter 3 can be extended to handle such grammars. The second technique produced augmented transition networks, in which different procedural tests and actions could be defined on each arc to manipulate feature structures. The final method was based on the unification of feature structures and was used to develop unification grammars.






Related Work and Further Readings



Augmented CFGs have been used in computational models since the introduction of attribute grammars by Knuth (1968), who employed them for parsing programming languages. Since then many systems have utilized annotated rules of some form. Many early systems relied on arbitrary LISP code for annotations, although the types of operations commonly implemented were simple feature tests and structure building along the lines of those discussed here. Examples of such systems are Sager (1981) and Robinson (1982).



BOX 4.3 Lexical Functional Grammar

The particular features used in this chapter and the techniques for augmenting rules are loosely based on work in the generalized phrase structure grammar (GPSG) tradition (Gazdar, Klein, Pullum, and Sag, 1985). GPSG introduces a finite set of feature values that can be attached to any grammatical symbol. These play much the same role as the annotations developed in this book except that there are no feature names. Rather, since all values are unique, they define both the feature type and the value simultaneously. For example, a plural noun phrase in the third person with gender female would be of the grammatical type

NP[PL,3,F]

Since the number of feature values is finite, the grammar is formally equivalent to a context-free grammar with a symbol for every combination of categories and features. One of the important contributions of GPSG, however, is the rich structure it imposes on the propagation of features. Rather than using explicit feature equation rules, GPSG relies on general principles of feature propagation that apply to all rules in the grammar. A good example of such a general principle is the head feature convention, which states that all the head features on the parent constituent must be identical to those of its head constituent. Another general principle enforces agreement restrictions between constituents. Some of the conventions introduced in this chapter to reduce the number of feature equations that must be defined by hand for each rule are motivated by these theoretical claims. An excellent survey of GPSG and LFG is found in Sells (1985).
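As a rough illustration of the head feature convention (a sketch under our own assumptions, not GPSG's actual formalization), copying head features from the head child to the parent might look like this in Python:

HEAD_FEATURES = {"NUMBER", "PERSON", "VFORM"}   # assumed head feature set

def propagate_head_features(parent, head_child):
    # Head feature convention: the parent's head features must be
    # identical to those of its head constituent.
    for feat in HEAD_FEATURES & head_child.keys():
        parent[feat] = head_child[feat]
    return parent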

The ATN framework described here is drawn from the work described in Woods (1970; 1973) and Kaplan (1973). A good survey of ATNs is Bates (1978). Logic programming approaches to natural language parsing originated in the early 1970s with work by Colmerauer (1978). This approach is perhaps best described in a paper by Pereira and Warren (1980) and in Pereira and Shieber (1987). Other interesting developments can be found in McCord (1980) and Pereira (1981). A good recent example is Alshawi (1992).

The discussion of unification grammars is based loosely on work by Kay (1982) and the PATR-II system (Shieber, 1984; 1986). There is a considerable amount of active research in this area. An excellent survey of the area is found in Shieber (1986). There is also a growing body of work on formalizing different forms of feature structures. Good examples are Rounds (1988), Shieber (1992), and Johnson (1991).






Exercises for Chapter 4



1. (easy) Using Grammar 4.7, draw the complete charts resulting from parsing the two sentences The man cries and He wants to be happy, whose final analyses are shown in Figure 4.9. You may use any parsing algorithm you want, but make sure that you state which one you are using and that the final chart contains every completed constituent built during the parse.

2. (medium) Define the minimal set of lexicon entries for the following verbs so that, using the morphological analysis algorithm, all standard forms of the verb are recognized and no illegal forms are inadvertently produced. Discuss any problems that arise and assumptions you make, and suggest modifications to the algorithms presented here if needed.






Base    Present Forms    Past    Past-Participle    Present-Participle
go      go, goes         went    gone               going
sing    sing, sings      sang    sung               singing
bid     bid, bids        bid     bidden             bidding



3. (medium) Extend the lexicon in Figure 4.6 and Grammar 4.7 so that the following two sentences are accepted:

He was sad to see the dog cry.

He saw the man saw the wood with the saw.

Justify your new rules by showing that they correctly handle a range of similar cases. Either implement and test your extended grammar using the supplied parser, or draw out the full chart that would be constructed for each sentence by a top-down chart parser.



4. (medium)

a. Write a grammar with features that will successfully allow the following phrases as noun phrases:

three o’clock

quarter after eight

ten minutes to six

seven thirty-five

half past four

but will not permit the following:

half to eight three

twenty o’clock

ten forty-five after six

Specify the feature manipulations necessary so that once the parse is completed, two features, HOUR and MINUTES, are set in the NP constituent. If this requires an extension to the feature mechanism, carefully describe the extension you assume.

b. Give two other forms of grammatical phrases accepted by your grammar. Find an acceptable phrase that is not accepted by your grammar. If any nongrammatical phrases are allowed, give one example.



5. (medium) English pronouns distinguish case. Thus I can be used as a subject, and me can be used as an object. Similarly, there is a difference between he and him, we and us, and they and them. The distinction is not made for the pronoun you. Specify an augmented context-free grammar and lexicon for simple subject-verb-object sentences that allows only appropriate pronouns in the subject and object positions and enforces number agreement between the subject and verb. Thus it should accept I hit him, but not me love you. Your grammar should account for all the pronouns mentioned in this question, but it need have only one verb entry and need cover no other noun phrases but pronouns.









6. (medium)

a. Specify some words by category and give four structurally different sentences accepted by this network.

b. Specify an augmentation for this network in the notation defined in this chapter so that sentences with the main verb give are allowed only if the subject is animate, and sentences with the main verb be may take either an animate or inanimate subject. Show a lexicon containing a few words that can be used to demonstrate the network’s selectivity.



7. (medium) Using the following unification grammar, draw the DAGs for the two NP structures as they are when first constructed by the parser, and then give the DAG for the complete sentence The fish is a large one (which will include all subconstituents of S as well). You may assume the lexicon in Figure 4.6, but define lexical entries for any words not covered there.







