Part-of-speech tagging

A simple but useful form of linguistic analysis

Christopher Manning

Parts of Speech

  • Perhaps starting with Aristotle in the West (384–322 BCE), there was the idea of having parts of speech
    • a.k.a. lexical categories, word classes, “tags”, POS
  • The idea, still with us, that there are 8 parts of speech comes from Dionysius Thrax of Alexandria (c. 100 BCE)
    • But actually his 8 aren’t exactly the ones we are taught today
      • Thrax: noun, verb, article, adverb, preposition, conjunction, participle, pronoun
      • School grammar: noun, verb, adjective, adverb, preposition, conjunction, pronoun, interjection

Open class (lexical) words

  • Nouns
    • Proper: IBM, Italy
    • Common: cat / cats, snow
  • Verbs
    • Main: see, registered
  • Adjectives: old, older, oldest
  • Adverbs: slowly
  • … more

Closed class (functional)

  • Modals: can, had
  • Prepositions: to, with
  • Particles: off, up
  • Determiners: the, some
  • Conjunctions: and, or
  • Pronouns: he, its
  • … more

Numbers: 122,312; one

Interjections: Ow, Eh

Open vs. Closed classes

  • Closed:
    • determiners: a, an, the
    • pronouns: she, he, I
    • prepositions: on, under, over, near, by, …
    • Why closed?
  • Open:
    • Nouns, Verbs, Adjectives, Adverbs.

POS Tagging

  • Words often have more than one POS: back
    • The back door = JJ
    • On my back = NN
    • Win the voters back = RB
    • Promised to back the bill = VB
  • The POS tagging problem is to determine the POS tag for a particular instance of a word.

POS Tagging

  • Input: Plays well with others
  • Ambiguity: NNS/VBZ UH/JJ/NN/RB IN NNS
  • Output: Plays/VBZ well/RB with/IN others/NNS
  • Uses:
    • Text-to-speech (how do we pronounce “lead”?)
    • Can write regexps like (Det) Adj* N+ over the output for phrases, etc.
    • As input to or to speed up a full parser
    • If you know the tag, you can back off to it in other tasks
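
The regexp use case above can be sketched in a few lines of Python. This is only an illustration, assuming the tagger output is a list of (word, tag) pairs with Penn Treebank tags. The function name `noun_phrases` and the mapping of Det, Adj, and N onto the DT, JJ*, and NN* tags are choices made here, not something the slides specify.

```python
import re

def noun_phrases(tagged):
    """Return the word spans whose tags match (Det) Adj* N+."""
    # Write each tag followed by one space, so token index == number of spaces.
    tag_string = "".join(tag + " " for _, tag in tagged)
    # \b keeps a match from starting inside a longer tag such as PDT or WDT.
    # Assumed mapping: Det = DT, Adj = JJ/JJR/JJS, N = NN/NNS/NNP/NNPS.
    pattern = re.compile(r"\b(?:DT )?(?:JJ[RS]? )*(?:NN(?:S|P|PS)? )+")
    phrases = []
    for match in pattern.finditer(tag_string):
        # Convert character offsets back into token indices.
        start = tag_string[:match.start()].count(" ")
        end = tag_string[:match.end()].count(" ")
        phrases.append(" ".join(word for word, _ in tagged[start:end]))
    return phrases

sentence = [("the", "DT"), ("old", "JJ"), ("cat", "NN"),
            ("saw", "VBD"), ("IBM", "NNP"), ("offices", "NNS")]
print(noun_phrases(sentence))
```

Matching over the tag string rather than the words is what makes the pattern short: the tagger has already collapsed the vocabulary into a few dozen symbols.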

Penn Treebank POS tags

POS tagging performance

  • How many tags are correct? (Tag accuracy)
    • About 97% currently
    • But baseline is already 90%
      • Baseline is performance of stupidest possible method
        • Tag every word with its most frequent tag
        • Tag unknown words as nouns
    • Partly easy because
      • Many words are unambiguous
      • You get points for them (the, a, etc.) and for punctuation marks!
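
The baseline described above is simple enough to sketch directly. The code below is a minimal version, assuming training data is a list of sentences, each a list of (word, tag) pairs. The names `train_baseline`, `tag_baseline`, and `tag_accuracy` are made up for this illustration.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each training word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, most_frequent, unknown_tag="NN"):
    """Tag known words with their most frequent tag, unknown words as nouns."""
    return [(word, most_frequent.get(word, unknown_tag)) for word in words]

def tag_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(p == g for (_, p), (_, g) in zip(predicted, gold))
    return correct / len(gold)

# Toy training data, invented for this example.
train = [
    [("the", "DT"), ("back", "NN"), ("door", "NN")],
    [("the", "DT"), ("back", "JJ"), ("door", "NN")],
    [("my", "PRP$"), ("back", "NN")],
]
model = train_baseline(train)
print(tag_baseline(["the", "back", "porch"], model))
```

On the toy data, "back" is always tagged NN because that is its majority tag, and the unseen word "porch" falls back to NN. The second case is exactly where a baseline of this kind loses most of its accuracy.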

Deciding on the correct part of speech can be difficult even for people

  • Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG

  • All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN

  • Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

How difficult is POS tagging?

  • About 11% of the word types in the Brown corpus are ambiguous with regard to part of speech
  • But they tend to be very common words. E.g., that
    • I know that he is honest = IN
    • Yes, that play was nice = DT
    • You can’t go that far = RB
  • 40% of the word tokens are ambiguous
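
The gap between the type figure and the token figure can be reproduced on any tagged corpus. The sketch below assumes the corpus is a flat list of (word, tag) pairs and counts a token as ambiguous when its word type appears with more than one tag anywhere in that corpus. The function name `ambiguity_rates` is hypothetical, and the toy data is just the three "that" examples above.

```python
from collections import defaultdict

def ambiguity_rates(tagged_tokens):
    """Return (share of ambiguous word types, share of ambiguous tokens)."""
    tags_seen = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_seen[word].add(tag)
    ambiguous = {word for word, tags in tags_seen.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_seen)
    token_rate = sum(word in ambiguous for word, _ in tagged_tokens) / len(tagged_tokens)
    return type_rate, token_rate

corpus = [
    ("I", "PRP"), ("know", "VBP"), ("that", "IN"), ("he", "PRP"),
    ("is", "VBZ"), ("honest", "JJ"),
    ("that", "DT"), ("play", "NN"), ("was", "VBD"), ("nice", "JJ"),
    ("that", "RB"), ("far", "RB"),
]
type_rate, token_rate = ambiguity_rates(corpus)
print(type_rate, token_rate)
```

Even in this tiny sample, one ambiguous type out of ten accounts for a quarter of the tokens. That is the same effect, in miniature, as the Brown corpus numbers.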

Part-of-speech tagging revisited

A simple but useful form of linguistic analysis

Sources of information

  • What are the main sources of information for POS tagging?
    • Knowledge of neighboring words
      • Bill  saw    that  man  yesterday
      • NNP   NN     DT    NN   NN
      • VB    VB(D)  IN    VB   NN
    • Knowledge of word probabilities
      • man is rarely used as a verb…
  • The latter proves the most useful, but the former also helps

More and Better Features → Feature-based tagger

  • Can do surprisingly well just looking at a word by itself:
    • Word (the): the → DT
    • Lowercased word (Importantly): importantly → RB
    • Prefixes (unfathomable): un- → JJ
    • Suffixes (Importantly): -ly → RB
    • Capitalization (Meridian): CAP → NNP
    • Word shapes (35-year): d-x → JJ
  • Then build a maxent (or whatever) model to predict tag
    • Maxent P(t|w): 93.7% overall / 82.6% unknown
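
The word-internal features listed above can be extracted with a short function. This is a sketch under assumptions: the feature-string format ("suffix=ly" and so on), the affix length limit, and the names `word_shape` and `word_features` are invented here, and the maxent model that would consume the features is not shown.

```python
import re

def word_shape(word):
    """Collapse a word to its shape: 35-year -> d-x, Meridian -> Xx."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)   # digits last, since "d" is lowercase
    return re.sub(r"(.)\1+", r"\1", shape)  # collapse runs of the same symbol

def word_features(word, max_affix=3):
    """Features that look only at the word itself, not at its context."""
    features = {
        "word=" + word,
        "lower=" + word.lower(),
        "shape=" + word_shape(word),
    }
    if word[:1].isupper():
        features.add("CAP")
    # Prefixes and suffixes of length 1..max_affix, always shorter than the word.
    for n in range(1, min(max_affix, len(word) - 1) + 1):
        features.add("prefix=" + word[:n])
        features.add("suffix=" + word[-n:])
    return features

print(sorted(word_features("Importantly")))
print(word_shape("35-year"))
```

Because each feature is just a string, the same set can be handed to any classifier that accepts sparse binary features. The features also overlap freely (a word's suffix and its shape are not independent), which is why a discriminative model is the natural fit.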

Overview: POS Tagging Accuracies

  • Rough accuracies (all words / unknown words):
    • Most freq tag: ~90% / ~50%
    • Trigram HMM: ~95% / ~55%
    • Maxent P(t|w): 93.7% / 82.6%
    • TnT (HMM++): 96.2% / 86.0%
    • MEMM tagger: 96.9% / 86.9%
    • Bidirectional dependencies: 97.2% / 90.0%
    • Upper bound: ~98% (human agreement)

Most errors on unknown words

How to improve supervised results?

  • Build better features!

    • They left as soon as he arrived .
      PRP VBD IN RB IN PRP VBD .
      • The first "as" is tagged IN but should be RB
      • We could fix this with a feature that looked at the next word

    • Intrinsic flaws remained undetected .
      NNP NNS VBD VBN .
      • "Intrinsic" is tagged NNP but should be JJ
      • We could fix this by linking capitalized words to their lowercase versions
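
Both fixes amount to adding features, which a small sketch can make concrete. The function below assumes a sentence is a list of word strings. The name `context_features`, the feature-string format, and the boundary markers `<s>` and `</s>` are all invented for illustration.

```python
def context_features(words, i):
    """Features for tagging words[i] from the word and its two neighbors."""
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return {
        "w0=" + word,
        # Links a sentence-initial "Intrinsic" to the lowercase "intrinsic".
        "w0_lower=" + word.lower(),
        "w-1=" + prev_word.lower(),
        # Lets the first "as" in "as soon as" see that "soon" follows.
        "w+1=" + next_word.lower(),
    }

sentence = "They left as soon as he arrived .".split()
print(sorted(context_features(sentence, 2)))
```

For the first "as" (index 2) the set now contains "w+1=soon", which separates it from the second "as", whose next word is "he".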

Tagging Without Sequence Information

  • Baseline: predict t0 from w0 alone
  • Three Words: predict t0 from w-1, w0, and w1

Model     Features  Token   Unknown  Sentence
Baseline  56,805    93.69%  82.61%   26.74%
3Words    239,767   96.57%  86.78%   48.27%

Using words only in a straight classifier works as well as a basic (HMM or discriminative) sequence model!!
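
The Token, Unknown, and Sentence figures can be read as three accuracy measures over the same predictions. The sketch below assumes that reading: per-token accuracy, accuracy on words absent from the training vocabulary, and the share of sentences tagged with no errors. The function name `evaluate` and its arguments are hypothetical, and the toy data is invented.

```python
def evaluate(gold_sentences, predicted_sentences, known_words):
    """Compute token, unknown-word, and whole-sentence tagging accuracy."""
    token_total = token_correct = 0
    unknown_total = unknown_correct = 0
    sentence_correct = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        all_correct = True
        for (word, gold_tag), (_, predicted_tag) in zip(gold, predicted):
            hit = gold_tag == predicted_tag
            token_total += 1
            token_correct += hit
            if word not in known_words:  # word never seen in training
                unknown_total += 1
                unknown_correct += hit
            all_correct = all_correct and hit
        sentence_correct += all_correct
    return {
        "token": token_correct / token_total,
        "unknown": unknown_correct / unknown_total if unknown_total else None,
        "sentence": sentence_correct / len(gold_sentences),
    }

gold = [[("the", "DT"), ("cat", "NN")],
        [("Petrus", "NNP"), ("costs", "VBZ"), ("around", "RB"), ("250", "CD")]]
pred = [[("the", "DT"), ("cat", "NN")],
        [("Petrus", "NN"), ("costs", "VBZ"), ("around", "IN"), ("250", "CD")]]
print(evaluate(gold, pred, known_words={"the", "cat", "costs", "around"}))
```

Sentence accuracy is much harsher than token accuracy because a single wrong tag fails the whole sentence. This is consistent with the table showing sentence figures far below the token figures.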

Summary of POS Tagging

  • For tagging, the change from a generative to a discriminative model does not by itself result in great improvement
  • One profits from models that allow specifying dependence on overlapping features of the observation, such as spelling, suffix analysis, etc.
  • An MEMM allows integration of rich features of the observations, but can suffer strongly from assuming independence from following observations; this effect can be relieved by adding dependence on following words
  • This additional power (of the MEMM, CRF, and Perceptron models) has been shown to result in improvements in accuracy
  • The higher accuracy of discriminative models comes at the price of much slower training
