
Dependency Parsing

Introduction

Christopher Manning


Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of lexical items linked by binary asymmetric relations (“arrows”) called dependencies

The arrows are commonly typed with the names of grammatical relations (subject, prepositional object, apposition, etc.)

[Figure: dependency parse of “Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas”, with typed arcs including nsubjpass(submitted, Bills), auxpass(submitted, were), prep(submitted, by), pobj(by, Brownback), nn(Brownback, Senator), appos(Brownback, Republican), prep(Brownback, of) / pobj(of, Kansas), and prep(Bills, on) / pobj(on, ports), with cc/conj linking ports and immigration]


Relation between phrase structure and dependency structure

  • A dependency grammar has a notion of a head. Officially, CFGs don’t.
  • But modern linguistic theory and all modern statistical parsers (Charniak, Collins, Stanford, …) do, via hand-written phrasal “head rules”:
    • The head of a Noun Phrase is a noun/number/adj/…
    • The head of a Verb Phrase is a verb/modal/….
  • The head rules can be used to extract a dependency parse from a CFG parse
  • The closure of dependencies gives constituency from a dependency tree
  • But the dependents of a word must be at the same level (i.e., “flat”) – there can be no VP!


Methods of Dependency Parsing

  1. Dynamic programming (like in the CKY algorithm)

You can do it similarly to lexicalized PCFG parsing: an O(n⁵) algorithm

Eisner (1996) gives a clever algorithm that reduces the complexity to O(n³), by producing parse items with heads at the ends rather than in the middle

  2. Graph algorithms

You create a Maximum Spanning Tree for a sentence

McDonald et al.’s (2005) MSTParser scores dependencies independently using a machine-learning classifier (MIRA, an online large-margin learner, though it could be MaxEnt)

  3. Constraint Satisfaction

Edges are eliminated that don’t satisfy hard constraints. Karlsson (1990), etc.

  4. “Deterministic parsing”

Greedy choice of attachments guided by machine learning classifiers

MaltParser (Nivre et al. 2008) – discussed in the next segment


Dependency Conditioning Preferences

What are the sources of information for dependency parsing?

  1. Bilexical affinities: [issues → the] is plausible
  2. Dependency distance: dependencies mostly hold between nearby words
  3. Intervening material

Dependencies rarely span intervening verbs or punctuation

  4. Valency of heads

How many dependents on which side are usual for a head?

ROOT Discussion of the outstanding issues was completed .


Quiz question!

  • Consider this sentence:

Retail sales drop in April cools afternoon market trading.

  • For each of these words, which word is it a dependent of?
    1. sales
    2. April
    3. afternoon
    4. trading


Greedy Transition-Based Parsing

MaltParser


MaltParser [Nivre et al. 2008]

  • A simple form of greedy discriminative dependency parser
  • The parser does a sequence of bottom-up actions
    • Roughly like “shift” or “reduce” in a shift-reduce parser, but the “reduce” actions are specialized to create dependencies with head on left or right
  • The parser has:
    • a stack σ, written with top to the right
      • which starts with the ROOT symbol
    • a buffer β, written with top to the left
      • which starts with the input sentence
    • a set of dependency arcs A
      • which starts off empty
    • a set of actions


Basic transition-based dependency parser

Start: σ = [ROOT], β = w1, …, wn , A = ∅

  1. Shift	σ, wi|β, A ⇒ σ|wi, β, A
  2. Left-Arcr	σ|wi, wj|β, A ⇒ σ, wj|β, A ∪ {r(wj, wi)}
  3. Right-Arcr	σ|wi, wj|β, A ⇒ σ, wi|β, A ∪ {r(wi, wj)}

Finish: β = ∅

Notes:

  • Unlike the regular presentation of the CFG reduce step, dependencies combine one thing from each of stack and buffer
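As a minimal sketch, the three transitions above can be written as pure functions on a (stack, buffer, arcs) state. The naming and the tiny two-word example are my own illustration, not anyone's actual parser code:

```python
# State: (stack, buffer, arcs); stack top is the right end, buffer front the left.

def shift(stack, buffer, arcs):
    # Shift: move the front of the buffer onto the stack
    return stack + [buffer[0]], buffer[1:], arcs

def left_arc(stack, buffer, arcs, r):
    # Left-Arc_r: wj (buffer front) becomes head of wi (stack top); pop wi
    wi, wj = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(r, wj, wi)}

def right_arc(stack, buffer, arcs, r):
    # Right-Arc_r: wi (stack top) becomes head of wj (buffer front);
    # wj is consumed and wi moves back to the front of the buffer
    wi, wj = stack[-1], buffer[0]
    return stack[:-1], [wi] + buffer[1:], arcs | {(r, wi, wj)}

# Toy derivation for "Economic news":
stack, buffer, arcs = ["ROOT"], ["Economic", "news"], set()
stack, buffer, arcs = shift(stack, buffer, arcs)              # push "Economic"
stack, buffer, arcs = left_arc(stack, buffer, arcs, "amod")   # Economic <- news
stack, buffer, arcs = right_arc(stack, buffer, arcs, "root")  # ROOT -> news
stack, buffer, arcs = shift(stack, buffer, arcs)              # push ROOT back
assert buffer == []  # finish condition: buffer empty
assert arcs == {("amod", "news", "Economic"), ("root", "ROOT", "news")}
```

Note how, unlike a CFG reduce, each arc-building step takes one word from the stack and one from the buffer.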


Actions (“arc-eager” dependency parser)

Start: σ = [ROOT], β = w1, …, wn , A = ∅

  1. Left-Arcr	σ|wi, wj|β, A ⇒ σ, wj|β, A ∪ {r(wj, wi)}
     Precondition: r′(wk, wi) ∉ A, wi ≠ ROOT
  2. Right-Arcr	σ|wi, wj|β, A ⇒ σ|wi|wj, β, A ∪ {r(wi, wj)}
  3. Reduce	σ|wi, β, A ⇒ σ, β, A
     Precondition: r′(wk, wi) ∈ A
  4. Shift	σ, wi|β, A ⇒ σ|wi, β, A

Finish: β = ∅

This is the common “arc-eager” variant: a head can immediately take a right dependent, before its dependents are found


Example

Happy children like to play with their friends .

[ROOT] [Happy, children, …] ∅

Shift [ROOT, Happy] [children, like, …] ∅

LAamod [ROOT] [children, like, …] {amod(children, Happy)} = A1

Shift [ROOT, children] [like, to, …] A1

LAnsubj [ROOT] [like, to, …] A1 ∪ {nsubj(like, children)} = A2

RAroot [ROOT, like] [to, play, …] A2 ∪ {root(ROOT, like)} = A3

Shift [ROOT, like, to] [play, with, …] A3

LAaux [ROOT, like] [play, with, …] A3 ∪ {aux(play, to)} = A4

RAxcomp [ROOT, like, play] [with, their, …] A4 ∪ {xcomp(like, play)} = A5



Example

Happy children like to play with their friends .

RAxcomp [ROOT, like, play] [with, their, …] A4 ∪ {xcomp(like, play)} = A5

RAprep [ROOT, like, play, with] [their, friends, …] A5 ∪ {prep(play, with)} = A6

Shift [ROOT, like, play, with, their] [friends, .] A6

LAposs [ROOT, like, play, with] [friends, .] A6 ∪ {poss(friends, their)} = A7

RApobj [ROOT, like, play, with, friends] [.] A7 ∪ {pobj(with, friends)} = A8

Reduce [ROOT, like, play, with] [.] A8

Reduce [ROOT, like, play] [.] A8

Reduce [ROOT, like] [.] A8

RApunc [ROOT, like, .] [] A8 ∪ {punc(like, .)} = A9

You terminate as soon as the buffer is empty. Dependencies = A9

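The derivation above can be replayed mechanically. Below is a hedged sketch of the arc-eager transitions (the state and action encoding is my own, not MaltParser's code), driven by exactly the action sequence from the example:

```python
def parse(words, actions):
    # stack top is the right end of the list; buffer front is index 0
    stack, buf, arcs = ["ROOT"], list(words), set()
    for act in actions:
        if act == "Shift":
            stack.append(buf.pop(0))
        elif act == "Reduce":
            stack.pop()  # precondition (not checked here): popped word has a head
        else:
            kind, rel = act.split(":")
            if kind == "LA":   # Left-Arc: head = buffer front, dependent popped
                arcs.add((rel, buf[0], stack.pop()))
            else:              # Right-Arc: head = stack top, dependent pushed
                arcs.add((rel, stack[-1], buf[0]))
                stack.append(buf.pop(0))
    return arcs

words = "Happy children like to play with their friends .".split()
actions = ["Shift", "LA:amod", "Shift", "LA:nsubj", "RA:root", "Shift",
           "LA:aux", "RA:xcomp", "RA:prep", "Shift", "LA:poss", "RA:pobj",
           "Reduce", "Reduce", "Reduce", "RA:punc"]
arcs = parse(words, actions)
assert ("root", "ROOT", "like") in arcs and len(arcs) == 9  # arcs = A9
```

Each arc is stored as (relation, head, dependent), matching the r(wi, wj) notation of the slides.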


MaltParser [Nivre et al. 2008]

  • It remains to explain how we choose the next action
  • Each action is predicted by a discriminative classifier (often SVM, could be maxent classifier) over each legal move
    • Max of 4 untyped choices, max of |R| × 2 + 2 when typed
    • Features: top of stack word, POS; first in buffer word, POS; etc.
  • There is NO search (in the simplest and usual form)
    • But you could do some kind of beam search if you wish
  • The model’s accuracy is slightly below the best lexicalized PCFGs (evaluated on dependencies), but
  • It provides close to state-of-the-art parsing performance
  • It provides VERY fast linear-time parsing
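A sketch of the kind of feature extraction such a classifier uses: top-of-stack and front-of-buffer word and POS, plus a conjoined feature. The templates are illustrative only, not MaltParser's actual feature model, and words stand in for token positions in this toy version:

```python
def features(stack, buf, pos):
    # pos: a toy word -> POS-tag dictionary (assumed, for illustration)
    s0 = stack[-1] if stack else "<null>"   # top of stack
    b0 = buf[0] if buf else "<null>"        # front of buffer
    return {
        f"s0.w={s0}", f"s0.t={pos.get(s0, '<null>')}",
        f"b0.w={b0}", f"b0.t={pos.get(b0, '<null>')}",
        f"s0.w+b0.w={s0}+{b0}",             # conjoined word-pair feature
    }

pos = {"children": "NNS", "like": "VBP"}
feats = features(["ROOT", "children"], ["like", "to"], pos)
assert "s0.w=children" in feats and "b0.t=VBP" in feats
```

A discriminative classifier (SVM, MaxEnt, …) scores each legal transition over such binary indicator features and the parser greedily takes the best one.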


Evaluation of Dependency Parsing: (labeled) dependency accuracy

ROOT She saw the video lecture
 0    1   2   3    4      5

Gold (word index, head, word, label):

  1  2  She      nsubj
  2  0  saw      root
  3  5  the      det
  4  5  video    nn
  5  2  lecture  dobj

Parsed:

  1  2  She      nsubj
  2  0  saw      root
  3  4  the      det
  4  5  video    nsubj
  5  2  lecture  ccomp

Acc = (# correct deps) / (# deps)

UAS = 4 / 5 = 80%
LAS = 2 / 5 = 40%
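Both metrics are simple to compute; the sketch below reproduces the slide's numbers from the gold and parsed analyses (the list-of-tuples encoding is my own):

```python
def uas_las(gold, pred):
    # gold/pred: one (head_index, label) pair per word, in sentence order
    n = len(gold)
    correct_head = sum(g[0] == p[0] for g, p in zip(gold, pred))  # UAS: head only
    correct_both = sum(g == p for g, p in zip(gold, pred))        # LAS: head + label
    return correct_head / n, correct_both / n

# "She saw the video lecture", analyses from the slide
gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]
uas, las = uas_las(gold, pred)
assert uas == 0.8 and las == 0.4   # UAS = 4/5, LAS = 2/5
```

Note that LAS ≤ UAS by construction, since a labeled match requires the head to be correct as well.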


Representative performance numbers

  • The CoNLL-X (2006) shared task provides evaluation numbers for various dependency parsing approaches over 13 languages
    • MALT: LAS scores from 65–92%, depending greatly on language/treebank
  • Here we give a few UAS numbers for English to allow some comparison to constituency parsing

Parser                                                       UAS%
Sagae and Lavie (2006) – ensemble of dependency parsers      92.7
Charniak (2000) – generative, constituency                   92.2
Collins (1999) – generative, constituency                    91.7
McDonald and Pereira (2005) – MST graph-based dependency     91.5
Yamada and Matsumoto (2003) – transition-based dependency    90.4


Projectivity

  • Dependencies extracted from a CFG tree using head rules must be projective
    • There must not be any crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words.
  • But dependency theory normally does allow non-projective structures to account for displaced constituents
    • You can’t easily get the semantics of certain constructions right without these nonprojective dependencies

Who did Bill buy the coffee from yesterday ?
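Projectivity can be checked directly: an analysis is projective iff no two arcs cross when drawn above the words. A small sketch (the head-list encoding and the toy non-projective example are my own):

```python
def is_projective(heads):
    # heads[i-1] = head index of word i (0 = ROOT); words are numbered from 1
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # two arcs cross iff one endpoint of the second lies strictly
            # inside the first span and the other strictly outside it
            if l1 < l2 < r1 < r2:
                return False
    return True

# Projective: "She saw the video lecture" (gold heads from the evaluation slide)
assert is_projective([2, 0, 5, 5, 2])
# Non-projective toy tree: the arc (1, 3) crosses the arc (2, 4)
assert not is_projective([3, 4, 0, 3])
```

The non-projective Wh-extraction in the example sentence (the arc from "from" back to "Who") would fail this check, which is exactly why the arc-eager parser cannot produce it.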


Handling non-projectivity

  • The arc-eager algorithm we presented only builds projective dependency trees
  • Possible directions to head:
    1. Just declare defeat on nonprojective arcs
    2. Use a dependency formalism which only admits projective representations (a CFG doesn’t represent such structures…)
    3. Use a postprocessor to a projective dependency parsing algorithm to identify and resolve nonprojective links
    4. Add extra types of transitions that can model at least most non-projective structures
    5. Move to a parsing mechanism that does not use or require any constraints on projectivity (e.g., the graph-based MSTParser)


Dependencies encode relational structure

Relation Extraction with Stanford Dependencies


Dependency paths identify relations like protein interaction

[Erkan et al. EMNLP 07, Fundel et al. 2007]

KaiC ←nsubj interacts prep_with→ SasA

KaiC ←nsubj interacts prep_with→ SasA conj_and→ KaiA

KaiC ←nsubj interacts prep_with→ SasA conj_and→ KaiB
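Paths like these can be extracted with a breadth-first search over the dependency graph, treating arcs as undirected edges labeled with the relation. The sketch below hand-encodes part of the collapsed parse of the KaiC sentence; the path notation and function names are my own illustration:

```python
from collections import deque

def dep_path(edges, src, dst):
    # edges: (head, relation, dependent) triples; build an undirected,
    # direction-annotated adjacency list
    adj = {}
    for head, rel, dep in edges:
        adj.setdefault(head, []).append((f"{rel}>", dep))   # head -> dependent
        adj.setdefault(dep, []).append((f"<{rel}", head))   # dependent -> head
    q, seen = deque([(src, [src])]), {src}
    while q:
        node, path = q.popleft()
        if node == dst:
            return path
        for lab, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, path + [lab, nxt]))
    return None

edges = [("interacts", "nsubj", "KaiC"),
         ("interacts", "prep_with", "SasA"),
         ("SasA", "conj_and", "KaiA"),
         ("SasA", "conj_and", "KaiB")]
path = dep_path(edges, "KaiC", "KaiA")
assert path == ["KaiC", "<nsubj", "interacts", "prep_with>",
                "SasA", "conj_and>", "KaiA"]
```

The shortest path between two protein mentions then serves as the relation-extraction feature, as in the cited work.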

[Figure: dependency parse of “The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB”, with arcs det, nsubj, compl, ccomp, advmod, prep_with, and conj_and]


Stanford Dependencies

[de Marneffe et al. LREC 2006]

  • The basic dependency representation is projective
  • It can be generated by postprocessing headed phrase structure parses (Penn Treebank syntax)
  • It can also be generated directly by dependency parsers, such as MaltParser, or the Easy-First Parser

[Figure: dependency parse of “The little boy jumped over the fence”: nsubj(jumped, boy), det(boy, the), amod(boy, little), prep(jumped, over), pobj(over, fence), det(fence, the)]


Graph modification to facilitate semantic analysis

Bell, based in LA, makes and distributes

electronic and computer products.

[Figure: basic dependency parse of the sentence: nsubj(makes, Bell), partmod(Bell, based), prep(based, in), pobj(in, LA), cc(makes, and), conj(makes, distributes), dobj(makes, products), amod(products, electronic), cc(electronic, and), conj(electronic, computer)]


Graph modification to facilitate semantic analysis

Bell, based in LA, makes and distributes

electronic and computer products.

[Figure: collapsed dependency graph of the same sentence: prep_in(based, LA) replaces prep + pobj; conj_and(makes, distributes) replaces cc + conj, with nsubj(distributes, Bell) propagated; conj_and(electronic, computer) likewise, with amod(products, computer) propagated]


BioNLP 2009/2011 relation extraction shared tasks [Björne et al. 2009]
