1 of 239

Computational Analysis of Political Texts

Bridging Research Efforts Across Communities

Goran Glavaš

Federico Nanni

Simone Paolo Ponzetto

Data and Web Science Group

University of Mannheim

2 of 239

Hi there!

Goran

Federico

Simone

https://poltexttutorial.wordpress.com/

3 of 239

Political Text

4 of 239

Political Text

  • Language as a medium for politics and political conflicts

From: https://twitter.com/realdonaldtrump/status/821772494864580614

5 of 239

Political Text

  • Language as a medium for politics and political conflicts
  • Much of politics is expressed in words

From: https://www.theguardian.com/politics/2018/jul/09/boris-johnson-his-path-to-resigning-as-foreign-secretary

6 of 239

Political Text

  • Language as a medium for politics and political conflicts
  • Much of politics is expressed in words
  • However, it is still hard to use texts for making inferences about politics.

From: https://www.flickr.com/photos/statephotos/48079686137/

7 of 239

Political Text

  • Too many political texts

From: https://www.bournemouthecho.co.uk/news/national/17504503.students-strike-over-politicians-inaction-on-climate-change/

8 of 239

Political Text

  • Too many political texts
  • Hiring and training annotators is very expensive
    • Complex phenomena
    • Domain-specific language
    • Subjectivity involved

From: https://www.bournemouthecho.co.uk/news/national/17504503.students-strike-over-politicians-inaction-on-climate-change/

9 of 239

Political Text

  • Too many political texts
  • Hiring and training annotators is very expensive
  • Automated analysis seems to be the only way to go

From: https://www.bournemouthecho.co.uk/news/national/17504503.students-strike-over-politicians-inaction-on-climate-change/

10 of 239

Political Text

Modelling complex phenomena:

  • Polarization
  • Tribalism
  • Party Loyalty / Partisanship
  • Euroscepticism
  • Populism
  • Hybrid Warfare

11 of 239

A Tale of Two Communities

Political Science

2003 Wordscores [Laver et al.]

2008 Wordfish [Slapin & Proksch]

2013 Text as Data [Grimmer & Stewart]

2016 The Manifesto Corpus [Merz et al.]

Programming language: R

Libraries: Quanteda, STM, austin, tm, koRpus, kerasR, coreNLP

Natural Language Processing

2005 EuroParl Corpus [Koehn]

2006 Get out the Vote [Thomas et al.]

2010 From Tweets to Polls [O'Connor et al.]

2014 Text scaling in ACL Anthology [Zirn]

Programming language: Python, Java

Libraries: CoreNLP, spaCy, NLTK, TensorFlow, Keras, scikit-learn

12 of 239

A Tale of Two Communities

Current issues in interdisciplinary collaborations [Wallach, 2016]

  1. Lack of understanding of each other’s norms, incentive structures, and goals
  2. The need to publish in high-quality venues in a timely fashion
  3. Publishing interdisciplinary research can be slower than single discipline research
  4. These challenges are not always recognized by tenure and promotion committees

13 of 239

A Tale of Two Communities

Current issues in interdisciplinary collaborations [Wallach, 2016]

  • Lack of understanding of each other’s norms, incentive structures, and goals
  • The need to publish in high-quality venues in a timely fashion
  • Publishing interdisciplinary research can be slower than single discipline research
  • These challenges are not always recognized by tenure and promotion committees

Overall goal: training new generations of computational social scientists

14 of 239

Towards a Collaborative Future

From: https://textasdata.github.io//

15 of 239

Towards a Collaborative Future

  • New Directions in Analyzing Text as Data
    • (Mostly) US-based interdisciplinary conference
    • Since 2012; next edition at Stanford (October 2019)
    • https://www.textasdata2019.net/

From: https://textasdata.github.io//

16 of 239

Towards a Collaborative Future

  • New Directions in Analyzing Text as Data
  • NLP+CSS
    • Workshop co-located with ACL events
    • Since 2015, last edition @NAACL 2019
    • https://sites.google.com/site/nlpandcss/

From: https://textasdata.github.io//

17 of 239

Towards a Collaborative Future

  • New Directions in Analyzing Text as Data
  • NLP+CSS
  • PolText
    • Interdisciplinary symposium
    • First edition 2016, next edition Tokyo (Sept ‘19)
    • https://www.poltextconference.org/

From: https://textasdata.github.io//

18 of 239

Towards a Collaborative Future

  • New Directions in Analyzing Text as Data
  • NLP+CSS
  • PolText
  • ParlaCLARIN
    • Workshop on Curating Parliamentary Proc.
    • @LREC 2018
    • https://www.clarin.eu/blog/clarin-parlaformat-workshop

From: https://textasdata.github.io//

19 of 239

Towards a Collaborative Future

The goals of this tutorial:

  1. Systematize and analyze the body of research work from both communities
  2. Provide a gentle, all-round introduction to research questions, methods and tasks
  3. Find a common language across research fields
  4. Expand the interdisciplinary community outside its natural environment
  5. Prepare PIs and PhD students for the exciting challenges ahead

20 of 239

Agenda

Texts

RQs

Tasks

Topic Detection

Positioning

Scaling

Multilinguality / CL Transfer

21 of 239

Agenda

Texts

RQs

Tasks

Topic Detection

Positioning

Scaling

Multilinguality / CL Transfer

22 of 239

Text as Data

23 of 239

Text as Data - Parliament Debates

General view of the European Parliament during a plenary session in Strasbourg, eastern France, Wednesday, March 13, 2019. (AP Photo)

24 of 239

Text as Data - EuroParl Corpus

EuroParl Corpus (first release 2001, most recent: 2012)

21 European languages (1996–2011 or 2007–2011, depending on the country)

Corpus of tokenized sentences aligned across languages.

Used for:

  • Machine translation
  • Word sense disambiguation
  • Cross-lingual learning

Available at: https://www.statmt.org/europarl/

25 of 239

Text as Data - EuroParl Corpus

Learning phrase representations using RNN encoder-decoder for statistical machine translation [Cho et al., 2014]

KenLM: Faster and Smaller Language Model Queries [Heafield, 2011]

Normalized (pointwise) mutual information in collocation extraction [Bouma, 2009]

PPDB: The Paraphrase Database [Ganitkevitch, 2013]

Learning bilingual lexicons from monolingual corpora [Haghighi et al., 2008]

26 of 239

Text as Data - Linked EP

Plenary debates of the EP as Linked Open Data [Van Aggelen et al. 2016]

All plenary debates between 1999 and 2017 with links to GeoNames and DBpedia.

Available at: https://linkedpolitics.project.cwi.nl/web/html/home.html

Access to the data:

  1. Through HTTP-resolvable URIs
  2. Through full-text search
  3. Through a SPARQL endpoint
  4. Using the browse and search options of ClioPatria
  5. By downloading the data in Turtle format (2.5 GB, gzipped tar file)
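
A minimal sketch of option 3, using the SPARQLWrapper Python library. The endpoint URL below is an assumption (check the project page for the actual address), and the generic triple pattern deliberately avoids committing to a specific schema:

```python
# Hedged sketch: query the LinkedPolitics SPARQL endpoint for a few triples.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://linkedpolitics.project.cwi.nl/sparql"  # assumed endpoint URL

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```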

27 of 239

Text as Data - Linked EP

Plenary debates of the EP as Linked Open Data [Van Aggelen et al. 2016]

All plenary debates between 1999 and 2017 with links to GeoNames and DBpedia.

Available at: https://linkedpolitics.project.cwi.nl/web/html/home.html

Issues:

  • Hard to access for a non-expert
  • Pre-cleaning/filtering of the speeches is not fully transparent

28 of 239

Text as Data - Scraping the EP website

http://www.europarl.europa.eu

Pros:

  • Control over the selection process

  • Control over the metadata

Cons:

  • Not straightforward
  • You need to know "How to Crawl the Web Politely"
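
A minimal sketch of what polite crawling amounts to in practice: check robots.txt, identify yourself, and rate-limit requests. The page path below is hypothetical; adapt it to the actual structure of the EP website:

```python
# Polite crawling sketch: robots.txt check, self-identification, rate limiting.
import time
import urllib.robotparser

import requests

BASE = "http://www.europarl.europa.eu"
USER_AGENT = "poltext-tutorial-bot/0.1 (research crawler; contact@example.org)"

robots = urllib.robotparser.RobotFileParser()
robots.set_url(BASE + "/robots.txt")
robots.read()

def fetch(url, delay=2.0):
    """Fetch a page only if robots.txt allows it, then pause before returning."""
    if not robots.can_fetch(USER_AGENT, url):
        return None  # disallowed for our user agent
    response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
    time.sleep(delay)  # at most one request every `delay` seconds
    return response.text

html = fetch(BASE + "/plenary/en/debates-video.html")  # hypothetical page path
```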

29 of 239

Text as Data - Irish Dáil

Wordscores [Laver et al., 2003] initially tested on a confidence debate in October 1991.

Wordshoal [Lauderdale & Herzog, 2016] tested as an example of a multiparty system.

Database of Parliamentary Speeches in Ireland, 1919–2013 [Herzog & Mikhaylov, 2017]

30 of 239

Text as Data - Irish Dáil

Wordscores [Laver et al., 2003] initially tested on a confidence debate in October 1991.

Wordshoal [Lauderdale & Herzog, 2016] tested as an example of a multiparty system.

Database of Parliamentary Speeches in Ireland, 1919–2013 [Herzog & Mikhaylov, 2017]

Popular because:

  • European party system
  • “Real” multi-party system
  • Content in English
  • 1991 debate directly available in Quanteda

31 of 239

Text as Data - UK Hansard

Hansard Corpus (1803 - 2005)

Hansard Online (1803 - 2019)

DiLiPad (1803 - 2014)

TheyWorkForYou (1918 - 2019)

From: https://www.youtube.com/watch?v=H4v7wddN-Wg

32 of 239

Text as Data - UK Hansard

Hansard Corpus (1803 - 2005)

Hansard Online (1803 - 2019)

DiLiPad (1803 - 2014)

TheyWorkForYou (1918 - 2019)

Popular because:

  • Over two centuries of data
  • Curated by different interdisciplinary projects (NLP, CL, CSS, DH)
  • Content in English

From: https://www.youtube.com/watch?v=H4v7wddN-Wg

33 of 239

Text as Data - UK Hansard

Hansard Corpus (1803 - 2005)

Hansard Online (1803 - 2019)

DiLiPad (1803 - 2014)

TheyWorkForYou (1918 - 2019)

Popular because:

  • Over two centuries of data
  • Curated by different interdisciplinary projects (NLP, CL, CSS, DH)
  • Content in English

From: https://www.youtube.com/watch?v=H4v7wddN-Wg

34 of 239

Text as Data - US Congress

ConVote Dataset (all House debates, 2005) [Thomas et al., 2006]

Popular for:

  • Stance detection
  • Vote prediction
  • Opinion mining

35 of 239

Text as Data - US Congress

ConVote Dataset (all House debates, 2005) [Thomas et al., 2006]

Popular for:

  • Stance detection
  • Vote prediction
  • Opinion mining

Congressional Record for the 43rd–114th Congresses (1873–2017) [Gentzkow et al., 2018], derived from HeinOnline scans.

36 of 239

Text as Data - United Nations

The UN General Debate corpus [Baturo et al., 2016]

Over 7300 country statements from 1970–2014

All in English (official translations by the UN)

37 of 239

Text as Data - Other Parliaments

CLARIN Parliamentary Corpora

  • ParlAT (Austria, 1996 - 2017)
  • Danish Parliamentary Corpus (2009-2017)
  • Italian Camera as RDF

38 of 239

Text as Data - Other Parliaments

CLARIN Parliamentary Corpora

  • ParlAT (Austria, 1996 - 2017)
  • Danish Parliamentary Corpus (2009-2017)
  • Italian Camera as RDF

Issues:

  • Many datasets (often more than one per country)
  • Resources are not aligned in time with each other
  • Often not maintained anymore

39 of 239

Text as Data - Other Parliaments

CLARIN Parliamentary Corpora

  • ParlAT (Austria, 1996 - 2017)
  • Danish Parliamentary Corpus (2009-2017)
  • Italian Camera as RDF

ParlSpeech [Rauh et al., 2017] (3.9 million plenary speeches from the Czech Republic, Finland, Germany, the Netherlands, Spain, Sweden, and the United Kingdom)

Sentiment and position-taking analysis of parliamentary debates: A systematic literature review [Abercrombie & Batista-Navarro, 2019]

40 of 239

Text as Data - Manifestos

41 of 239

Text as Data - Manifestos

Since 1979, the Manifesto Project has collected and coded the electoral programs of all relevant political parties at democratic elections, from 1945 (or a country's first democratic election) onwards, in over 50 countries.

Country experts (usually political scientists who are native speakers of the language) are hired to code the electoral programs. Coders first split the electoral programs into so-called "quasi-sentences", each of which "contains exactly one statement or message".

Coders then allocate to every quasi-sentence a code corresponding to one of 56 categories, capturing the most relevant policy issues and goals.

In order to do this, coders are taken through a training process.

42 of 239

Text as Data - Manifestos

In the past, the coding of these documents was performed on printed copies of the electoral programs, with annotations made in the page margins.

The first serious effort towards digitization was made by Paul Pennings and Hans Keman of the Comparative Electronic Manifestos Project (2006), who digitized 1,144 electoral programs included in the Manifesto Corpus (2015, v.1).

The corpus [Merz et al., 2016] currently covers electoral programmes from more than 50 countries in more than 35 languages. It contains more than 2,300 machine-readable programmes. For more than 1,150 of these, unitising and codings are available as well, amounting to more than 1,000,000 coded quasi-sentences.

https://manifesto-project.wzb.eu/

43 of 239

Text as Data - Manifestos

A gold standard or a “no-alternative” scenario? [Budge & Pennings, 2007; Benoit et al., 2012; Mikhaylov et al., 2012; Gemenis, 2013]

44 of 239

Text as Data - Manifestos

A gold standard or a “no-alternative” scenario? [Budge & Pennings, 2007; Benoit et al., 2012; Mikhaylov et al., 2012; Gemenis, 2013]

1) Theoretical framework of the coding scheme

Salience theory of party competition: policy differences between parties are assumed to consist of contrasting emphasis placed on different policy areas

  • Relevant for the US and the UK
  • Does not hold in many multi-party competitions -> niche parties
  • The core is not emphasis but position
  • The coding scheme actually captures positions as well (pro/con)

45 of 239

Text as Data - Manifestos

A gold standard or a “no-alternative” scenario? [Budge & Pennings, 2007; Benoit et al., 2012; Mikhaylov et al., 2012; Gemenis, 2013]

1) Theoretical framework of the coding scheme

2) Document selection

  • Not only manifestos (drafts, speeches, reports, news, flyers, interviews)
  • Many of these are not equivalent to manifestos
  • The assumption that length equals authority no longer holds!

46 of 239

Text as Data - Manifestos

A gold standard or a “no-alternative” scenario? [Budge & Pennings, 2007; Benoit et al., 2012; Mikhaylov et al., 2012; Gemenis, 2013]

1) Theoretical framework of the coding scheme

2) Document selection

3) Coding reliability

  • Central standard for coders (internal training and testing)
  • One annotator per manifesto
  • What about crowdsourcing? [Benoit et al., 2016]
  • The issue lies not with the coders but with the coding scheme itself

47 of 239

Text as Data - Manifestos

A gold standard or a “no-alternative” scenario? [Budge & Pennings, 2007; Benoit et al., 2012; Mikhaylov et al., 2012; Gemenis, 2013]

1) Theoretical framework of the coding scheme

2) Document selection

3) Coding reliability

4) Scaling

  • The meaning of Left–Right actually differs across countries
  • Emphasis as a proxy for determining position is not really consistent
  • In general, L–R scores should not be treated as gold information

48 of 239

Text as Data - Agendas

49 of 239

Text as Data - Agendas

The Comparative Agendas Project (CAP) assembles and codes information on the policy processes of governments from around the world, focusing on the policies adopted, proposed, or discussed.

Initially developed in the US in the early 1990s, it aggregates many different projects analysing different types of documents (news, laws, etc.) with a single, universal, and consistent coding scheme. CAP monitors policy processes by tracking the actions that governments take in response to the challenges they face.

https://www.comparativeagendas.net/

50 of 239

Text as Data - Agendas

The CAP Codebook:

  • 21 Major Topics
  • 220 Subtopics

51 of 239

Text as Data - Campaign Speeches / Debates

52 of 239

Text as Data - Campaign Speeches / Debates

The American Presidency Project has transcripts of:

  • Convention speeches
  • Debates
  • Party Platforms
  • Campaign documents

https://www.presidency.ucsb.edu

53 of 239

Text as Data - Campaign Speeches / Debates

Other resources:

1) From papers:

  • 2012 Republican primary debates [Prabhakaran et al., 2013]
  • Dutch and Danish party congress speeches [Schumacher et al., 2019]
  • 2015 UK Election debates (audio and transcripts) [Lippi & Torroni, 2016]

2) Transcripts in news media (newspapers, fact-checking websites):

  • Full Transcript: Democratic Presidential Debates, Night 1 (NYT)
  • Fact-checking the Democratic debate in Miami, night 1 (PolitiFact)

54 of 239

Text as Data - Press Releases

55 of 239

Text as Data - Press Releases

AUTNES Content Analysis of Party Press Releases (OTS) 2013 [Müller et al., 2017]

56 of 239

Text as Data - Leaders

57 of 239

Text as Data - Leaders

The American Presidency Project has transcripts of:

  • Presidential orders
  • Memoranda
  • Proclamations
  • Interviews
  • Letters

https://www.presidency.ucsb.edu

58 of 239

Text as Data - Leaders

The American Presidency Project has transcripts of:

  • Presidential orders
  • Memoranda
  • Proclamations
  • Interviews
  • Letters

EUSpeech: a New Dataset of EU Elite Speeches [Schumacher et al., 2016]

  • Over 18k speeches from EU leaders
  • Time range: 2007 to 2015

59 of 239

Text as Data - Leaders

The Global Populism Database is the most up-to-date, comprehensive and reliable repository of populist discourse in the world. It was commissioned by the Guardian and built by Team Populism, a global network of scholars dedicated to the scientific study of the causes and consequences of populism.

Issues:

  • The data is not directly available
  • The selection is not clear

60 of 239

Text as Data - Legislative Corpora

61 of 239

Text as Data - Legislative Corpora

Sunlight Foundation US Congress API

  • Look up members of Congress by location or by zip code
  • Official Twitter, YouTube, and Facebook accounts
  • The daily work of Congress: bills, amendments, nominations
  • The live activity of Congress: past and future votes, floor activity, hearings

EurLex

  • Freely accessible repository of European Union law texts (multilingual)
  • Treaties, international agreements, legislation in force, legislation in preparation, case-law and parliamentary questions
  • HTMLs and PDFs

62 of 239

Text as Data - Social Media

63 of 239

Text as Data - Social Media

Politician’s opinions:

  • List of all MEPs: https://twitter.com/europarl_en/lists/all-meps-on-twitter
  • All members of US Congress: https://twitter.com/cspan/lists/members-of-congress
  • UK MPs: https://twitter.com/twittergov/lists/uk-mps

Voters’ opinions:

  • Harvard Dataverse (e.g., 2018 U.S. Congressional Election Tweet IDs)
  • Internet Archive Twitter Stream
  • Reddit Corpus

64 of 239

Agenda

Text

RQs

Tasks

Topic Detection

Positioning

Scaling

Multilinguality / CL Transfer

65 of 239

Text as Data

66 of 239

Text as Data - Examples

Exploring the political agenda of the European Parliament using a dynamic topic modeling approach [Greene & Cross, 2017]

  • Studies how the political agenda of the EP evolved over time and reacted to stimuli in the period 1999–2014
  • Shows that a dynamic topic modeling approach based on Non-negative Matrix Factorization is better suited than LDA
  • Captures shifts of EP attention to external events (e.g., the Euro Crisis)
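
A minimal sketch of the NMF building block behind such approaches (Greene & Cross additionally chain window-level NMF models over time); the toy corpus stands in for one time window of plenary speeches:

```python
# Sketch: factorize a TF-IDF matrix into document-topic (W) and topic-term (H)
# factors for one time window; toy documents stand in for real speeches.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

speeches = [
    "euro crisis bailout banks greece austerity",
    "banks bailout financial crisis euro",
    "climate emissions energy renewable targets",
    "energy climate carbon emissions policy",
    "migration borders asylum refugees policy",
    "asylum refugees migration border control",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(speeches)

nmf = NMF(n_components=3, random_state=0)
W = nmf.fit_transform(X)   # document-topic weights
H = nmf.components_        # topic-term weights

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(H):
    top = topic.argsort()[::-1][:5]
    print(f"topic {k}:", ", ".join(terms[i] for i in top))
```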

67 of 239

Text as Data - Examples

Exploring the political agenda of the European Parliament using a dynamic topic modeling approach [Greene & Cross, 2017]

A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases [Grimmer, 2010]

  • Measures how US senators explain their work in Washington to constituents, using a collection of over 24,000 press releases issued by senators in 2007
  • The Expressed Agenda Model measures priorities of each author
  • Ideal for comparing priorities

68 of 239

Text as Data - Examples

Exploring the political agenda of the European Parliament using a dynamic topic modeling approach [Greene & Cross, 2017]

A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases [Grimmer, 2010]

How to analyze political attention with minimal assumptions and costs [Quinn et al., 2010]

  • Topic model to examine the agenda in the U.S. Senate from 1997 to 2004
  • New database of over 118,000 speeches from the Congressional Record
  • The model reveals speech topic categories that are both distinctive and meaningfully interrelated, offering a richer view of democratic agenda dynamics

69 of 239

Text as Data - Examples

Measuring group differences in high-dimensional choices: Method and application to Congressional speech [Gentzkow et al., 2016]

  • Measure trends in the partisanship of congressional speech from 1873 to 2016
  • Partisanship as the ease with which an observer could infer a party from a single utterance
  • Partisanship increased sharply in the early 1990s

70 of 239

Text as Data - Examples

Measuring group differences in high-dimensional choices: Method and application to Congressional speech [Gentzkow et al., 2016]

Position taking in European Parliament speeches [Proksch & Slapin, 2010]

  • Examines how national parties position themselves in EP debates
  • Positioning reflects partisan divisions over EU integration and national divisions, rather than left–right politics
  • Results are robust across languages used to scale the speeches

71 of 239

Text as Data - Examples

Measuring group differences in high-dimensional choices: Method and application to Congressional speech [Gentzkow et al., 2016]

Position taking in European Parliament speeches [Proksch & Slapin, 2010]

Testing the Etch-a-Sketch Hypothesis: Measuring Ideological Signaling via Candidates’ Use of Key Phrases [Gross et al., 2013]

  • Presidential candidates should shift toward the general electorate’s median voter after securing their parties’ nominations.
  • Test the theory using candidates’ campaign speeches as data
  • Develop a model to identify ideological cues in political text

72 of 239

Text as Data

[Grimmer & Stewart, 2013]

73 of 239

Text as Data

[Grimmer & Stewart, 2013]

74 of 239

Text as Data

[Grimmer & Stewart, 2013]

RQ1:

What is it about?

(the topic, the issue)

RQ2:

How does it compare with others?

(the position)

75 of 239

Agenda

Text

RQs

Tasks

Topic Detection

Positioning

Scaling

Multilinguality / CL Transfer

76 of 239

Topic Detection

[Quinn et al., 2010]

77 of 239

Topic Detection

[Quinn et al., 2010]

78 of 239

Different Objectives

Finding the needle in the haystack or characterizing the haystack? [Hopkins & King, 2010]

  • When social scientists use formal content analysis, it is typically to make generalizations using document category proportions
  • They conduct content analyses to learn about the distribution of classifications in a population, not to assert the classification of any particular document (which would be easy to do through a close reading)
  • Individual document classifications do not usually constitute the ultimate quantities of interest

79 of 239

Selection of Speakers? [Proksch & Slapin, 2012; Schwarz et al., 2017]

In political systems that foster an individual relationship between MPs and their voters, party leaders are more likely to accept speeches that deviate from the party line.

In contexts where these relations are mediated by the party, and party unity matters, the party leadership is likely to prohibit expression of dissent on the parliamentary floor.

Takeaway: speakers do not always represent the position of the entire party.

80 of 239

Topic Detection

Supervised

Unsupervised

81 of 239

Topic Detection

Supervised

Unsupervised

82 of 239

Supervised Approaches

Dictionaries

  • Intuitive, easy to apply, generate, monitor and extend
  • Often paired with relevance scores (harder to obtain)
  • Difficult to apply out of domain (especially for sentiment analysis)
  • Example: budget rhetoric in presidential campaigns from 1952 to 2000 [Burden & Sanberg, 2003]
  • Research direction: expanding dictionaries with word embeddings [Tsai & Wang, 2014; Theil et al., 2018; Sternberg, 2018]
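
A minimal sketch of that expansion idea, assuming gensim and a pre-trained model available via gensim's downloader; the seed terms are illustrative:

```python
# Sketch: extend a seed dictionary with nearest neighbours in embedding space.
import gensim.downloader

vectors = gensim.downloader.load("glove-wiki-gigaword-50")  # pre-trained GloVe

seed_dictionary = ["budget", "deficit", "spending"]  # e.g., a fiscal-policy lexicon

expanded = set(seed_dictionary)
for term in seed_dictionary:
    if term in vectors:
        for neighbour, _similarity in vectors.most_similar(term, topn=5):
            expanded.add(neighbour)

print(sorted(expanded))
```

In practice the candidate neighbours would still be vetted by a human, since embedding neighbourhoods mix synonyms with antonyms and merely related terms.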

83 of 239

Supervised Approaches

Dictionaries

Support Vector Machines in Political Science

  • Used for collection filtering [D’Orazio et al., 2014]
  • Classifying Congressional Bills (226 possible topics) [Hillard et al., 2008; Karan et al., 2016]
  • Hard to apply out-of-domain [Burscher et al., 2015; Nanni et al., 2016]
    • Political news from a different newspaper or a different point in time
    • Training on manifestos for coding political campaign speeches
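
A minimal scikit-learn sketch of this standard setup: a linear SVM over TF-IDF bag-of-words features. The four inline bill snippets and two topic labels are toy stand-ins for an annotated corpus:

```python
# Sketch: linear SVM for topic classification of (toy) bill texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "a bill to amend the internal revenue code",
    "a bill to reduce income tax rates",
    "a bill to protect endangered species habitats",
    "a bill to regulate emissions from power plants",
]
labels = ["taxation", "taxation", "environment", "environment"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

print(model.predict(["a bill to cut the income tax"]))  # expected: ['taxation']
```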

84 of 239

Supervised Approaches

Dictionaries

Support Vector Machines in Political Science

Disadvantages

  • Lack of annotated resources
  • Hard to produce “gold” standard (low coder reliability)
  • Hard to generalize across contexts

85 of 239

Classification of Manifesto Quasi-Sentences

Classifying topics and detecting topic shifts in political manifestos [Zirn et al., 2016]

86 of 239

Classification of Manifesto Quasi-Sentences

Proportional Classification Revisited: Automatic Content Analysis of Political Manifestos Using Active Learning [Wiedemann, 2019]

  • Focus on proportional classification
  • Comparison between a method based on regression analysis with feature profiles from entire collections and a method aggregating classifier decisions for individual documents
  • Improvement on both using active learning

87 of 239

Classification of Manifesto Quasi-Sentences

Hierarchical Structured Model for Fine-to-coarse Manifesto Text Analysis [Subramanian et al., 2018]

  • Captures the dependency between the sentence- and document-level tasks, and also utilizes additional label structure
  • Incorporates contextual information (e.g., political coalitions) and encodes temporal dependencies for the coarse-level manifesto position using probabilistic soft logic

88 of 239

Topic Detection

[Quinn et al., 2010]

89 of 239

Topic Detection

[Quinn et al., 2010]

90 of 239

Topic Detection

Supervised

Unsupervised

91 of 239

Topic Detection

Supervised

Unsupervised

92 of 239

Unsupervised Approaches

93 of 239

Unsupervised Approaches

Available implementations [Benoit et al., 2018]

  • LDA in Quanteda
  • expAgenda Model
  • Structural topic model

94 of 239

Unsupervised Approaches

Available implementations [Benoit et al., 2018]

Advantages of LDA Topic Models [Quinn et al., 2010; Grimmer & Stewart, 2013]

  • Topics may be difficult to know beforehand
  • Very little investment in pre-analysis stage
  • No human coding of training data
  • Useful for initial exploration

95 of 239

Unsupervised Approaches

Available implementations [Benoit et al., 2018]

Advantages of LDA Topic Models [Quinn et al., 2010; Grimmer & Stewart, 2013]

Issues with Interpretation [Lauscher et al., 2016]

  • Topics are lists of co-occurring words
  • No label describing them
  • Results are very different, depending on the number of topics

96 of 239

Unsupervised Approaches

Available implementations [Benoit et al., 2018]

Advantages of LDA Topic Models [Quinn et al., 2010; Grimmer & Stewart, 2013]

Issues with Interpretation [Lauscher et al., 2016]

Issues with Evaluation [Chang et al., 2009; Wallach et al., 2009; Newman et al., 2010]

  • Post-hoc evaluation is necessary => results differ every time LDA runs
  • Intrinsic evaluations (e.g., perplexity) have low correlation with human judgments
  • Word-intrusion tasks and topic coherence are very time-consuming to assess
  • Simply looking at topic outputs is not an evaluation!
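
A minimal sketch of computing an automated coherence score with gensim's CoherenceModel (scores on such a tiny toy corpus are meaningless; this only shows the mechanics):

```python
# Sketch: train a toy LDA model and compute c_v topic coherence with gensim.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

texts = [
    ["tax", "budget", "deficit", "spending"],
    ["budget", "tax", "revenue", "spending"],
    ["climate", "energy", "emissions", "carbon"],
    ["energy", "climate", "renewable", "carbon"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=0)

cm = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v")
print("c_v coherence:", cm.get_coherence())
```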

97 of 239

Papers on Unsupervised Topic Detection

Structural Topic Models for Open-Ended Survey Responses [Roberts et al., 2014]

  • Allows the inclusion of covariates of interest into the prior distributions for document-topic proportions and topic-word distributions.
  • This makes analyzing open-ended responses easier, more revealing, and capable of being used to estimate treatment effects.

98 of 239

Papers on Unsupervised Topic Detection

Validating Cross-Perspective Topic Modeling for Extracting Political Parties’ Positions from Parliamentary Proceedings [van der Zwaan et al., 2016]

  • Do the topics learned from the parliamentary proceedings cover all relevant political subjects? (content validity)
  • Can the topics learned from the parliamentary proceedings be used to predict the political subject of texts? (criterion validity)

99 of 239

Papers on Unsupervised Topic Detection

Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress [Nguyen et al., 2015]

  • Making multi-dimensional ideal point models interpretable
  • Combining votes, bill text, legislators' speeches and topics from the Policy Agendas Project
  • Structuring topics in a hierarchy, allowing to analyze both agenda issues and issue-specific frames.

100 of 239

Our Experience on Political Topic Analysis

101 of 239

Our Experience on Political Topic Analysis

Unsupervised Text Segmentation of Manifestos [Glavaš et al. *SEM 2016]

Supervised Classification of Manifestos QS [Zirn et al., PolText 2016]

Domain Transfer (manifestos -> campaign speeches) [Nanni et al., PolText 2016]

Cross-lingual Classification of Manifestos QS [Glavaš et al., NLP+CSS 2017]

Key-Concept Clustering of Manifestos QS [Menini et al., EMNLP 2017]

Semantifying the UK Hansard [Nanni et al., ParlaCLARIN 2018 & JCDL 2019]

102 of 239

Interested in Computational Political Science?

We are hiring, get in touch!

{federico,goran,simone}@informatik.uni-mannheim.de

103 of 239

Unsupervised Text Segmentation [Glavaš et al. *SEM 2016]

We employ word embeddings and a measure of semantic relatedness of short texts to construct a relatedness graph of the text.

104 of 239

Unsupervised Text Segmentation [Glavaš et al. *SEM 2016]

We employ word embeddings and a measure of semantic relatedness of short texts to construct a relatedness graph of the text.

https://bitbucket.org/gg42554/graphseg/

105 of 239

Unsupervised Text Segmentation [Glavaš et al. *SEM 2016]

We employ word embeddings and a measure of semantic relatedness of short texts to construct a relatedness graph of the text.

https://bitbucket.org/gg42554/graphseg/
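
A minimal sketch of the underlying idea (not the full GraphSeg algorithm): sentences become nodes, semantically related sentence pairs become edges, and cliques of the resulting relatedness graph seed the segments. Random vectors stand in for the actual averaged word embeddings:

```python
# Sketch: build a sentence relatedness graph and list its maximal cliques.
import itertools

import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
n_sentences = 8
embeddings = rng.normal(size=(n_sentences, 50))  # stand-in for sentence embeddings

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

G = nx.Graph()
G.add_nodes_from(range(n_sentences))
for i, j in itertools.combinations(range(n_sentences), 2):
    if cosine(embeddings[i], embeddings[j]) > 0.2:  # relatedness threshold
        G.add_edge(i, j)

# GraphSeg merges adjacent sentences that share maximal cliques; here we
# simply list the cliques as candidate segment seeds.
print(list(nx.find_cliques(G)))
```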

106 of 239

Supervised Classification [Zirn et al., PolText 2016]

We developed a coarse-grained 7-class classifier using macro-areas from the Manifesto Project.

107 of 239

Supervised Classification [Zirn et al., PolText 2016]

Features:

  1. Bag of words of each sentence
  2. Topic of the previous sentence
  3. Semantic similarity between the previous and the current sentence
  4. Relevance of each word in the sentence for each class

Linear Support Vector Machine

108 of 239

Supervised Classification [Zirn et al., PolText 2016]

109 of 239

Domain Transfer [Nanni et al., PolText 2016]

We collected all US campaign speeches for the 2008, 2012 and 2016 presidential elections.

Gold Standard: around 1k annotated sentences.

110 of 239

Domain Transfer [Nanni et al., PolText 2016]

We collected all US campaign speeches for the 2008, 2012 and 2016 presidential elections.

Gold Standard: around 1k annotated sentences.

111 of 239

Key-Concept Clustering [Menini et al., EMNLP 2017]

112 of 239

Key-Concept Clustering [Menini et al., EMNLP 2017]

  1. Extract key-concepts with Keyphrase Digger
  2. Represent them as vector using word embeddings
  3. Graph-based clustering approach

https://dh.fbk.eu/technologies/kd

https://github.com/dhfbk/keyphrase_clustering
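
A minimal sketch of steps 2 and 3 under stated assumptions: random vectors stand in for keyphrase embeddings, and agglomerative clustering over cosine distances (scikit-learn >= 1.2 for the `metric` argument) replaces the soft graph-based clustering used in the paper:

```python
# Sketch: cluster (mock) keyphrase embeddings by cosine distance.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

keyphrases = ["tax cuts", "income tax", "border security",
              "immigration reform", "clean energy", "carbon emissions"]
rng = np.random.default_rng(0)
X = rng.normal(size=(len(keyphrases), 100))  # stand-in for phrase embeddings

clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=1.0,
    metric="cosine", linkage="average")
labels = clusterer.fit_predict(X)

for phrase, label in zip(keyphrases, labels):
    print(label, phrase)
```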

113 of 239

Key-Concept Clustering [Menini et al., EMNLP 2017]

114 of 239

Key-Concept Clustering [Menini et al., EMNLP 2017]

We extracted 87 topics from six U.S. manifestos (2004, 2008, 2012), which produced 350 pairs of statements, showing:

  1. Disagreement on the responsibilities of previous administrations
  2. Disagreement on foreign policy (Middle East)
  3. Agreement on relation with Europe

115 of 239

Agenda

Text

RQs

Tasks

Topic Detection

Positioning

Scaling

Multilinguality / CL Transfer

116 of 239

Positioning

Political scientists need to infer policy positions from text.

117 of 239

Positioning

Political scientists need to infer policy positions from text.

The position could be determined:

  • Towards a topic, an issue, a target
  • Given an a priori definition of the political space (e.g., left-right)
  • Inductively, in comparison with other texts under study, in a data-driven setting

Many tasks (loosely) correspond to a type of positioning

  • Most prominent: Position detection/classification, political text scaling

118 of 239

Positioning

Political scientists need to infer policy positions from text.

The position could be determined:

  • Towards a topic, an issue, a target
  • Given an a priori definition of the political space (e.g., left-right)
  • Inductively, in comparison with other texts under study, in a data-driven setting

A notable difference between NLP and PolSci research: the former considers the specific target of expressed positions, while the latter generally analyses aggregated speeches/corpora.

119 of 239

Topic-based Positioning

120 of 239

Positioning: NLP vs. PolSci

A notable difference in positioning studies/tasks of two communities:

NLP research focused on the target of the expressed position/stance/sentiment

  • Ideology [Sim et al., 2013; Iyyer et al., 2014; Volkova et al., 2014, Kulkarni et al., 2018]
  • Legislation [Thomas et al., 2006; Lauderdale & Herzog, 2016; Eidelman et al., 2017]
  • Topic [van der Zwaan et al., 2016; Menini & Tonelli, 2016; Menini et al., 2017]
  • (Propositional) Statements [Bamman & Smith, 2015]

PolSci research focused on aggregate positional profiling of political actors

  • Positioning actors (people or parties) based on aggregate textual content and ignoring the targets of individual contributions

[Laver et al., 2003; Proksch & Slapin, 2010; Schwarz et al., 2017; Kim et al., 2018]

121 of 239

Positioning: Ideological classification

The line of work that aims to assign one or more ideological labels (classes) to actors

Challenges:

  • Ideology-annotated corpora? (for supervised learning)
  • Assign ideologies to politicians/parties and propagate to all their texts
    • Assumption: politicians/parties are ideologically consistent
  • Is text enough or is there complementary signal?

The famous “Etch-a-Sketch” example:

"Well, I think you hit a reset button for the fall campaign. Everything changes. It's almost like an Etch-A-Sketch. You can kind of shake it up and restart all over again."

-Eric Fehrnstrom, Spokesman for Presidential Candidate Mitt Romney

122 of 239

Ideological shifts: Etch-A-Sketch

123 of 239

Positioning: Ideological classification

The line of work that aims to assign one or more ideological labels (classes) to actors

[Gross et al., 2013; Sim et al., 2013] “Ideological Proportions in Political Speeches”

  • Inferring proportions of known ideological labels from ideology-rich corpus
  • Bayesian approach, HMM-based model

[Iyyer et al., 2014] “Political Ideology Detection Using Recursive Neural Networks”

  • Crowdsource the ideology annotations on a sentential level
  • Train a neural model (recursive network) to detect ideological phrases

[Volkova et al., 2014; Kulkarni et al., 2018]

  • Additionally exploiting non-textual signal for ideological predictions
  • Network structures: neighbours in social media graphs or links between news

124 of 239

Generative Ideology Detection [Sim et al., EMNLP 13]

Data: Ideology Book Corpus [Gross et al., 2013]

  • Collection of 112 books and 10 magazines
  • Ideological labels (tree below) assigned to authors
  • Book chapters additionally labeled with topics
    • E.g., Chapter “Faith” from Obama’s “The Audacity of Hope” gets topical label RELIGION

Image taken from [Gross et al., 2013]

125 of 239

Generative Ideology Detection [Sim et al., EMNLP 13]

2-step approach:

  1. Ideological cue identification using a probabilistic language model
    • Concretely, Sparse Additive GEnerative models (SAGE) [Eisenstein et al., 2011]
    • Probability of a word's appearance in a document is determined by its effects on the document's attributes (parameters η)
    • Attributes: coarse ideology label (RIGHT, LEFT, CENTER), fine-grained ideology label, topic label
    • Parameter estimation: objective with sparsity-inducing L1 regularization, OWL-QN solver [Andrew & Gao, 2007]

126 of 239

Generative Ideology Detection [Sim et al., EMNLP 13]

2-step approach:

2. Cue-lag ideological proportions (central contribution)

  • Cues (and their effect scores) obtained in the first step are employed for ideological profiling of text
  • Corpus: speeches from the 2008 and 2012 US presidential elections
    • Speeches from primary elections
    • Speeches from general elections
  • (Ideological) cue-lag text representations
    • Sequences of non-cue words replaced with the sequence length

Example from [Sim et al., 2013]

127 of 239

Generative Ideology Detection [Sim et al., EMNLP 13]

2-step approach:

2. Cue-lag ideological proportions (central contribution)

CLIP Model: a type of HMM

  • States are ideological classes (e.g., PROGRESSIVE LEFT, RELIGIOUS RIGHT)
  • State emissions: (cue, length-of-the-lag) pairs
  • Transition probabilities:
    • Influenced by the distances between ideological classes in the ideology tree (i.e., the label hierarchy)
    • Direct transitions between more distant ideologies are less likely
    • Additional tree-walk ideological parameters

128 of 239

Generative Ideology Detection [Sim et al., EMNLP 13]

2-step approach:

2. Cue-lag ideological proportions (central contribution)

CLIP Model: a type of HMM

  • States are ideological classes (e.g., PROGRESSIVE LEFT, RELIGIOUS RIGHT)
  • State emissions: (cue, length-of-the-lag) pairs
  • Emission probabilities:
    • One multinomial distribution over the entire cue lexicon for each ideological state
    • Ψs,w: probability of state s (e.g., CENTER) emitting the cue w (e.g., "communist")
    • One global Poisson distribution (one global parameter) generates the lags
    • Cue-ideology priors (from phase 1) captured by a Dirichlet distribution
  • Learning: collapsed Gibbs sampling
  • Proportion inference: states generate cues and lags; the total lag length associated with each ideological state provides the amount of "time" the speaker spent in that ideology

129 of 239

Generative Ideology Detection [Sim et al., EMNLP 13]

Evaluation:

Gold-standard ideological proportions are difficult (impossible?) to obtain

  • Hypothesis-based evaluation: strong and moderate hypotheses
    • STRONG: Republican primary candidates draw more from the RIGHT than from the LEFT
    • STRONG: Democratic primary candidates draw more from the LEFT than from the RIGHT
    • STRONG: In the general election, Democrats should draw more from the LEFT than from the RIGHT (and vice versa)
  • Evidence for the "etch-a-sketch" hypothesis?

130 of 239

Discriminative ideology detection [Iyyer et al., ACL 14]

Common NLP story:

  • (Small amount of) high-level (e.g., document-level) annotations: generative models with latent variables
  • Large amount of fine-grained (e.g., sentence- or token-level) annotations: discriminative models

[Iyyer et al., 2014] “Political Ideology Detection Using Recursive Neural Networks”

  • Crowdsource the ideology annotations on a sentence level
  • Assume ideological compositionality over the syntactic structure of the sentence
  • Train a neural model (recursive network) to detect ideological phrases
  • Approach inspired by [Socher et al., 2013]
    • Semantic compositionality over syntactic structure for sentiment analysis

131 of 239

Discriminative ideology detection [Iyyer et al., ACL 14]

Assumption of semantic compositionality of ideological positions/labels:

Image from [Iyyer et al., 2014]

132 of 239

Discriminative ideology detection [Iyyer et al., ACL 14]

Recursive neural network model:

  • Word embeddings (randomly initialized or pre-trained): xa, xb, xd
  • Intermediate node representation: a non-linearity applied to projections of the children's vectors, with global parameters WL, WR (e.g., xd = f(WL·xa + WR·xb + b))
  • Labels available for the root node (i.e., the whole phrase or sentence)
    • Simple softmax classification (learning: minimization of the cross-entropy loss)

Image from [Iyyer et al., 2014]
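
A minimal numpy sketch of the composition step described above; the dimensions, the tiny three-word "tree", and the random (untrained) parameters are all illustrative:

```python
# Sketch: recursive composition over a toy parse tree + softmax at the root.
import numpy as np

d, n_labels = 50, 2
rng = np.random.default_rng(0)
W_L, W_R = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b = np.zeros(d)
W_cat = rng.normal(size=(n_labels, d))  # softmax classifier at the root

def compose(x_left, x_right):
    # x_d = f(W_L x_a + W_R x_b + b), with f = tanh
    return np.tanh(W_L @ x_left + W_R @ x_right + b)

x_a, x_b, x_c = (rng.normal(size=d) for _ in range(3))  # word embeddings
x_ab = compose(x_a, x_b)      # phrase node
x_root = compose(x_ab, x_c)   # sentence (root) node

logits = W_cat @ x_root
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)  # P(LIBERAL), P(CONSERVATIVE) -- untrained, so roughly arbitrary
```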

133 of 239

Discriminative ideology detection [Iyyer et al., ACL 14]

Evaluation:

  • Accuracy on the ConVote and IBC datasets
  • IBC: Ideology Book Corpus [Gross et al., 2013]
    • Crowdsourced annotations for phrases and sentences
    • Labels: CONSERVATIVE, LIBERAL, NEITHER (NEUTRAL)

Image from [Iyyer et al., 2014]

134 of 239

Discriminative ideology detection [Iyyer et al., ACL 14]

Figures from [Iyyer et al., 2014]: label probabilities over node depths; sentence-level bias detection accuracy

135 of 239

Ideological classification: non-textual signal

While text is very informative, it is often not the only type of signal that can be exploited to (more accurately) predict ideological positions

Common setup: there are links (associations) between the actors/items for which we make ideological predictions

[Volkova et al., 2014; Barberá, 2015]

  • Predicting ideological preferences of users in social media
  • Combines user’s content with that of friends/followers (social network)

[Kulkarni et al., 2018]

  • Predicting ideological orientation of news
  • Combines the (multi-source) text of news with the links connecting the news

136 of 239

Ideological classification: non-textual signal

Common setup: there are links (associations) between the actors/items for which we make ideological predictions

[Kulkarni et al., 2018] Predicting ideological orientation of news

137 of 239

Predicting ideological orientation of news [Kulkarni et al., EMNLP 18]

Source: [Kulkarni et al., 2018]

138 of 239

Predicting ideological orientation of news [Kulkarni et al., EMNLP 18]

Top-10 rankings of different ideological news sources

139 of 239

Ideology or (actually) party classification?

Classifiers are often sensitive to expressions of attack and defence, or of opposition and government, rather than to ideology. This is especially true in contexts of strong party discipline. [Hirst et al., 2014; Søyland & Lapponi, 2017]

140 of 239

Positioning: Legislation

The overarching task is predicting the position of an actor with respect to a particular piece of legislation in (parliamentary) voting

  • The so-called roll call data

Traditional approach (not using text as data): ideal point model [Clinton et al., 2004]

  • A generative (Bayesian) model for roll-call analysis
  • Each legislator is represented by an ideal point, a position in the (potentially multidimensional) policy space
  • Each legislative proposal (i.e., Yea or Nay vote on the proposal) is also a point in the same policy space

141 of 239

Positioning: Legislation

The overarching task is predicting the position of an actor with respect to a particular piece of legislation in (parliamentary) voting

  • The so-called roll call data

Traditional approach (not using text as data): ideal point model [Clinton et al., 2004]

  • The utility of a concrete vote (Yea or Nay) of a legislator on a piece of legislation: noise-augmented Euclidean distance between the legislator's IP and the vote's point in the policy space:

    U_i(ζ_j) = −‖x_i − ζ_j‖² + η_ij    (ζ_j and η_ij: position and noise of the Yea vote)

    U_i(ψ_j) = −‖x_i − ψ_j‖² + ν_ij    (ψ_j and ν_ij: position and noise of the Nay vote)

  • IPs of legislators and (Yea/Nay vote points of) legislation are parameters to estimate

142 of 239

Positioning: Legislation

The overarching task is predicting the position of an actor with respect to a particular piece of legislation in (parliamentary) voting

  • The so-called roll call data

Traditional approach (not using text as data): ideal point model [Clinton et al., 2004]

  • Corresponds to a probit model with an unobserved regressor xi (the i-th legislator's ideal point) and vote-point-specific parameters for the j-th bill

143 of 239

Positioning: Legislation

The overarching task is predicting the position of an actor with respect to a particular piece of legislation in (parliamentary) voting

  • The so-called roll call data

Traditional approach (not using text as data): ideal point model [Clinton et al., 2004]

  • Parameters (IPs) and Yea/Nay vote points of legislation are estimated by maximizing the likelihood of the observed votes (the y variables):
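
For reference, the standard probit form of this model (Φ is the standard normal CDF; notation follows [Clinton et al., 2004]):

$$\Pr(y_{ij}=1)=\Phi\!\left(\beta_j^{\top}x_i-\alpha_j\right),\qquad \mathcal{L}=\prod_{i,j}\Phi\!\left(\beta_j^{\top}x_i-\alpha_j\right)^{y_{ij}}\left[1-\Phi\!\left(\beta_j^{\top}x_i-\alpha_j\right)\right]^{1-y_{ij}}$$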

144 of 239

Positioning: Legislation

The overarching task is predicting the position of an actor with respect to a particular piece of legislation in (parliamentary) voting

  • The so-called roll call data

Traditional approach (not using text as data): ideal point model [Clinton et al., 2004]

  • Parameter estimation: Markov Chain Monte Carlo simulation

  • Base model: the legislators are mutually independent and so are the roll calls
  • It is possible to augment the model for additional effects
    • Party effects [Clinton and Meirowitz, 2001]
    • Vote trading and cue taking: making utilities of legislators inter-dependent
    • Additional information on legislators and legislation, coming from text

145 of 239

Predicting Legislative Roll Calls from Text

Text-based extension of the ideal point model (IPM) [Clinton et al., 2004]

Fundamental limitation of IPM as a predictive model:

  • It is a model of the vote itself
  • Can be used to fill in missing votes on past legislation
  • Cannot be used to predict votes on future legislation

Connect the votes of the legislator with the text of the bill [Gerrish & Blei, 2011]

  • Ideal point topic model (IPTM)
  • Based on the text of the future bill, one can predict legislator’s vote

146 of 239

Ideal Point Topic Model (Gerrish & Blei, ICML 11)

Ideal Point Topic Model: Effectively combines the IPM and topic modeling

Simplified view of the IPM (logistic regression with random effects):

  • xu: ideal point of the legislator u (latent)
  • vud: vote of the legislator u on bill d (observed)
  • bd: bill position (polarity)
  • ad: bill difficulty (popularity)

  • Votes (vud) depend on the bill variables (ad, bd) and the legislator's ideal point (xu)
  • The bill variables ad, bd depend on the bill content (i.e., the topics zdn)

147 of 239

Ideal Point Topic Model (Gerrish & Blei, ICML 11)

Ideal Point Topic Model: More complex variant of the supervised LDA [Blei & McAuliffe, 2008]

  • In sLDA, the response variables (labels) are observable
  • Here, they are latent bill variables: ad and bd

  • Estimation: variational inference (direct computation of the posterior is intractable)
  • Inference (for new bills): the per-word topic distribution informs ad and bd, which together with xu determine vud

148 of 239

Votes, Bill Text, and...

Ideal Point Model [Clinton et al., 2004]

  • Data: just the votes

Ideal Point Topic Model [Gerrish & Blei, 2011]

  • Data: votes + bill text

What additional data could improve roll call predictions?

  • Interactions between legislators (speeches) [Thomas et al., 06]
  • Additional information on the legislators (e.g., party) [Kornilova et al., 18]

149 of 239

Positioning and Legislation: (Dis)Agreement

[Thomas et al., 2006] “Get out the vote: Determining support or opposition from Congressional floor-debate transcripts”

  • Goal: predict whether a speech supports or opposes a legislative proposal
  • Idea: besides comparing the speech and the proposal, exploit the discourse structure of parliamentary debates
    • Speeches relate to (i.e., reply to) other speeches
    • By estimating (dis)agreements between speeches, one can better estimate alignment with the legislation
    • Essentially a sentiment analysis task: positive or negative sentiment towards a proposal

150 of 239

Legislative positioning via (dis)agreement [Thomas et al., 2006]

A debate is given as a sequence of speeches: s1, s2, …, sn

Approach:

  1. Isolation-based speech classification
    • Linear SVM with unigram features, trained for binary (sentiment) classification
  2. Constraints (links) between speeches
    • A weighted graph is induced, with speeches as nodes and edge weights encoding the constraints
    • Same-speaker constraints: the weight of the edge connecting speeches of the same speaker is infinity
    • Different-speaker agreement: (1) identify references, (2) decide: agreement or disagreement
      • Reference identification: simply, by name
      • Agreement classification: SVM classifier, with the reference context as input
  3. Global optimization, with the following objective (solved efficiently by min-cut):
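
A sketch of that objective, in the form popularized by [Pang & Lee, 2004]: every speech s pays the individual classifier cost of the class it is not assigned, plus a penalty str(s, s′) for every linked pair split across classes; minimizing this total cost is equivalent to finding a minimum cut in the constructed graph:

$$\min_{c}\;\sum_{s}\mathrm{ind}_{\bar{c}(s)}(s)\;+\sum_{s,s':\,c(s)\neq c(s')}\mathrm{str}(s,s')$$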

151 of 239

Predicting roll calls with embeddings

Discriminative models for predicting roll call votes

[Kraft et al., ACL 16]

  • Bilinear model: multi-dimensional position vectors for legislators (vc)
  • Bill representation: linear transformation (W) of the word embedding average

  • More parameters to represent the legislator
    • IPM and IPTM used only a single score
    • vc -- 10-dimensional vectors
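
Schematically (a sketch rather than the authors' exact equation; σ is the logistic function, ē_d the average word embedding of bill d, and bias terms are omitted):

$$P(v_{cd}=\text{yea})=\sigma\!\left(\mathbf{v}_c^{\top}\,W\,\bar{\mathbf{e}}_d\right)$$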

152 of 239

Ideal points vs. ideal vectors

IPM/IPTM vs. ideal vectors:

1-dimensional positions (image from [Gerrish & Blei, 2011]) vs. a 2D PCA projection of 10-dimensional vectors (image from [Kraft et al., 2016])

153 of 239

Ideal vectors

Source: [Kraft et al., ACL 16]

154 of 239

Predicting roll calls with embeddings

Discriminative models for predicting roll call votes

[Kornilova et al., ACL 18]

  • Extension of the bilinear model of [Kraft et al., ACL 16] with party information
  • Base model, without party information

  • Party information: percentage of R/D sponsors of the bill, pr and pd
  • Bill embedding, vB, is an encoding obtained with CNN (not word emb. avg)

155 of 239

Party Matters

Accuracy results show that

  • Better text representations (CNN as opposed to averaging pre-trained word embeddings)
  • Meta-data, i.e., sponsor information

provide a better bill representation

Source: [Kornilova et al., ACL 18]

156 of 239

Are Votes Reliable Labels? [Abercrombie & Batista-Navarro, 2018; 2019]

“Rebel speeches” are only a small minority, so in general you can pair text and votes, but be aware that these are not gold labels.

157 of 239

Positioning: Topics

Topic-based positioning refers to a broad body of work in which a stance or position of the political actor is to be determined for one or more topics

Concrete task definitions depend on several factors:

  • Topic definition: explicit, implicit?
  • Corpora: with or without topic annotations?
  • Multiple topics: detecting positions independently or jointly for multiple topics?

Pipeline approach: first topic detection, then per-topic positioning

[Menini et al., 2016; Nanni et al., 2016; Menini et al., 2017]

Joint approach: joint detection of (multiple) topics and positions

[Lin et al., 2008; Fang et al., 2012; Trabelsi et al., 2014; Thonet et al., 2016]

158 of 239

Topical positioning: pipelined approaches

Unlike the joint models, which induce the topics and positions/viewpoints jointly, “pipeline approaches” determine positions or (dis)agreements for known topics

Topics are either pre-defined [Nanni et al., 2016] or induced in a pre-processing step [Menini et al, 2016, 2017]

[Nanni et al., 2016] Topic-Based Analysis of Political Position in US Electoral Campaigns

  • Predefined topics: top-level topics from the CMP
  • For each topic, create a topic-filtered version of the corpus (i.e., for each manifesto, keep only the sentences labeled with that topic)
  • Used Wordfish [Slapin & Proksch, 2008] to induce positions for each topic

159 of 239

Topical positioning: pipelined approaches

[Menini et al., EMNLP 17] Topic-Based (Dis)Agreement in US Electoral Manifestos

Pipeline:

  1. Supervised coarse-grained domain (macro-topic) classification [Zirn et al., 2017]

For each domain, unsupervised topic induction (but not with topic models!):

  2. Key concept extraction, rule-based [Moretti et al., 2015]
  3. Key concept clustering (a cluster is a topic)
    • Soft graph-based clustering, similarities based on word embeddings

  4. For each topic (cluster of keywords) t:
  5. Couple sentences about t from Democratic manifestos with sentences about t from Republican manifestos
  6. Annotate topical agreement/disagreement and train a supervised classifier

160 of 239

Topical positioning: pipeline [Menini et al., EMNLP 17]

161 of 239

Jointly detecting topics and positions

The so-called Topic Models for Viewpoint Extraction:

  • The generated text is a result of (1) the topics the author chooses to talk about and (2) positions (typically pro and con) the author holds
  • Evaluation (as always with topic models) is an issue: perplexity-based measures

Joint Topic and Perspective Model (JTPM) [Lin et al., 2008]

  • Detection of one global (ideological?) position and topics

Joint Topic Viewpoint Model (JTVM) [Trabelsi et al., 2014]

  • Detection of topics and viewpoints towards each of the topics

Viewpoint and Opinion Discovery Unification Model (VODUM) [Thonet et al., 2016]

  • Joint topic and viewpoint detection, with two types of observables (words)

162 of 239

Joint Topic and Perspective Model [Lin et al., 2008]

  • V viewpoints, each with its own parameter vector (sampled from a multivariate normal distribution)
  • Topic 𝜏: sampled from a multivariate normal distribution
  • Words sampled from a multinomial distribution 𝛽 over the vocabulary V
    • 𝛽 is not a latent variable: it is deterministically computed from the topic and viewpoint vectors

163 of 239

Joint Topic and Perspective Model [Lin et al., 2008]

  • A document's ideological position is given by the Bernoulli variable Pd

Limitations of JTPM

  • Models only a single topic 𝜏: clean “single-topic corpora” hard to obtain
  • Models only two opposite (ideological?) positions (similar to 1-D text scaling)
    • Pd is a Bernoulli variable
    • Documents need to be divided into the two “ideological” classes (i.e., annotated)

164 of 239

Joint Topic and Perspective Model [Lin et al., 2008]

Red: Israeli authors

Blue: Palestinian authors

Red: Democratic authors

Blue: Republican authors

Images from [Lin et al., 2008]

165 of 239

Joint Topic Viewpoint Model [Trabelsi et al., 2014]

Extension of LDA to incorporate viewpoints

  • K topics, each of which is a multinomial distribution over L viewpoints
  • Each viewpoint (of each topic) is a multinomial distribution over terms

One can compare the positions by:

  • Selecting some topic k
  • Comparing the document-specific viewpoint multinomials 𝜓dk for topic k

166 of 239

Joint Topic Viewpoint Model [Trabelsi et al., 2014]

Generative story

  • For each topic k and each viewpoint l draw (Dir(𝛽)) a multinomial over the vocabulary V
  • For each document d
    • Draw a multinomial topic mixture (sample from Dir(𝛼))
    • For each topic k, draw a (document-specific) viewpoint mixture multinomial (Dir(𝛾))
    • For each word (i.e., position)
      • Sample a topic zdn
      • From zdn sample a viewpoint vdn
      • Sample a word from the multinomial distribution over terms for the topic zdn and viewpoint vdn

167 of 239

Agenda

Text

RQs

Tasks

Topic Detection

Positioning

Scaling

Multilinguality / CL Transfer

168 of 239

Political Text Scaling

Arguably the most prominent task in the text-as-data PolSci community

Task definition: Given a set of political actors and for each of them an aggregated collection of (political) text they produce, predict (numerically) their (typically relative) positions on a 1-dimensional scale

  • In most cases, the aim is the left-to-right ideological scaling, even though...

169 of 239

Political Text Scaling

Task definition: Given a set of political actors and for each of them an aggregated collection of (political) text, predict (numerically) their (typically relative) positions on a 1-dimensional scale (i.e., a regression task)

  • In most cases, the aim is the left-to-right ideological scaling, even though...

“substantive content of a “left-right” dimension varies significantly across different contexts, to such an extent that “it may be impossible for any single scale to measure this dimension in a manner that can be used for reliable or meaningful cross-national comparison” [Benoit & Laver, 2006]

  • (Weakly) supervised scaling: texts annotated with ideological scores
  • Unsupervised scaling: no annotations, just text collections

170 of 239

Same Politician or Not?

"It is unbearable, when refugee homes are attacked, when people try to make radical speeches. All those who come to us have the right to be treated correctly, to have a proper asylum procedure. That's our rule of law and we are proud of it."

"Sometimes I feel ashamed, when I see how the question of refugees is being discussed in our countries, when just 30,000 or 40,000 refugees are arriving in a country of 82 million while here you have 120,000 refugees in a town of 100,000 inhabitants, and when I see how they are being taken care of, I have to take my hat off to them."

171 of 239

Same Politician or Not?

"It is unbearable, when refugee homes are attacked, when people try to make radical speeches. All those who come to us have the right to be treated correctly, to have a proper asylum procedure. That's our rule of law and we are proud of it."

"Sometimes I feel ashamed, when I see how the question of refugees is being discussed in our countries, when just 30,000 or 40,000 refugees are arriving in a country of 82 million while here you have 120,000 refugees in a town of 100,000 inhabitants, and when I see how they are being taken care of, I have to take my hat off to them."

172 of 239

Same Politician or Not?

"It is unbearable, when refugee homes are attacked, when people try to make radical speeches. All those who come to us have the right to be treated correctly, to have a proper asylum procedure. That's our rule of law and we are proud of it."

"Sometimes I feel ashamed, when I see how the question of refugees is being discussed in our countries, when just 30,000 or 40,000 refugees are arriving in a country of 82 million while here you have 120,000 refugees in a town of 100,000 inhabitants, and when I see how they are being taken care of, I have to take my hat off to them."

173 of 239

Scaling Methods

Term-based scaling

  1. Wordscores [Laver et al., 2003]
    • Supervised: requires positions for some number of texts
    • Word positions computed based on occurrences in labeled documents
    • Word positions combined to infer positions of unlabeled documents

2. Wordfish [Slapin & Proksch, 2008; Proksch & Slapin, 2010]

  • Unsupervised: only text collection as input
  • Word counts modeled with a Poisson distribution, whose rate combines (prior) word and party effects with the product of the word weight and the text position
  • EM-like algorithm for parameter estimation

174 of 239

Scaling Methods

Semantic scaling

3. SemScale [Glavaš et al., 2017; Nanni et al., 2019]

  • Unsupervised: only text collection as input
  • Relies on semantic representations of words, i.e., word embeddings
  • Positions induced via semantic similarity and graph-based label propagation

4. Party2Vec [Rheault & Cochrane, 2019]

  • Unsupervised: only text as input
  • Introduce special party tokens (e.g., “Dem_92”) and insert them into party texts
  • Train word embeddings with the SkipGram model [Mikolov et al., 2013]
  • For scaling: project party embeddings into a single score (1-dim with PCA)

175 of 239

Wordscores [Laver et al., 2003]

Image from [Laver et al., 2003]

176 of 239

Wordscores [Laver et al., 2003]

Generate wordscores from reference texts

  • Reference texts r have a priori scores Sr
  • P(w|r) = count(w, r) / length(r): unigram LM probability of w in r
  • Compute posteriors P(r|w): probability of reading text r if seeing word w
    • P(r|w) = P(w|r) / Σr′ P(w|r′), assuming a uniform prior over reference texts

  • Wordscore Sw is the posterior-weighted sum of positions of reference texts: Sw = Σr P(r|w) · Sr
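A minimal numpy sketch of this training step, following the definitions above; the three-word vocabulary, counts, and reference scores are toy values:

    import numpy as np

    # Rows: reference texts; columns: vocabulary terms; entries: raw counts (toy data)
    counts = np.array([[10., 0., 5.],
                       [ 2., 8., 5.]])
    S_ref = np.array([-1.0, 1.0])      # a priori scores Sr of the reference texts

    P_w_given_r = counts / counts.sum(axis=1, keepdims=True)   # unigram LM per text
    P_r_given_w = P_w_given_r / P_w_given_r.sum(axis=0)        # posterior P(r|w), uniform prior
    S_word = S_ref @ P_r_given_w                               # wordscores Sw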

177 of 239

Wordscores [Laver et al., 2003]

Computing the scores of unlabeled texts

  • The score of a new (virgin) text is then simply the frequency-weighted sum of the wordscores Sw of its words

  • Transform the scores of the virgin texts so that they have the same dispersion as the reference scores
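Continuing the sketch above: scoring and rescaling virgin texts. The rescaling shown is one common variant of the dispersion transformation, so treat it as an assumption:

    virgin_counts = np.array([[4., 4., 2.],
                              [1., 9., 0.]])                   # two unlabeled texts
    F = virgin_counts / virgin_counts.sum(axis=1, keepdims=True)
    S_raw = F @ S_word                                         # frequency-weighted sums of Sw
    # Rescale so the virgin scores match the dispersion of the reference scores
    S_virgin = (S_raw - S_raw.mean()) * (S_ref.std() / S_raw.std()) + S_raw.mean()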


179 of 239

Wordscores [Laver et al., 2003]

Source: Laver et al. 2003

180 of 239

Wordscores [Laver et al., 2003]

Shortcomings of Wordscores:

  • Needs reference texts, labeled with position scores

  • Vocabulary determined by the reference texts
    • Any word not in reference texts has no effect on the position of the new document

  • Global frequencies of words (prior “word effects”) are ignored
    • Stopwords pull document scores towards the “mean” score

181 of 239

Wordfish [Slapin & Proksch, 2008]

Aims to remedy the shortcomings of Wordscores

  • Unsupervised: no need for position annotated reference texts
  • Account for word effects (i.e., global frequency of terms)

Model: A “Poisson naive Bayes”

  • Word frequencies in documents (observables) sampled from Poisson distributions
  • Distributions (word appearances and frequencies) independent of each other (NB assumption)

  • Distribution parameter λ: prior word/text effects and word/text positions

182 of 239

Wordfish [Slapin & Proksch, 2008]

Model: A “Poisson naive Bayes”

  • Distribution parameter λ: prior word/text effects and word/text positions

  • αi : Prior effect of the text (party/politician)
  • ψj : Prior effect of the word (e.g., “the” less relevant than “racist”)
  • βj : Word-specific weight
    • Specifies word importance for discriminating between text positions
  • ωi : Position of the text (party/politician)
    • This is what we are ultimately interested in
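Putting the four parameters together, the Poisson rate for word j in document i (the standard Wordfish functional form) is:

    yij ~ Poisson(λij),   λij = exp(αi + ψj + βj · ωi)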

183 of 239

Wordfish [Slapin & Proksch, 2008]

Model: A “Poisson naive Bayes”

  • Distribution parameter λ: prior word/text effects and word/text positions

Parameter estimation

  1. Initialization (starting values)
    • αi : logarithm of the collection-normalized document length
    • ψj : logarithm of the mean frequency of the word across all documents
    • βj and ωi : based on the word-document co-occurrence matrix C (log frequencies)
      • Each element Cij corrected for αi and ψj (αi and ψj subtracted from Cij)
      • SVD of the corrected C: βj and ωi set to values from the left and right singular vectors, respectively

184 of 239

Wordfish [Slapin & Proksch, 2008]

Model: A “Poisson naive Bayes”

  • Distribution parameter λ: prior word/text effects and word/text positions

Parameter estimation

2. Iterative estimation (EM-like algorithm)

  a. Fix word parameters (ψj and βj) and estimate document (party) scores (αi and ωi), by maximizing the log-likelihood over all vocabulary words, for each document i

185 of 239

Wordfish [Slapin & Proksch, 2008]

Model: A “Poisson naive Bayes”

  • Distribution parameter λ: prior word/text effects and word/text positions

Parameter estimation

2. Iterative estimation (EM-like algorithm)

  b. Fix party parameters (αi and ωi) and estimate word scores (ψj and βj), by maximizing the log-likelihood over all documents, for each word j

186 of 239

Wordfish: alternative optimization

Model: A “Poisson naive Bayes”

  • Distribution parameter λ: prior word/text effects and word/text positions

It is unclear why Proksch & Slapin propose such a two-step EM-like optimization

  • Alternative [Lowe, 2016; Glavaš et al., 2017]:
    • Minimize global negative log-likelihood for the whole collection
    • Optimize parameters via gradient descent
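A minimal sketch of this alternative, using scipy's L-BFGS in place of hand-rolled gradient descent; identification constraints (e.g., fixing the mean and variance of the ω's) are omitted, so treat this as illustrative only:

    import numpy as np
    from scipy.optimize import minimize

    def wordfish_nll(params, C):
        """Global negative Poisson log-likelihood (up to the constant log C! term).
        C: documents x words count matrix."""
        n_docs, n_words = C.shape
        alpha = params[:n_docs]                          # document effects
        omega = params[n_docs:2 * n_docs]                # document positions
        psi = params[2 * n_docs:2 * n_docs + n_words]    # word effects
        beta = params[2 * n_docs + n_words:]             # word weights
        log_lam = alpha[:, None] + psi[None, :] + np.outer(omega, beta)
        return np.sum(np.exp(log_lam) - C * log_lam)

    rng = np.random.default_rng(0)
    C = rng.poisson(3.0, size=(5, 50)).astype(float)     # toy count matrix
    x0 = rng.normal(scale=0.1, size=2 * 5 + 2 * 50)      # random init (all zeros would stall)
    res = minimize(wordfish_nll, x0, args=(C,), method="L-BFGS-B")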

187 of 239

Wordfish [Slapin & Proksch, 2008]

Scale ends (parties closest to the ends of the scale), 5th legislature of the EP

Key question: what does the scale capture?

  • Left-right ideology? Pro vs. anti EU (EU integration position)? Both?

188 of 239

Wordfish [Slapin & Proksch, 2008]

Source: Slapin & Proksch, 2008

189 of 239

Wordfish [Slapin & Proksch, 2008]

Wordfish results (positions) seem stable across languages

  • Note: not cross-lingual scaling, merely independent monolingual scalings

190 of 239

Wordfish [Slapin & Proksch, 2008]

Wordfish implementations:

R implementation (Slapin & Proksch)

http://www.wordfish.org/uploads/1/2/9/8/12985397/wordfish_1.3.r

R implementation (Will Lowe)

https://conjugateprior.github.io/austin/reference/wordfish.html

Python implementation (Goran Glavaš)

https://github.com/codogogo/topfish/blob/master/wordfish.py

191 of 239

Semantic scaling

Term-based scaling methods like Wordscores and Wordfish:

  • No semantics, effectively sparse text representations
  • Are inherently monolingual

The need to go beyond surface forms:

  • “bad hombres…” and “terrible guys…” should map to the same position

192 of 239

SemScale [Glavaš et al., 2017]

First scaling method to use semantic text representation

Approach:

  1. Measure semantic similarity between all pairs of documents
    • Similarity measures based on word embeddings
    • This induces a fully-connected weighted graph
  2. Label the pair of most dissimilar texts as pivots: extreme positions of the spectrum
  3. Propagate labels over the graph
    • Using a graph-based label propagation algorithm
  4. Rescale the pivot texts

193 of 239

SemScale [Glavaš et al., 2017]

Two different unsupervised measures of semantic textual similarity

Alignment similarity

  • Greedily pairs words between two texts based on their embedding similarity

Aggregation similarity

  • Similarity between aggregated documents vectors (averaged word embeddings)
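A minimal sketch of both measures, assuming emb is a dict of pre-trained word embeddings; the actual SemScale implementation (github.com/umanlp/SemScale) differs in details:

    import numpy as np

    def aggregation_similarity(doc1, doc2, emb):
        """Cosine similarity between averaged word embeddings of two documents."""
        v1 = np.mean([emb[w] for w in doc1 if w in emb], axis=0)
        v2 = np.mean([emb[w] for w in doc2 if w in emb], axis=0)
        return v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))

    def alignment_similarity(doc1, doc2, emb):
        """Greedily pair the most similar remaining words; average pair similarities."""
        m1 = np.array([emb[w] / np.linalg.norm(emb[w]) for w in doc1 if w in emb])
        m2 = np.array([emb[w] / np.linalg.norm(emb[w]) for w in doc2 if w in emb])
        sims = m1 @ m2.T
        total, n = 0.0, min(len(m1), len(m2))
        for _ in range(n):
            i, j = np.unravel_index(np.argmax(sims), sims.shape)
            total += sims[i, j]
            sims[i, :], sims[:, j] = -np.inf, -np.inf    # remove matched words
        return total / n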

194 of 239

SemScale [Glavaš et al., 2017]

Pairwise similarities induce a weighted fully-connected graph

Assumption: two most dissimilar texts (pivots) represent position extremes: −1 and 1

Inducing scores for other nodes

  • Graph-based label propagation [Zhu & Goldberg, 2009]:

195 of 239

SemScale [Glavaš et al., 2017]

Graph-based label propagation

  • Harmonic Function Label Propagation (HFLP) [Zhu & Goldberg, 2009]:

Let L = D − W be the unnormalized Laplacian of the graph (W: weighted similarity matrix, D: diagonal degree matrix)

  • If we order labeled nodes before the unlabeled ones, L is partitioned into blocks Lll, Llu, Lul, Luu

  • The scores of the unlabeled nodes are then obtained analytically as: fu = −Luu⁻¹ Lul yl

  • yl is the vector of scores of labeled nodes, in our case yl = [−1, 1]ᵀ
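A minimal numpy sketch of HFLP on a toy similarity graph (the four-node matrix is made up):

    import numpy as np

    def hflp(W, labeled_idx, y_l):
        """Harmonic function label propagation: solve for scores of unlabeled nodes."""
        n = W.shape[0]
        L = np.diag(W.sum(axis=1)) - W                   # unnormalized Laplacian D - W
        u = np.setdiff1d(np.arange(n), labeled_idx)
        f = np.zeros(n)
        f[labeled_idx] = y_l
        L_uu = L[np.ix_(u, u)]
        L_ul = L[np.ix_(u, labeled_idx)]
        f[u] = -np.linalg.solve(L_uu, L_ul @ np.asarray(y_l, float))
        return f

    # toy graph: 4 texts; pivots (most dissimilar pair) are nodes 0 and 3
    W = np.array([[0., .9, .4, .1],
                  [.9, 0., .5, .2],
                  [.4, .5, 0., .8],
                  [.1, .2, .8, 0.]])
    scores = hflp(W, [0, 3], [-1.0, 1.0])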

196 of 239

Political Text Scaling: Evaluation

Scaling algorithms produce scores in a single dimension

  • What is the meaning of that dimension?
  • A posteriori substantive analyses
    • Political scientists try to identify a meaningful dimension aligned with scaling results

Example: Wordfish for scaling EU parties from EP speeches [Proksch & Slapin, 2010]

    • Scores produced by Wordfish correlate better with positions on EU integration than with left-right ideological positions

197 of 239

Political Text Scaling: Evaluation

Scaling algorithms produce scores in a single dimension

  • What is the meaning of that dimension?

Gold position scores for the dimension of interest?

  • Chapel Hill Expert Surveys [Bakker et al., 2015]
    • Panels of PolSci experts judge ideological and EU positions of parties

  • But...are these really gold standard positions for the texts that algorithms scale?
    • Experts do not rate parties after reading all of their speeches
    • Rather, they rate them based on their prior knowledge
    • Experts' political biases are encoded in the scores

198 of 239

Political Text Scaling: Evaluation

Task definition: Scaling texts for two different political dimensions: (i) left-to-right ideological position; (ii) position on European integration

Evaluation metrics:

  • Pairwise Accuracy (PA), i.e., the percentage of party pairs ordered the same way as in the gold standard (see the sketch below)
  • Spearman and Pearson correlation between the two sets of positions
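A minimal sketch of PA (the function name and the tie handling are illustrative):

    from itertools import combinations

    def pairwise_accuracy(pred, gold):
        """Share of party pairs ranked in the same order by predicted and gold scores."""
        pairs = list(combinations(range(len(gold)), 2))
        agree = sum((pred[i] - pred[j]) * (gold[i] - gold[j]) > 0 for i, j in pairs)
        return agree / len(pairs)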

Baselines:

  • Term-based Wordfish model (monolingual setting only)
  • Random positioning (sanity check)

199 of 239

Political Text Scaling: Evaluation

“Errors”:

  • Due to the limitations of the scaling model, or
  • Due to the texts reflecting positions on multiple (not just one!) dimensions?

[Plots: correlation of produced positions with Chapel Hill left-right ideology scores and with Chapel Hill EU integration positions]

200 of 239

SemScale: Demo, Tool and Appendix

http://tools.dws.informatik.uni-mannheim.de/semScale

201 of 239

SemScale: Demo, Tool and Appendix

https://github.com/umanlp/SemScale

https://federiconanni.com/semantic-scaling/

202 of 239

Topical Scaling

If we want to interpret the positions as relating to a certain dimension or topic, we need to filter out the content irrelevant to that dimension

  • Topical classification as a preprocessing step?

TopFish [Nanni et al., 2016]

  • Topical classification of US electoral speeches
  • Per-topic scaling with Wordfish

[Per-topic scaling plots: all topics, External Relations, Welfare & Quality of Life]

203 of 239

Party2Vec [Rheault & Cochrane, 2019]

Simple extension of the Skip-Gram model [Mikolov et al., 2013]

  • An artificial “document identifier” token added to every context used for training the Skip-Gram model
  • Embeddings obtained for those tokens are party/document representations

204 of 239

Party2Vec [Rheault & Cochrane, 2019]

Simple extension of the Skip-Gram model [Mikolov et al., 2013]

  • An artificial “document identifier” token added to every training context
  • Embeddings obtained for those tokens are party/document representations
    • “Party embeddings” are then multidimensional (like word embeddings, e.g., 300-dim)
    • Projected to a lower-dimensional space with PCA: for scaling, only the first principal component

Image from [Rheault & Cochrane, 19]:

2D PCA projection of party vectors
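A rough sketch: tagging every training context with a party identifier is essentially what gensim's Doc2Vec does with document tags, so one can approximate the method as below (party IDs, toy speeches, and hyperparameters are hypothetical):

    import numpy as np
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.decomposition import PCA

    speeches = [("Dem_92", ["we", "support", "healthcare", "reform"]),
                ("Rep_92", ["we", "support", "tax", "cuts"])]

    # The party tag plays the role of the artificial "document identifier" token
    docs = [TaggedDocument(words=toks, tags=[party]) for party, toks in speeches]
    model = Doc2Vec(docs, vector_size=100, window=5, min_count=1, epochs=50)

    party_vecs = np.array([model.dv[p] for p in ("Dem_92", "Rep_92")])
    scores = PCA(n_components=1).fit_transform(party_vecs)   # 1-dim scaling scores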

205 of 239

Agenda

Text

RQs

Tasks

Topic Detection

Positioning

Scaling

Multilinguality / CL Transfer

206 of 239

Multilinguality

Political analysts compare actors from different countries

  • Most content in native languages

Crossing the language chasm

Old paradigm:

  • Language-specific NLP models
  • Language-specific feature computation (i.e., preprocessing)

New paradigm:

  • Representation learning: inputs are semantic vectors (embeddings)
  • Multilingual / cross-lingual rep. learning

207 of 239

Crossing the Language Chasm

  1. Full-Blown MT (SMT or NMT)
    • Parallel data needed, critical for under-resourced languages
    • Translate everything from the target language to the source language
    • Unsupervised NMT?

208 of 239

Crossing the Language Chasm

  • Full-Blown MT (SMT or NMT)
  • Parallel data needed, critical for under-resourced languages
  • Translate everything from the target language to the source language
  • Unsupervised NMT?

2. Multilingual KBs

  • Texts represented using entities from a multilingual KB
  • Same entity ID for same concepts across languages
  • Issues: coverage, entity linking

209 of 239

Crossing the Language Chasm

3. Multilingual / cross-lingual representations of meaning

  • Word-level
    • Cross-lingual word embeddings
    • Words with similar meaning across languages have similar vectors

  • Sentence- / paragraph-level
    • Most recent developments
    • Multilingual unsupervised pre-training [Lample & Conneau, 2019; Devlin et al., 2019]

Image from [Luong et al., 2015]

210 of 239

CLWE: post-hoc alignment

Monolingual embeddings independently constructed

Post-hoc aligning monolingual spaces

X is the distributional (embedding) space of L1, Y that of L2

  • We are looking for mapping functions f and g such that f(X) and g(Y) form a meaningful shared bilingual embedding space

Image from [Conneau et al., 2018]
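A sketch of the common Procrustes-style alignment (used, e.g., in the supervised variant of [Conneau et al., 2018]), assuming a seed dictionary of translation pairs:

    import numpy as np

    def procrustes_align(X, Y):
        """Learn an orthogonal map W minimizing ||X W - Y||_F.
        Rows of X and Y are embeddings of seed translation pairs (L1 word, L2 word)."""
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt        # apply to all L1 vectors: X_all @ W lies in the L2 space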

211 of 239

CLWE: PolSci applications

Cross-lingual word embeddings (CLWEs) allow for

  • Semantic comparison of texts in different languages
  • Cross-lingual transfer of NLP models
    • Resource-rich training language, resource-poor target language

Cross-lingual text scaling with SemScale [Glavaš et al., 2017a]

  • Nothing changes for the SemScale algorithm
  • Semantic similarities between texts based on vectors from the CLWE space

Cross-lingual topic classification of manifestos [Glavaš et al., 2017b]

  • A lot of topic-annotated data in EN and DE, much less in other languages

212 of 239

Cross-lingual manifesto topic classification [Glavaš et al., 2017b]

Simple classification model:

  • Embeddings (from a CLWE space) as input
  • Convolutional network as the encoder
  • Softmax classifier

Manifesto top-level topics: 7 labels

  • Train/test sets in 4 langs: EN, DE, FR, IT
  • EN & DE datasets much larger than FR & IT

Two models

  • Mono-L: training data of one language
  • Cross-L: concatenation of all training data

https://github.com/codogogo/topfish
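A minimal PyTorch sketch of such a model (all hyperparameters hypothetical; the authors' actual implementation is in the repo above):

    import torch
    import torch.nn as nn

    class CNNTopicClassifier(nn.Module):
        """CNN encoder over (cross-lingual) word embeddings + softmax classifier."""
        def __init__(self, emb_matrix, n_labels=7, n_filters=100, width=3):
            super().__init__()
            self.emb = nn.Embedding.from_pretrained(emb_matrix, freeze=True)
            self.conv = nn.Conv1d(emb_matrix.size(1), n_filters, kernel_size=width)
            self.out = nn.Linear(n_filters, n_labels)

        def forward(self, token_ids):                        # (batch, seq_len)
            x = self.emb(token_ids).transpose(1, 2)          # (batch, emb_dim, seq_len)
            h = torch.relu(self.conv(x)).max(dim=2).values   # max-pooling over time
            return self.out(h)                               # logits; softmax lives in the loss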

213 of 239

Conclusion

Text

RQs

Tasks

Topic Detection

Positioning

Scaling

Multilinguality / CL Transfer

214 of 239

Conclusion

Computational analysis of political text: a vibrant interdisciplinary research area

  • Natural language processing
  • Political science

Despite the interdisciplinary nature of the work

  • Most research efforts from one OR the other community
  • Unexpectedly disjoint communities

Different lines of work on similar and closely related tasks

  • Positioning vs. scaling

Bridging efforts between communities paramount for more interdisciplinary work

  • This tutorial is our small contribution towards that goal

215 of 239

Thanks!

Goran

Federico

Simone

216 of 239

Interested in Computational Political Science?

We are hiring, get in touch!

{federico,goran,simone}@informatik.uni-mannheim.de

217 of 239

218 of 239

3 Days Text Scaling Hackathon (Dec. 2017)

219 of 239

3 Days Text Scaling Hackathon (Dec. 2017)

Supported by Villa Vigoni, the German-Italian Centre for European Excellence and by DFG.

23 young researchers from political science, computational social science and NLP.

220 of 239

3 Days Text Scaling Hackathon (Dec. 2017)

Participants from:

Mannheim, Bruno Kessler Found., Unitelma, Sheffield, Duisburg-Essen, GESIS, Scuola Normale Superiore, EUI, Scuola Superiore Sant'Anna, Bocconi, Leipzig, Zagreb, LSE, Alan Turing Institute, Edinburgh, CEU, Toronto.

221 of 239

Joint Work With:

Goran Glavas

Simone Paolo Ponzetto

Sara Tonelli

Nicolò Conti

222 of 239

What is a Hackathon?

A coding-intensive collaborative workshop.

223 of 239

What is a Hackathon?

A coding-intensive collaborative workshop.

Why? To boost collaboration across disciplines.

224 of 239

What is a Hackathon?

A coding-intensive collaborative workshop.

Why? To boost collaboration across disciplines.

Shared-Task? Participants were divided into 5 groups and had to work together towards a specific goal.

225 of 239

European-Integration Scaling

The task: develop a method for scaling text on the EU integration dimension. We provide participants with:

226 of 239

European-Integration Scaling

The task: develop a method for scaling text on the EU integration dimension. We provide participants with:

  1. Manually translated speeches made by 25 parties from France, Germany, Italy, Spain and UK at the EuroParl (1999 - 2017).

227 of 239

European-Integration Scaling

The task: develop a method for scaling text on the EU integration dimension. We provide participants with:

  • Manually translated speeches made by 25 parties from France, Germany, Italy, Spain and UK at the EuroParl (1999 - 2017).
  • A gold standard (Chapel Hill) of EU integration party-positions (leg: 5, 7, 8)

228 of 239

European-Integration Scaling

The task: develop a method for scaling text on the EU integration dimension. We provide participants with:

  • Manually translated speeches made by 25 parties from France, Germany, Italy, Spain and UK at the EuroParl (1999 - 2017).
  • A gold standard (Chapel Hill) of EU integration party-positions (leg: 5, 7, 8)
  • On the last evening, a test set (leg: 6)

229 of 239

European-Integration Scaling

The output: party positions for the 6th legislature regarding European integration (between 0: strongly against and 1: strongly in favour).

230 of 239

European-Integration Scaling

The output: party positions for the 6th legislature regarding European integration (between 0: strongly against and 1: strongly in favour).

All data available at: https://federiconanni.com/hack-vigoni/

231 of 239

What Participants Could Not Do

  1. Find the gold standard online and predict based on it
  2. Use external knowledge about the party to scale it (the task is text scaling)

232 of 239

How Did It Go?

236 of 239

Core Components of All Approaches

  1. An initial filtering strategy (using a dictionary or a manually created list)
  2. A text similarity approach, based on TF-IDF or word embeddings
  3. A supervised scaling function (SVM regression model, canonical correlation analysis, etc.)
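A toy sketch of how these three components might combine (dictionary filter → TF-IDF representation → SVR regression); all texts, terms, and scores below are made up:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import SVR

    EU_TERMS = ["european", "integration", "euro", "brussels"]   # hypothetical dictionary

    def filter_text(text):
        """Step 1: keep only sentences mentioning a dictionary term."""
        return " ".join(s for s in text.split(".")
                        if any(t in s.lower() for t in EU_TERMS))

    train_texts = ["we must deepen european integration. taxes are too high.",
                   "leave the euro behind. brussels dictates our laws."]
    train_scores = [0.9, 0.1]          # Chapel Hill EU-integration positions (legs 5/7/8)
    test_texts = ["european cooperation benefits all member states."]

    vec = TfidfVectorizer()            # step 2: TF-IDF text representations
    X_train = vec.fit_transform(filter_text(t) for t in train_texts)
    X_test = vec.transform(filter_text(t) for t in test_texts)

    pred = SVR().fit(X_train, train_scores).predict(X_test)     # step 3: supervised scaling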

237 of 239

Results
