1 of 68

The importance of entities

Meltwater Budapest, April 2016

Babak Rasolzadeh, Director of Data Science Research

2 of 68

  1. Company background
  2. Data Science @ Meltwater
  3. Challenges with NLP at large scale
  4. Entities, entities, entities
    1. Social NER
    2. ELS
    3. Knowledge Graph

3 of 68

What is Meltwater?

  • A business intelligence company → Providing insights from data outside the firewall (news, blogs, social media, etc.)
  • Born in Oslo in 2001.
  • Founder and CEO: Jørn Lyseggen
  • www.meltwater.com

  • 30K+ clients all over the world.
  • 1000+ employees
  • 60+ offices around the world, mostly sales.
  • Tech offices: USA, Germany, Sweden, Hungary, India.


4 of 68

Why?

  • own brand
  • competitors
  • leads
  • partners
  • product reviews
  • own industry


5 of 68

What?

  • Uses Meltwater to find out about new instances of vandalism and break-ins; often, the victim is in need of services.
  • Uses Meltwater to help determine how public perception of certain ingredient chemicals will influence adoption & sales.
  • Uses Meltwater to be alerted when certain patents will expire in target markets.
  • Uses Meltwater to monitor the performance and popularity of news anchors and programs.
  • Uses Meltwater social listening to estimate and prevent infrastructure attacks.

6 of 68

How?


7 of 68

NLP & Data Science at Meltwater

Unstructured Document Stream

Pipeline

Enrichments

Search

/Storage

Enriched Documents

High Performance Indexes

Processing Services

API Layer

APPS

Backup Storage

Raw Documents

15 supported languages in pipeline

(EN, DE, SV, NO, FI, ZH, JP, FR, ES, DA, NL, PT, AR, IT, HI)

Typical enrichments

    • Sentiment analysis
    • Thematic analysis
    • Categorization
    • Keyphrase extraction
    • Named Entity Recognition
    • Named Entity Disambiguation
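A minimal sketch of how such an enrichment pipeline composes: each enricher adds fields to a document as it flows through. Function names and the document shape are illustrative, not Meltwater's actual API.

```python
# Minimal enrichment pipeline sketch: each enricher adds fields to a document.
def detect_language(doc):
    doc["lang"] = "EN"  # placeholder: a real system runs a language classifier
    return doc

def tag_entities(doc):
    # Placeholder NER via lookup; a real system runs a statistical model.
    known = {"Ericsson": "ORG", "Stockholm": "LOC"}
    doc["entities"] = [(w, known[w]) for w in doc["text"].split() if w in known]
    return doc

def run_pipeline(doc, enrichers):
    for enrich in enrichers:
        doc = enrich(doc)
    return doc

doc = run_pipeline({"text": "Ericsson expands in Stockholm"},
                   [detect_language, tag_entities])
print(doc["entities"])  # [('Ericsson', 'ORG'), ('Stockholm', 'LOC')]
```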

8 of 68

What other than NLP?

  • Recommendation Engines

[Diagram: real-time recommender engine matching a stream of documents to recommendations]

  • Correlation and predictive pattern recognition
  • Word2vec techniques

[Diagram: word2vec concept clusters]

“British American Tobacco" or "British American Tobbaco" or (BAT near tobacco) or "英美煙草" or (("Lucky Strike" or "Dunhill" or "Pall Mall") near/15 cigarette*)


9 of 68

Machine Learning Terminology


10 of 68

Challenges with Data Science (NLP) at scale

  • High DPS (~2,000 documents per second) and a lot to do (tokenization, lemmatization, stemming, POS tagging, categorization, sentiment, NER, ...), with race conditions!
  • Training-data labelling is costly, and must be repeated for each of the 15 languages!
  • Contextual information is computationally expensive.
  • Noise, missing data, variation (e.g. slang), different data types, ...

Pipeline

Enrichments

SV

EN

DE

POS

NER


11 of 68

Knowledge Base Strategy

Entities, entities, entities

don - July 2015

12 of 68

Knowledge Base Strategy

What are Named Entities (NE)?

  • Non-linguistic definition
    • Referable entities
    • Usually Proper Names
    • Single or multi-word

I know this man. He might be Charles.

He lives in Stockholm. He is Swedish.


13 of 68

Knowledge Base Strategy

What is Named Entity Recognition (NER)?

  1. Extracting NEs from a text.
  2. Categorizing NEs from a set of predefined categories.

John lives in Stockholm. He works at Ericsson.

Categories of {PER, LOC, ORG, MISC, PROD}
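The output of steps 1 and 2 is often represented with BIO tags; a toy dictionary-based tagger illustrates the format (real NER is statistical, as the following slides discuss, and the entity map here is hand-made for the example):

```python
# Toy BIO tagger over the example sentence, using a hand-made entity map.
# Real NER is statistical; this only illustrates the output format.
ENTITY_TYPES = {"John": "PER", "Stockholm": "LOC", "Ericsson": "ORG"}

def bio_tag(tokens):
    # B- marks the beginning of an entity; O marks non-entity tokens.
    return [(t, "B-" + ENTITY_TYPES[t] if t in ENTITY_TYPES else "O")
            for t in tokens]

print(bio_tag("John lives in Stockholm".split()))
# [('John', 'B-PER'), ('lives', 'O'), ('in', 'O'), ('Stockholm', 'B-LOC')]
```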


14 of 68

Knowledge Base Strategy

What is NER not?

  • NER is not event recognition.
  • NER recognises entities in text, and classifies them in some way, but it does not create templates, nor does it perform co-reference or entity linking.
  • NER is not just matching text strings with pre-defined lists of names. It only recognises entities which are being used as entities in a given context.

(i.e. not easy!)


15 of 68

Why NER?

  • Key part of Information Extraction system
  • Robust handling of proper names essential for many applications
  • Pre-processing for different classification levels
  • Information filtering
  • Information linking
  • Entity level sentiment
  • Knowledge graph


16 of 68

Knowledge Base Strategy

Why NER?


17 of 68

Knowledge Base Strategy

Why NER?

Pepsi spooks Coke with this Halloween-themed ad.

Entity-specific sentiment analysis, a.k.a. ELS


18 of 68

Knowledge Base Strategy

So what about Social…?

19 of 68

How to do NER? (state-of-the-art)

Supervised Learning

  • Hidden Markov Model (HMM): Freitag and McCallum, 1999; Leek, 1997.

  • Conditional Markov Model (CMM): Borthwick, 1999; McCallum et al., 2000.

  • Conditional Random Field (CRF): Lafferty et al., 2001; Ratinov and Roth, 2009.
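A CRF tagger of this kind operates over per-token feature vectors; a typical feature function looks roughly like this (the feature names are illustrative, not any particular toolkit's):

```python
# Typical token-level features fed to a CRF-style NER tagger.
# Feature names are illustrative; real systems add many more.
def token_features(tokens, i):
    w = tokens[i]
    return {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),   # shape feature: capitalization
        "word.isupper": w.isupper(),
        "word.isdigit": w.isdigit(),
        "suffix3": w[-3:],             # crude morphological hint
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

feats = token_features("John lives in Stockholm".split(), 3)
print(feats["word.istitle"], feats["prev.lower"])  # True in
```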


20 of 68

Training data

  • Ground truth data collection for NER is very expensive
  • Solutions:
    • Automatic NER annotation using Wikipedia data
    • Applying Latent Dirichlet Allocation (LDA) based NER detection using Gazetteer data.


21 of 68

NER pipeline


22 of 68

Gazetteers help

Extensive lists of names for a specific category

  • PER
    • First names (male-female) and surnames, their frequency
  • LOC
    • Cities, Countries
    • Population
  • ORG
    • Names of companies from the Yellow Pages.

Disadvantages

    • Difficult to create and maintain (or expensive if commercial)
    • Usefulness varies depending on category
    • Ambiguity: words occur in multiple lists of different types (PER, LOC, FAC, ...)
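The ambiguity point is easy to demonstrate: a pure gazetteer lookup returns several candidate categories for the same string, so lookup alone cannot decide (toy lists):

```python
# A string can belong to several gazetteers at once, so list lookup
# alone cannot assign a category without context.
GAZETTEERS = {
    "PER": {"Washington", "Jordan", "Charles"},
    "LOC": {"Washington", "Jordan", "Stockholm"},
    "ORG": {"Ericsson", "Meltwater"},
}

def candidate_categories(name):
    return sorted(cat for cat, names in GAZETTEERS.items() if name in names)

print(candidate_categories("Washington"))  # ['LOC', 'PER'] -- ambiguous
print(candidate_categories("Ericsson"))    # ['ORG']
```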


23 of 68

Brown clustering - motivation

Let’s say we want to estimate the likelihood of the bi-gram "to Shanghai", without having seen this in a training set.

The system can obtain a good estimate if it can cluster "Shanghai" with other city names (like “London”, “Beijing”), then make its estimate based on the likelihood of phrases such as "to London", "to Beijing" and "to Denver"
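That back-off can be written as P(w2 | w1) ≈ P(c(w2) | c(w1)) · P(w2 | c(w2)), where c(w) is the cluster of w; a toy numeric sketch with invented counts:

```python
# Class-based bigram estimate: back off from words to Brown-style clusters.
# All counts below are invented for illustration.
cluster = {"Shanghai": "CITY", "London": "CITY", "Beijing": "CITY", "to": "TO"}

cluster_bigrams = {("TO", "CITY"): 30}   # cluster-level bigram counts
cluster_totals = {"TO": 100}             # total bigrams starting in cluster
word_in_cluster = {"Shanghai": 5, "London": 15, "Beijing": 10}
cluster_size = {"CITY": 30}              # total word count of the cluster

def p_class_bigram(w1, w2):
    c1, c2 = cluster[w1], cluster[w2]
    p_c2_given_c1 = cluster_bigrams.get((c1, c2), 0) / cluster_totals[c1]
    p_w2_given_c2 = word_in_cluster[w2] / cluster_size[c2]
    return p_c2_given_c1 * p_w2_given_c2

# "to Shanghai" was never observed, yet gets a sensible nonzero probability:
print(round(p_class_bigram("to", "Shanghai"), 4))  # 0.05
```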


24 of 68

Brown clustering (1)

  • Proposed by Brown et al. (1992) (a.k.a. “IBM clustering”)
  • Hierarchical class-based clustering method.
  • Bottom-up
  • Unsupervised learning
    • Doesn't need labeled data, just a large amount of raw text.
  • Greedy technique to maximize bi-gram mutual information (MI).
  • Merges words by contextual similarity.

The objective: choose the clustering C that maximizes the average mutual information between adjacent classes,

I(C) = Σ_{c1,c2} p(c1, c2) · log [ p(c1, c2) / (p(c1) · p(c2)) ]

25 of 68

Brown clustering (2)

  • Large amounts of data
    • Similar words appear in similar contexts.
    • More precisely: similar words have similar distributions of words to their immediate left and right.
  • Example: “the” and “a” are both determiners.
    • Compare the frequencies of the words immediately to their left and right:
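That intuition can be sketched directly: collect each word's left/right neighbour counts from a corpus and compare the profiles (the corpus below is a toy example):

```python
from collections import Counter

# Context profiles: for each word, count its immediate left and right
# neighbours. Distributionally similar words ("the", "a") end up with
# similar profiles.
corpus = ("the cat sat on the mat . a cat ran to the dog . "
          "a dog sat on a mat .").split()

def context_profile(word):
    prof = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            if i > 0:
                prof[("L", corpus[i - 1])] += 1
            if i < len(corpus) - 1:
                prof[("R", corpus[i + 1])] += 1
    return prof

# Both determiners immediately precede the same nouns:
shared = set(context_profile("the")) & set(context_profile("a"))
print(sorted(w for side, w in shared if side == "R"))  # ['cat', 'dog', 'mat']
```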


26 of 68

Brown clustering (3)


27 of 68

Hmm...easy?

  • What are the challenges in real applications?
  • What about moving to other languages?
  • What about moving to social domain?


28 of 68

Disambiguation

What is the entity category of “Washington”?


29 of 68

Different languages

  • Tokenization
    • Chinese & Japanese: Words not separated
  • Part of speech
    • Nouns
      • English: only number inflection
      • German: number, gender and case inflection
    • Verbs
      • English: regular verbs have 4 distinct forms, irregular verbs up to 8
      • Finnish: more than 10,000 forms
  • NER: Shape feature
    • English: Only proper nouns capitalized
    • German: All nouns capitalized


30 of 68

Different languages


31 of 68

Different languages

Studying the linguistic properties of each language is important!


32 of 68

Editorial vs. Social


33 of 68

Challenges in Social NER

  • The performance of “off-the-shelf” NER methods degrades severely when applied to Twitter data

  • Tweets
    • are short: 140-character limit.
    • cover a wide range of topics.
    • are often written in broken, ungrammatical language.
    • are written fast and posted from anywhere: a lot of misspellings.

→ We need a solution that considers the social characteristics of the text
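One cheap pre-processing step in that direction is normalising the noisiest surface phenomena before tagging. The regexes below are an illustrative sketch, not the production pipeline:

```python
import re

# Illustrative tweet normalisation before NER: strip some Twitter-specific
# noise and collapse character elongations.
def normalize_tweet(text):
    text = re.sub(r"https?://\S+", "<URL>", text)   # URLs
    text = re.sub(r"\bRT\b", "", text)              # retweet marker
    text = re.sub(r"@(\w+)", r"\1", text)           # keep the handle text
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)      # "soooo" -> "soo"
    return " ".join(text.split())                   # collapse whitespace

print(normalize_tweet("soooo cool RT @DailyBreezeNews: Santa arrives at LAX"))
# 'soo cool DailyBreezeNews: Santa arrives at LAX'
```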


34 of 68

Challenges in Social NER

Examples of noisy data

  • Jaguar's gonna like this episode of #MadMen even less than last week's, I bet.
  • Dane Bowers is in Asda I cant believe.it luckiest girl in the world omf i cant believe it omg
  • A feel good story RT @DailyBreezeNews: Santa Claus arrives by helicopter at LAX to greet local school


35 of 68

Solution (1)

Adapting existing features to social properties

(The POS tagger of an editorial NER system performs really poorly on social documents.)


36 of 68

Solution (2)

Weight (importance) of each CRF feature


37 of 68

Results

  • Training Data
    • ~76K tweets labeled by human annotators.
    • Inter-annotator agreement measured between two annotators.
  • Test Data
    • ~9.1K tweets labeled by human annotators.
  • Improvement compared to the state-of-the-art method

Ritter, A. et al. Named entity recognition in tweets: An experimental study. EMNLP ’11, pages 1524–1534.


38 of 68

Knowledge Base Strategy

What about sentiment….?

39 of 68

Document Level Sentiment - how it works

Inter-annotator agreement ~80%*

40 of 68

Document Level Sentiment - how it works

Machine Learning Magic

Supervised learning

  • Naive Bayes: BernoulliNB, GaussianNB, MultinomialNB
  • Support Vector Machines: LinearSVM, RbfSVM
  • Maximum Entropy Models: GIS, IIS, MEGAM, TADM
  • Neural networks: MLP, RecurrentNN
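As a sketch of the supervised approach, here is a tiny multinomial Naive Bayes classifier over bag-of-words counts with add-one smoothing. The training data is a toy set; the production models are trained on large labelled corpora.

```python
import math
from collections import Counter, defaultdict

# Tiny multinomial Naive Bayes over bag-of-words, with add-one smoothing.
train = [
    ("great product love it", "pos"),
    ("amazing service great experience", "pos"),
    ("terrible service awful experience", "neg"),
    ("hate it terrible product", "neg"),
]

class_words = defaultdict(Counter)   # word counts per class
class_docs = Counter()               # document counts per class
vocab = set()
for text, label in train:
    toks = text.split()
    class_words[label].update(toks)
    class_docs[label] += 1
    vocab.update(toks)

def classify(text):
    scores = {}
    for label in class_docs:
        # log prior + sum of smoothed log likelihoods
        score = math.log(class_docs[label] / len(train))
        total = sum(class_words[label].values())
        for tok in text.split():
            score += math.log((class_words[label][tok] + 1)
                              / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("great experience"))  # pos
print(classify("awful product"))     # neg
```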

41 of 68

Document Level Sentiment - how it works

Machine Learning Magic

42 of 68

Document Level Sentiment - current status

~60-70% (depending on language)

Not too terrible, considering that human performance is at best ~80%...

...but why is it so hard?

43 of 68

Document Level Sentiment - how it’s used

44 of 68

Document Level Sentiment - how it’s used

45 of 68

Document Level Sentiment - the problem

46 of 68

Document Level Sentiment - the problem

Negative

Neutral

47 of 68

Document Level Sentiment - the problem

Those numbers underline a growing gap between McDonald's and today's fast-food customers. It will only get wider with another year's worth of the same uninspired fare that has made McDonald's customers easy pickings for Panera Bread, Chick-fil-A, Chipotle Mexican Grill and others.

Negative

Positive

Does not make sense for our industry!

48 of 68

Knowledge Base Strategy

Entity Level Sentiment (ELS)

49 of 68

Entity Level Sentiment - motivation

  • DLS (Document Level Sentiment) is imprecise and often wrong for our customers
  • Entities are of central importance for our customers
  • We already have NER (Named Entity Recognition) technology

Idea:

Identify the sentiment towards each particular entity in a text!

50 of 68

Entity Level Sentiment - how it works

NER

BMW: Positive

Mercedes: Neutral

Toyota: Negative
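A crude baseline for this step (not the production approach) is to count sentiment cues in a small window around each entity mention. The lexicon, window size, and example sentence are all invented:

```python
# Crude entity-level sentiment baseline: count sentiment cues in a small
# window around each entity mention. Lexicon and window are illustrative.
POSITIVE = {"great", "excellent", "strong"}
NEGATIVE = {"terrible", "weak", "awful"}

def entity_sentiment(tokens, entities, window=2):
    result = {}
    for ent in entities:
        score = 0
        for i, tok in enumerate(tokens):
            if tok == ent:
                ctx = tokens[max(0, i - window): i + window + 1]
                score += sum(w in POSITIVE for w in ctx)
                score -= sum(w in NEGATIVE for w in ctx)
        result[ent] = ("Positive" if score > 0
                       else "Negative" if score < 0 else "Neutral")
    return result

tokens = ("BMW reported great sales while Toyota faced "
          "weak demand and Mercedes stayed flat").split()
print(entity_sentiment(tokens, ["BMW", "Toyota", "Mercedes"]))
# {'BMW': 'Positive', 'Toyota': 'Negative', 'Mercedes': 'Neutral'}
```

The obvious failure mode of this proximity heuristic, cues attributed to the wrong nearby entity, is exactly why ELS is treated as a hard problem in the slides that follow.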

51 of 68

Entity Level Sentiment - how it works

Entity1: Positive

Entity2: Neutral

Entity3: Negative


52 of 68

Entity Level Sentiment - how it works

Entity1: Positive

Entity2: Neutral

Entity3: Negative

NER

53 of 68

Entity Level Sentiment - use case

54 of 68

Entity Level Sentiment - current status

  • ELS is considered a very tough problem in NLP/ML
  • The accuracy of state-of-the-art ELS is currently very low (~45%)

55 of 68

Knowledge Base Strategy

The holy grail : The Graph Knowledge Base


56 of 68

Entities + Relationships

As the types of entities and their relationships grow, so does the capacity to infer insights that depend on connectivity, until eventually one can answer questions that would not be possible with separate datasets alone!
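A sketch of that point: once entities and relationships from different datasets live in one graph, cross-dataset questions become simple traversals. The schema and facts below are invented:

```python
from collections import defaultdict

# Toy knowledge graph: (subject, relation, object) triples from different
# "datasets" merged into one graph. All facts are invented.
triples = [
    ("Alice", "works_at", "AcmeCorp"),
    ("AcmeCorp", "competitor_of", "GlobexInc"),
    ("Bob", "works_at", "GlobexInc"),
    ("GlobexInc", "based_in", "Stockholm"),
]

graph = defaultdict(list)
for s, r, o in triples:
    graph[s].append((r, o))

def query(subject, relation_path):
    """Follow a chain of relations from a subject node."""
    frontier = {subject}
    for rel in relation_path:
        frontier = {o for node in frontier
                    for r, o in graph[node] if r == rel}
    return sorted(frontier)

# "Where are my employer's competitors based?" -- spans three datasets.
print(query("Alice", ["works_at", "competitor_of", "based_in"]))  # ['Stockholm']
```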


57 of 68

KB Architecture


Unstructured Document Stream

Pipeline

Enrichments

Graph Search

Enriched Documents

High Performance Indexes

Processing Services

API Layer

Knowledge Base (Graph)

I/O

External Data Providers

Updates/subscriptions

Lookups

APPS

Backup Storage

Raw Documents

58 of 68

Knowledge Base Strategy

Why is it hard?

59 of 68

Composing the KB


60 of 68

Data Acquisition trade-offs

Every acquisition method trades off between high volume, high quality, and cheap:

  • Manual data acquisition: high quality, but expensive and low volume
  • Special crawlers, smart algorithms: high volume and cheap, but lower quality
  • Acquisitions, partnerships: high volume and high quality, but expensive

61 of 68

Composing the KB - Scalability


62 of 68

Scalability Requirements - next steps

Companies ~ 100 million worldwide

People ~ 500 million (including media influencers)

Products ~ 500 million

~1 billion entities and all the connections between them:

billions of nodes, trillions of edges!


63 of 68

Composing the KB - New features


64 of 68

Improve entity search - company NED


65 of 68

Improve entity search - person NED

Robert Gates, 22nd Secretary of Defense

William Henry Gates III, former CEO & co-founder of Microsoft

“Who is Mr. Gates?”


66 of 68

Emerging competition


67 of 68

Map influencer network

influencer score: computed with, e.g., PageRank
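A PageRank-style influencer score can be sketched with power iteration over a toy follower graph (the edges are invented; damping 0.85 is the conventional choice):

```python
# Power-iteration PageRank over a toy "who retweets whom" graph.
# An edge u -> v means u passes influence to v; every node here has
# outgoing edges, so no dangling-node handling is needed.
edges = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": ["carol"],
}
nodes = sorted(set(edges) | {v for vs in edges.values() for v in vs})

def pagerank(edges, nodes, damping=0.85, iters=50):
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for u, outs in edges.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        rank = new
    return rank

scores = pagerank(edges, nodes)
print(max(scores, key=scores.get))  # 'carol' -- the most retweeted account
```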


68 of 68

Suggested read

  • Ratinov 2009 (challenges in NER): paper.
  • ArkCMU (social): paper, code.
  • Ritter et al (social): paper, code.
  • Stanford NLP NER (editorial): paper, code.
  • Brown clustering
    • Brown clustering: video
    • Word Representations and N-grams: video
  • Transforming Wikipedia into Named Entity Training Data: paper.
