1 of 26


From Nodes to Narratives

A Knowledge Graph-based Storytelling Approach

Mike de Kok, Youssra Rebboud, Pasquale Lisena, Raphael Troncy, Ilaria Tiddi

2 of 26

  1. Motivation


3 of 26

Narrative Graph


  • A representation that captures entities and the links interconnecting them.
  • Depicts the complex structure of a narrative
    • Enabling an understanding of the relationships between events and facilitating storytelling.

An intense and deadly seismic event struck offshore east of Tōhoku, Japan
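To make the representation concrete, here is a minimal sketch (not taken from the paper's data) of how such an event and its links could be expressed with the SEM vocabulary in rdflib; the `ex:` namespace, the URIs and the literal values are illustrative assumptions.

```python
from rdflib import Graph, Literal, Namespace, RDF

# SEM is the Simple Event Model vocabulary used later in the deck (sem:hasActor, sem:hasPlace, sem:hasTime*).
SEM = Namespace("http://semanticweb.cs.vu.nl/2009/11/sem/")
EX = Namespace("http://example.org/narrative/")   # illustrative namespace

g = Graph()
g.bind("sem", SEM)
g.bind("ex", EX)

# "An intense and deadly seismic event struck offshore east of Tōhoku, Japan"
event = EX["seismic_event"]
g.add((event, RDF.type, SEM.Event))
g.add((event, SEM.hasPlace, EX["Tohoku_Japan"]))          # Where
g.add((event, SEM.hasTimeStamp, Literal("2011-03-11")))   # When (value illustrative)

print(g.serialize(format="turtle"))
```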

4 of 26

What information does the Narrative Graph cover?

The narrative covers information about the 4W:

  • What (event)
  • Who (actor)
  • Where (place)
  • When (time)

5 of 26

What information does the Narrative Graph not cover?

Lack of semantically richer event relations, such as:

  • Prevention, Enable, Cause, Intention

Examples:

  • Prevention: "The government has implemented a series of laws to prevent the abuse of animals."
  • Enable: "DD Acquisition said the extension is to allow this process to be completed."
  • Cause: "The government passed a law to increase access to mental health services and reduce stigma."

6 of 26

Our Proposed Solution

II. Building the Narrative Graph (BNG)
III. Knowledge graph summarization (KGS)
IV. Quantitative Analysis (QNA)
V. Qualitative Analysis (QLA)

7 of 26

II. Building the Narrative Graph (BNG)

II. (BNG)

8 of 26

Build a semantically rich Narrative Graph


Starting Point: ASRAEL KG [1]

  • Contains news articles with links to Wikidata events
  • Extract the 4W information from Wikidata for each event article (a minimal query sketch follows below)
    • Follow the owl:sameAs link from the event to its Wikidata entity

[1] Rudnik, C., Ehrhart, T., Ferret, O., Teyssou, D., Troncy, R., Tannier, X.: Searching news articles using an event knowledge graph leveraged by wikidata. In: Companion Proceedings of The Web Conference, 2019.

Predicate type | Predicate labels | Wikidata properties | SEM properties
Who | participant, organizer, founded by | P710, P664, P112 | sem:hasActor
Where | country, location, coordinate location, located in the administrative territorial entity, continent | P17, P276, P625, P131, P30 | sem:hasPlace
When | point in time, start time, end time, inception, "dissolved, abolished or demolished date", publication date | P585, P580, P582, P571, P576, P577 | sem:hasTime, sem:hasBeginTimeStamp, sem:hasEndTimeStamp, sem:hasTime, sem:hasTime, sem:hasTimeStamp

Location ✅   Actor ✅   Time ✅
Precise event spans ❌   Semantically precise relations ❌
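A minimal sketch of the 4W lookup against Wikidata, assuming the event's QID has already been obtained by following owl:sameAs; the property list comes from the mapping table above, while the helper name, the placeholder QID and the result handling are assumptions.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Property IDs taken from the mapping table above (Who / Where / When).
FOUR_W_PROPERTIES = [
    "P710", "P664", "P112",                          # Who   -> sem:hasActor
    "P17", "P276", "P625", "P131", "P30",            # Where -> sem:hasPlace
    "P585", "P580", "P582", "P571", "P576", "P577",  # When  -> sem:hasTime*
]

def fetch_4w(event_qid: str) -> list[dict]:
    """Fetch the Who/Where/When statements of one Wikidata event (illustrative helper)."""
    values = " ".join(f"wdt:{pid}" for pid in FOUR_W_PROPERTIES)
    query = f"""
    PREFIX wd:  <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT ?property ?value WHERE {{
      VALUES ?property {{ {values} }}
      wd:{event_qid} ?property ?value .
    }}"""
    sparql = SPARQLWrapper(WIKIDATA_ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["results"]["bindings"]

# rows = fetch_4w("Q42")  # placeholder QID; in the pipeline the QID comes from the owl:sameAs link
```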

II. (BNG)

9 of 26

Event Relation extraction from ASRAEL news articles


REBEL [2]

  • A seq2seq model, based on BART
    • Performs (sub)event and relation extraction
    • State of the art on CoNLL04
  • Pre-trained on the FARO [3] dataset
    • FARO is a dataset that contains semantically precise event relations, mainly: cause, prevent, intend, and enable

Sentence: "A 7.3 magnitude earthquake off Japan's Fukushima injured dozens of people"

Generated: (<triplet> earthquake <subj> injured <obj> causes)

Extracted relation: cause

[2] REBEL: Relation Extraction By End-to-end Language generation (Huguet Cabot & Navigli, Findings 2021)
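A rough sketch of how a REBEL-style model can be run and its linearized output parsed. The checkpoint id below is the public REBEL release on the Hugging Face Hub; the FARO fine-tuned weights used here are assumed to be swapped in, and the exact output depends on the checkpoint.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "Babelscape/rebel-large"  # public REBEL checkpoint; the FARO fine-tuned weights are an assumption
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

def extract_triplets(sentence: str) -> list[tuple[str, str, str]]:
    """Generate and parse the linearized '<triplet> head <subj> tail <obj> relation' output."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=256, num_beams=3)
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=False)
    decoded = decoded.replace("<s>", "").replace("</s>", "").replace("<pad>", "")

    triplets = []
    for chunk in decoded.split("<triplet>")[1:]:
        try:
            head, rest = chunk.split("<subj>", 1)
            tail, relation = rest.split("<obj>", 1)
            triplets.append((head.strip(), tail.strip(), relation.strip()))
        except ValueError:
            continue  # skip malformed chunks
    return triplets

print(extract_triplets("A 7.3 magnitude earthquake off Japan's Fukushima injured dozens of people"))
# e.g. [('earthquake', 'injured', 'cause')]  -- depends on the checkpoint
```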

II. (BNG)

10 of 26

Event Coreference resolution for ASRAEL KG

The EECEP model [4]

  • An event coreference model that:
  • Maps sub-events to a latent space based on BERT
  • Selectively picks latent representations of sub-events
  • Classifies pairs of sub-events as coreferring / non-coreferring
  • Similarity higher than the threshold (0.95) → merge


Example:

“A 7.3-magnitude earthquake off Japan's Fukushima injured dozens of people ….”

"The powerful quake, triggering landslides and collapsing houses, caused mayhem."

Similarity: 0.96

[4] Held, W., Iter, D., & Jurafsky, D. (2021). Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 1406–1417). Association for Computational Linguistics.
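The EECEP pair classifier itself is not reproduced here; a simplified stand-in with generic sentence embeddings and cosine similarity illustrates the merge decision. The embedding model and helper name are assumptions; only the 0.95 threshold comes from the deck.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder, not the EECEP model from [4]
MERGE_THRESHOLD = 0.95                             # threshold from the slide

def should_merge(mention_a: str, mention_b: str) -> bool:
    """Merge two event mentions when their similarity exceeds the threshold."""
    embeddings = encoder.encode([mention_a, mention_b], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
    return similarity >= MERGE_THRESHOLD

m1 = "A 7.3-magnitude earthquake off Japan's Fukushima injured dozens of people"
m2 = "The powerful quake, triggering landslides and collapsing houses, caused mayhem."
print(should_merge(m1, m2))  # True would merge the two event nodes in the graph
```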

II. (BNG)

11 of 26

Event Coreference resolution for ASRAEL KG

  • The threshold was decided empirically:
    • Randomly pick clusters with at least two mentions
    • Clusters result from the threshold configurations 0.8, 0.85, 0.9 (a clustering sketch follows the example below)
    • Read the context / neighbouring sentences in which the mention occurs
    • Determine whether that mention should belong to the cluster
    • Choose the threshold that groups more coreferent entities together

Example:

“A 7.3-magnitude earthquake off Japan's Fukushima injured dozens of people ….”

"The powerful quake, triggering landslides and collapsing houses, caused mayhem."

Similarity: 0.96
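The inspection itself is manual; the sketch below only covers the mechanical part of the sweep: building candidate clusters at each threshold so annotators can read them in context. Single-link merging and the embedding model are assumptions, not the paper's procedure.

```python
import itertools
import networkx as nx
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # embedding model is an assumption

def clusters_at(mentions: list[str], threshold: float) -> list[set[str]]:
    """Single-link clusters: connect mentions whose pairwise similarity exceeds the threshold."""
    embeddings = encoder.encode(mentions, convert_to_tensor=True)
    graph = nx.Graph()
    graph.add_nodes_from(mentions)
    for (i, a), (j, b) in itertools.combinations(enumerate(mentions), 2):
        if util.cos_sim(embeddings[i], embeddings[j]).item() >= threshold:
            graph.add_edge(a, b)
    # Keep only clusters with at least two mentions, as in the procedure above.
    return [cluster for cluster in nx.connected_components(graph) if len(cluster) >= 2]

mentions = [
    "A 7.3-magnitude earthquake off Japan's Fukushima injured dozens of people",
    "The powerful quake, triggering landslides and collapsing houses, caused mayhem.",
    "The government passed a law to increase access to mental health services.",
]
for threshold in (0.8, 0.85, 0.9):  # candidate configurations from the slide
    print(threshold, clusters_at(mentions, threshold))
```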

II. (BNG)

12 of 26

Final Narrative Graph


II. (BNG)

13 of 26

III. Knowledge graph summarization (KGS)

III. (KGS)

14 of 26

Relevant Information Selection

III. (KGS)


  • A SPARQL query is used to extract the essential nodes for a given article (a hypothetical query sketch follows below).
    • Select the date, location and actor of the article (if available)
    • Select the mentions (events) from the sentences of the article
  • The query prioritizes the selection of entities with higher frequencies of incoming edges.
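The actual query is not shown in the deck; below is a hypothetical SPARQL sketch, run with rdflib against a local copy of the narrative graph, that ranks nodes by their number of incoming edges, which is the prioritization described above. The file name and the absence of article-specific filters are simplifications.

```python
from rdflib import Graph

g = Graph()
g.parse("narrative_graph.ttl", format="turtle")  # placeholder file name for the narrative graph

# Rank candidate nodes by their number of incoming edges (most-referenced nodes first).
QUERY = """
SELECT ?node (COUNT(?s) AS ?inDegree) WHERE {
  ?s ?p ?node .
  FILTER(isIRI(?node))
}
GROUP BY ?node
ORDER BY DESC(?inDegree)
LIMIT 10
"""

for row in g.query(QUERY):
    print(row.node, row.inDegree)
```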

15 of 26

Text Generation from KG

III. (KGS)


  • The JointGT [5] system trains a model built on a pre-trained T5 encoder-decoder, fine-tuned on the WebNLG dataset, to generate narratives from a given KG.
    • Given a set of triples T and a ground-truth label L, JointGT is optimized to predict the narrative L from the triples T.

[5] P. Ke, H. Ji, Y. Ran, X. Cui, L. Wang, L. Song, X. Zhu, M. Huang, JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs, in: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, ACL, 2021, pp. 2526–2538.

Dataset | Triples | Label
WebNLG | (3Arena, owner, Live Nation Entertainment), (Dublin, is part of, Republic of Ireland), (3Arena, location, Dublin), (Dublin, is part of, Leinster) | "The owner of 3Arena, Dublin, Leinster, Republic of Ireland is Live Nation Entertainment"; "Dublin is part of Leinster and a city in the Republic of Ireland. Dublin is also home to the 3Arena which is currently owned by Live Nation Entertainment"
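JointGT's graph-aware encoder and released checkpoints are not reproduced here; a minimal stand-in with a plain T5 encoder-decoder shows the triples-to-text step. The checkpoint id, the linearization tags and the decoding settings are assumptions; in practice the model would be fine-tuned on WebNLG (or on WebNLG + FARO, next slide).

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "t5-base"  # placeholder; a JointGT / WebNLG fine-tuned checkpoint is assumed in practice
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

def verbalize(triples: list[tuple[str, str, str]]) -> str:
    """Linearize KG triples and decode a narrative sentence (linearization format is an assumption)."""
    linearized = " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)
    inputs = tokenizer(linearized, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(verbalize([
    ("3Arena", "owner", "Live Nation Entertainment"),
    ("3Arena", "location", "Dublin"),
    ("Dublin", "is part of", "Republic of Ireland"),
]))
```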

16 of 26

Text Generation from KG

III. (KGS)


Contribution: Enhancement of WebNLG with the FARO dataset

Dataset | Triples | Label
WebNLG | (3Arena, owner, Live Nation Entertainment), (Dublin, is part of, Republic of Ireland), (3Arena, location, Dublin), (Dublin, is part of, Leinster) | "The owner of 3Arena, Dublin, Leinster, Republic of Ireland is Live Nation Entertainment"; "Dublin is part of Leinster and a city in the Republic of Ireland. Dublin is also home to the 3Arena which is currently owned by Live Nation Entertainment"
FARO | (Demand, cause, benefited) | "The company benefited from continued strong demand and higher selling prices for titanium dioxide, a white pigment used in paints, paper and plastics."

Dataset | Train | Val | Test
WebNLG | 12876 | 1619 | 1600
FARO | 1800 | 201 | 108
Combined | 14676 | 1820 | 1708
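Neither dataset's file format is given in the deck; the sketch below assumes simple JSON lists of {"triples", "label"} records and only shows how the two training sets would be concatenated into the Combined split of the table above.

```python
import json
import random

def load_examples(path: str) -> list[dict]:
    """Records are assumed to look like {"triples": [[h, r, t], ...], "label": "..."}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

webnlg_train = load_examples("webnlg_train.json")  # file names are placeholders
faro_train = load_examples("faro_train.json")

combined_train = webnlg_train + faro_train          # 12876 + 1800 = 14676 examples, matching the table
random.shuffle(combined_train)

with open("combined_train.json", "w", encoding="utf-8") as f:
    json.dump(combined_train, f, ensure_ascii=False)
```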

17 of 26

IV. Quantitative Analysis (QNA)

IV. (QNA)

18 of 26

Quantitative analysis

IV. (QNA)


Model | Dataset (split) | BLEU | METEOR | ROUGE
Base (WebNLG) | WebNLG (val) | 0.6642 | 0.4727 | 0.7558
Base (WebNLG) | WebNLG (test) | 0.6529 | 0.4681 | 0.7535
Base (WebNLG) | FARO (test) | 0.0 | 0.056 | 0.1281
Combined | WebNLG (val) | 0.6368 | 0.4543 | 0.7468
Combined | WebNLG (test) | 0.6101 | 0.4409 | 0.7260
Combined | FARO (test) | 0.0651 | 0.0983 | 0.2304

  • The model trained on the WebNLG dataset exhibits poor performance on tasks requiring semantic precision due to its lack of exposure to such relations.
  • Incorporating the FARO dataset, which contains semantically precise relations, has led to some improvement in the model's performance.
  • Nevertheless, further qualitative analysis is necessary to determine whether the output is indeed more semantically enriched.
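For reference, this is how scores of the kind reported above could be computed with the Hugging Face `evaluate` package; the example strings are illustrative and the exact numbers depend on tokenization and metric implementation details.

```python
import evaluate

predictions = ["The company benefited from strong demand."]  # model outputs (illustrative)
references = ["The company benefited from continued strong demand and higher selling prices."]

# BLEU expects a list of reference texts per prediction; METEOR and ROUGE accept one reference each.
bleu = evaluate.load("bleu").compute(predictions=predictions, references=[[r] for r in references])
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)

print(bleu["bleu"], meteor["meteor"], rouge["rougeL"])
```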

19 of 26

V. Qualitative Analysis (QLA)


V. (QLA)

20 of 26

Qualitative analysis on ASRAEL KG

V. (QLA)


The text generated by the model trained on the combined dataset appears more semantically robust.

Triples | Label | Base | Combined
(Demand, cause, benefited) | The company benefited from continued strong demand and higher selling prices for titanium dioxide, a white pigment used in paints, paper and plastics. | "benefited" is the cause of the demand | The company said it benefited from the strong demand for its products and services from a growing number of customers.

21 of 26

V. (QLA)

Qualitative analysis on ASRAEL KG

21

Does the (combined) model trained on a dataset containing "semantically precise relationships" show improved fluency and adequacy over the one that did not (base)?

3 annotators performed a manual review:

  • Fluency: does a sentence read or sound smoother, clearer, and more natural, adhering to proper grammar and syntax, compared to the other?
  • Adequacy: does a sentence incorporate more triples, in a correct manner, compared to the other?

The combined model generates a more adequate sentence than the base model → win (adequacy)

The base model generates a more fluent sentence than the combined model → lose (fluency)

Both models generate equally good (or equally bad) sentences for adequacy → tie (adequacy)

22 of 26

V. (QLA)

Qualitative analysis on ASRAEL KG


Subject | Predicate | Object
2021 Fukushima earthquake | location | Japan
2021 Fukushima earthquake | date | 2021-02-13
earthquake | cause | collapsing
earthquake | cause | injured
broken | cause | damage

Base:

2021 Fukushima earthquake , which was caused by collapsing , is located in Japan and was broken .

Combined:

The 2021 Fukushima earthquake , which hit Japan on February 13th , 2021 , injured many people and caused extensive damage and collapsing .

23 of 26

V. (QLA)

Qualitative analysis on ASRAEL KG

The combined model is significantly better at generating more sophisticated sentences compared to the base model.

Win/lose/tie was determined by taking a majority vote over the annotations (a small sketch follows the table below).

Task | Fluency: Win % | Fluency: Lose % | Fluency: Tie % | Adequacy: Win % | Adequacy: Lose % | Adequacy: Tie %
7 selected events | 71.4 | 14.3 | 14.3 | 28.6 | 0.0 | 71.4
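A tiny sketch of that majority-vote aggregation over the three annotators; the label names and data layout are assumptions.

```python
from collections import Counter

def majority_vote(labels: list[str]) -> str:
    """Return the label chosen by most annotators; fall back to 'tie' when there is no majority."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else "tie"

# Three annotators judged each item for fluency and adequacy (win / lose / tie for the combined model).
item_annotations = {"fluency": ["win", "win", "tie"], "adequacy": ["tie", "tie", "win"]}
print({task: majority_vote(votes) for task, votes in item_annotations.items()})
# -> {'fluency': 'win', 'adequacy': 'tie'}
```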

24 of 26

Qualitative analysis on ASRAEL KG - Manually Annotated Article

V. (QLA)


  • Manually annotate the relations extracted from the article entitled: “Russia launches Iranian satellite amid Ukraine war concerns”
    • Ensure that the events and relations extracted from REBEL are accurate, and that there is no error propagation stemming from the event extraction process.

Task | Fluency: Win % | Fluency: Lose % | Fluency: Tie % | Adequacy: Win % | Adequacy: Lose % | Adequacy: Tie %
Manually annotated article | 33.3 | 16.7 | 50.0 | 58.3 | 8.3 | 33.3

25 of 26

Conclusion


  • Investigated building complex narratives in graph form, generating text with high complexity and semantic richness.
    • Enhanced the WebNLG dataset by incorporating the FARO dataset to refine event relation semantics.
    • Expanded dataset includes relations like causality, prevention, intention, and enabling.
  • No statistically significant difference observed in fluency metrics.
  • Qualitative analysis indicates training on precise event relations produces more complete sentences.
  • Experiment with more data for conclusive results.
    • Augmenting the dataset through NLP techniques could improve quality and comprehensiveness.

26 of 26


Thank you!

GitHub

This presentation

kFLOW Website

Narrative Graph