1 of 37

Albert Meroño-Peñuela

Interfacing Human-Machine Intelligence with Cultural AI

2 of 37

Albert Meroño-Peñuela, PhD

  • Research fellow KR&R, VU Amsterdam
  • CLARIAH
    • Structured Data Engineering Lead
    • Technical Board Chair IG Linked Open Data
  • Golden Agents Steering Group
  • ODISSEI VU WP Data broker
  • DARIAH Chair AI and Music WG

3 of 37

Machine intelligence is not enough

4 of 37

How can we combine Human and Machine Intelligence?

By “endowing machines with a deeper understanding of the world”

“Some significant fraction of the knowledge that a robust system is likely to draw on is external, cultural knowledge that is symbolically represented” [Marcus arXiv 2020]

CULTURE

5 of 37

Cultural AI is crucial for Hybrid Intelligence (CARE)

Real-world domains: Culture embodies challenging human activities

AI as a co-author as a paramount example

  • Collaborative HI
  • Adaptive HI
  • Responsible HI
  • Explainable HI
  • Multimodal CAI
  • Subjective CAI
  • Interfaceable CAI
  • Transparent CAI

HYBRID INTELLIGENCE

6 of 37

My research on Interfacing Human-Machine Intelligence with Cultural AI

  1. What are adequate knowledge representations for multimodal cultural data?
  2. How can we combine multimodal datasets reliably and at scale?
  3. What data structures, algorithms and models do we need to encode social behavior?

7 of 37

Research Highlights

Knowledge Graph Construction

Text Generation

Interfaces

Social Querying

Interactive Learning & Reasoning

Knowledge Graph Completion

8 of 37

Research Highlights

Knowledge Graph Construction

Text Generation

Interfaces

Social Querying

Interactive Learning & Reasoning

Knowledge Graph Completion

9 of 37

KGC from structured data

Scalable and intelligent KG construction algorithms for

  • Spreadsheets with meaningful layouts
  • CSV tables
  • MIDI files
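
As a minimal sketch of the CSV case, the snippet below turns every row of a table into subject-predicate-object triples. The `http://example.org/` namespace, the function name `csv_to_triples` and the sample table are assumptions made for illustration; this is not the pipeline used to build the knowledge graphs cited on this slide, and it ignores literal escaping and datatypes.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not from the talk


def csv_to_triples(text, key_column):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}row/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue  # skip the key itself and empty cells
            predicate = f"<{BASE}prop/{column}>"
            triples.append((subject, predicate, f'"{value}"'))
    return triples


# Made-up sample table; the second row has an empty cell.
sample = "id,municipality,inhabitants\n1,A,100\n2,B,\n"
for s, p, o in csv_to_triples(sample, "id"):
    print(s, p, o, ".")
```

Each column header becomes a predicate and each non-empty cell a literal, which is the simplest possible mapping; spreadsheets with meaningful layouts need extra rules for headers that span rows or columns.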

CEDAR KG (660M triples)

[Meroño et al. SWJ 2015]

MIDI KG (10B triples)

[Meroño et al. ISWC 2017]

CLARIAH KG (865M triples)

[Hoekstra, Meroño et al. JWS 2018]

10 of 37

Ethics by design: privacy reasoning

  • Legal knowledge from judicial practice
  • Focus on the concepts of
    • File (legal)
    • Warnings: situations that threaten compliance
    • Derived concepts needed for reasoning
    • Exceptions

[Casellas, Nieto, Meroño et al. AAAI 2010]

COLLABORATIVE HI

RESPONSIBLE HI

11 of 37

Efficient semantic list management

[Meroño et al. ISWC 2019] [Daga, Meroño et al. QuWeDa 2019]

What list models have socially emerged from the Web?

COLLABORATIVE HI

12 of 37

Efficient semantic list management

What is the impact of these models on query performance?
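
To make the question concrete, here is a hedged sketch of two list models from the standard RDF vocabulary: the linked `rdf:List` (`rdf:first` / `rdf:rest`) and the `rdf:Seq` container (`rdf:_1`, `rdf:_2`, ...). The slides do not enumerate the models that were studied, so picking these two is an assumption; the helper names and blank node labels are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def as_linked_list(items, prefix="_:n"):
    """rdf:List model: one blank node per element, chained by rdf:rest."""
    triples = []
    for i, item in enumerate(items):
        node = f"{prefix}{i}"
        nxt = f"{prefix}{i + 1}" if i + 1 < len(items) else f"<{RDF}nil>"
        triples.append((node, f"<{RDF}first>", item))
        triples.append((node, f"<{RDF}rest>", nxt))
    return triples


def as_sequence(items, node="_:seq"):
    """rdf:Seq model: one container node with rdf:_1, rdf:_2, ... members."""
    triples = [(node, f"<{RDF}type>", f"<{RDF}Seq>")]
    for i, item in enumerate(items, start=1):
        triples.append((node, f"<{RDF}_{i}>", item))
    return triples


def nth_in_linked_list(triples, index, head="_:n0"):
    """Follow rdf:rest `index` times, then read rdf:first."""
    lookup = {(s, p): o for s, p, o in triples}
    node = head
    for _ in range(index):
        node = lookup[(node, f"<{RDF}rest>")]
    return lookup[(node, f"<{RDF}first>")]


def nth_in_sequence(triples, index, node="_:seq"):
    """Read the rdf:_n membership property directly."""
    lookup = {(s, p): o for s, p, o in triples}
    return lookup[(node, f"<{RDF}_{index + 1}>")]


notes = ['"C4"', '"E4"', '"G4"']
print(nth_in_linked_list(as_linked_list(notes), 2))
print(nth_in_sequence(as_sequence(notes), 2))
```

Reaching the n-th element of the linked model takes n hops, while the container model needs a single lookup but encodes the position in the property name; trade-offs of this kind are what a query benchmark can measure.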

ADAPTIVE HI

13 of 37

Research Highlights

Knowledge Graph Construction

Text Generation

Interfaces

Social Querying

Interactive Learning & Reasoning

Knowledge Graph Completion

14 of 37

Deterministic multimodal KG completion

[Meroño et al. SAAM 2018]

Leveraging melody and text for entity linking:

  • owl:sameAs links through geometric melodic similarity [Urbano et al. CMMR 2011]
  • Semantic entity linking [Mendes et al. i-SEM 2011] (Aho-Corasick + 272 DBpedia ontology classes)
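
The Aho-Corasick step mentioned above can be sketched as follows: a trie of entity labels with failure links finds every label occurring in a text in a single pass. This is a generic textbook implementation, not the code of the cited entity linker; the function names and the sample labels and sentence are assumptions for illustration.

```python
from collections import deque


def build_automaton(labels):
    """Build the goto/fail/output tables of an Aho-Corasick automaton."""
    goto, fail, out = [{}], [0], [[]]
    for label in labels:
        state = 0
        for ch in label:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(label)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit labels that end at the failure state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_mentions(text, automaton):
    """Return (start_offset, label) for every label occurrence in `text`."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for label in out[state]:
            hits.append((i - len(label) + 1, label))
    return hits


# Made-up label set and sentence.
automaton = build_automaton(["Bach", "Johann Sebastian Bach", "Brandenburg"])
hits = find_mentions("Johann Sebastian Bach wrote the Brandenburg Concertos", automaton)
print(hits)
```

Overlapping labels are all reported, so a later step still has to choose which mention to link, for example by preferring the longest match.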

15 of 37

Graph embeddings link music to high-level features

[Lisena, Meroño et al. TISMIR 2020]

“adagio”

Symbolic music distributional hypothesis:

Can we relate groups of notes in similar contexts with similar meanings?
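
A toy illustration of the distributional hypothesis for symbolic music: symbols that occur in the same contexts get similar context vectors. The sketch below uses plain co-occurrence counts and cosine similarity as a deliberately simpler stand-in for the graph embeddings of the cited work; the note sequences are made up and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sequences, window=1):
    """Count, for every symbol, which symbols occur within `window` positions."""
    vectors = defaultdict(Counter)
    for seq in sequences:
        for i, symbol in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[symbol][seq[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up three-note sequences: E and Eb share the context (C, G), F# does not.
sequences = [["C", "E", "G"], ["C", "Eb", "G"], ["D", "F#", "A"]]
vectors = context_vectors(sequences)
print(cosine(vectors["E"], vectors["Eb"]))
print(cosine(vectors["E"], vectors["F#"]))
```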

ADAPTIVE HI

16 of 37

MIDI2vec scales to Web size

[Lisena, Meroño et al. TISMIR 2020]

SLAC dataset (250 MIDI files, high-level features)

MuseData (438 MIDI files)

EchoNest: scaling up to Web size

ADAPTIVE HI

17 of 37

Research Highlights

Knowledge Graph Construction

Text Generation

Interfaces

Social Querying

Interactive Learning & Reasoning

Knowledge Graph Completion

18 of 37

Web-scale music mashups

[Meerwaldt, Meroño et al. WHiSe 2017]

  • Mashup editing is largely a manual and repetitive process
  • Contribution: systematic & scalable music mixing with KGs
  • SPARQL access normalization: rhythm, harmony, metadata, linked entities

COLLABORATIVE HI

Song 1

Song 2

Mashup

19 of 37

Music can be learned by example...

Music as a language model

[Wilschut, Wijtsma, Meroño et al. 2020] (in progress)

ADAPTIVE HI

20 of 37

… but it is best learned socially & interactively

Reasoning

Learning

Mutation

[Miras, Meroño et al. 2020] (in progress)

ADAPTIVE HI

Evolutionary algorithm

Social feedback

COLLABORATIVE HI
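
The loop named on this slide (an evolutionary algorithm whose fitness comes from social feedback) might look roughly like the sketch below. It is an assumption-laden illustration, not the system under development: `toy_feedback` is a hypothetical stand-in for human ratings, and the pitch range, mutation rate and population size are arbitrary.

```python
import random

PITCHES = list(range(60, 73))  # MIDI pitches C4..C5


def mutate(melody, rng, rate=0.2):
    """Replace each note with a random pitch with probability `rate`."""
    return [rng.choice(PITCHES) if rng.random() < rate else note for note in melody]


def evolve(population, feedback, rng, generations=20, keep=4):
    """Keep the `keep` best-rated melodies each generation and refill with mutants."""
    size = len(population)
    for _ in range(generations):
        survivors = sorted(population, key=feedback, reverse=True)[:keep]
        children = [mutate(rng.choice(survivors), rng) for _ in range(size - keep)]
        population = survivors + children
    return max(population, key=feedback)


def toy_feedback(melody):
    """Stand-in for human ratings (hypothetical): prefers small steps between notes."""
    return -sum(abs(a - b) for a, b in zip(melody, melody[1:]))


rng = random.Random(0)
start = [[rng.choice(PITCHES) for _ in range(8)] for _ in range(12)]
best = evolve(start, toy_feedback, rng)
print(best, toy_feedback(best))
```

Because the best-rated melodies always survive, the best score never decreases between generations; in an interactive setting `feedback` would query listeners instead of computing a formula.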

21 of 37

… but it is best learned socially & interactively

Mutation

22 of 37

Research Highlights

Knowledge Graph Construction

Text Generation

Interfaces

Social Querying

Interactive Learning & Reasoning

Knowledge Graph Completion

23 of 37

Human-machine querying

[Meroño et al. ESWC 2016, ISWC 2017] [Lisena, Meroño et al. ISWC 2019]

  • Querying gives humans and machines access to KGs
  • But humans do much more than just type SPARQL

body

query info

access

parameters

FAIRquery

COLLABORATIVE HI

24 of 37

Human-machine querying

  • grlc.io: Automatic creation of Linked Data APIs, leveraging public fairqueries
  • Used by 3,200 users (Elsevier, TNO)
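
The idea of deriving an API from a stored query can be sketched as below: metadata lines written as `#+ key: value` comments and variables starting with `?_` are read from the query text and turned into a route description. This is a simplified illustration and not grlc's implementation; the comment syntax and parameter convention follow my understanding of grlc and should be checked against its documentation, and the endpoint URL, query and function name are hypothetical.

```python
import re

# Hypothetical decorated query; the endpoint and property IRIs are placeholders.
QUERY = """#+ summary: List works by a given composer
#+ endpoint: http://example.org/sparql
SELECT ?work WHERE {
  ?work <http://example.org/prop/composer> ?_composer .
}
"""


def describe_api(query_text, name):
    """Derive a small API description from a decorated SPARQL query."""
    # Single-line "#+ key: value" comments become metadata fields.
    metadata = dict(re.findall(r"^#\+\s*(\w+):\s*(.+)$", query_text, flags=re.M))
    # Variables prefixed with "?_" are exposed as API parameters.
    params = sorted(set(re.findall(r"\?_(\w+)", query_text)))
    return {"path": f"/{name}", "parameters": params, **metadata}


api = describe_api(QUERY, "works_by_composer")
print(api)
```

Keeping the query, its documentation and its parameters in one shareable file is what lets both humans and machines reuse it.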

25 of 37

Explanations from provenance graphs

Social, selective explanations

[Hoover et al. arXiv 2019]

EXPLAINABLE HI

COLLABORATIVE HI

[Groth & Moreau 2013]

[Hoekstra & Groth PAW 2014]

26 of 37

Research Highlights

Knowledge Graph Construction

Text Generation

Interfaces

Social Querying

Interactive Learning & Reasoning

Knowledge Graph Completion

27 of 37

AI as a co-author

  • Need for AI in scientific discovery [Gil DSJ 2017]; humans are limited:
    • Not systematic
    • Errors
    • Biases
    • Poor reporting
  • New language models (e.g. GPT-2) generate remarkably coherent paragraphs
  • Retraining of GPT-2 with Semantic Web Conference proceedings (21M tokens)

[Radford et al. OpenAI 2019]

28 of 37

Generating explanations

[Meroño et al. ESWC 2020]

(under review)

EXPLAINABLE HI

An explanation for the absence of effective and general methods for ontology matching is that we are unable to match domain ontologies on all of the instances, and however, we have already performed several tests on ontologies with several hundred classes and over a million instances and found that they do not match most features of the domain ontology.

Explanation

A simple explanation for the difficulty of solving large scale ontology reasoning is that we tend to solve small problems by imposing very big and complex rules. We often end up with very large portions of ontologies that cannot be represented using standard reasoners.

Explanation

Machine learning techniques are used for the task of entity linking because it is a challenging task for the user. Therefore, we propose a novel method that is scalable to large knowledge bases with a high number of facts and a high accuracy.

Explanation

29 of 37

Generating hypotheses and definitions

[Meroño et al. ESWC 2020]

(under review)

EXPLAINABLE HI

In this paper, our hypothesis is that a new ontology can be derived from an existing one. The key idea is to combine ontology-based and knowledge-based approaches.

Hypothesis

The Semantic Web is a rich and evolving web of interconnected resources with many different types of content, such as web pages, blogs, videos, music plays, and so on.

Definition

30 of 37

Hybrid papers: Human drives, machine writes

  • Similar experiment with 30K COVID-19 papers
  • Structured, knowledge-based templates on top of language models
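
One way to read "templates on top of language models" is sketched below: a human supplies structured facts and section prompts, and a language model only continues each filled prompt. Everything here is an assumption for illustration: `stub_model` is a placeholder rather than a real model call, and the section names, facts and `write_snippet` helper are hypothetical.

```python
def write_snippet(sections, facts, generate):
    """Fill each human-written prompt with facts, then let the model continue it."""
    parts = []
    for heading, prompt_template in sections:
        prompt = prompt_template.format(**facts)
        parts.append(f"{heading}: {prompt} {generate(prompt)}")
    return "\n".join(parts)


def stub_model(prompt):
    """Placeholder for a language model call (hypothetical)."""
    return "<model continuation>"


# Human-driven structure: section headings and prompts with named slots.
sections = [
    ("Hypothesis", "In this paper, our hypothesis is that {claim}."),
    ("Definition", "{term} is"),
]
facts = {"claim": "structured templates keep generated text on topic", "term": "A knowledge graph"}
snippet = write_snippet(sections, facts, stub_model)
print(snippet)
```

The human decides what each section is about and the model only writes the continuation, which matches the "human drives, machine writes" division of labour.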

ADAPTIVE HI

[Meroño et al. ESWC 2020]

(under review)

Inspired by

[van Harmelen & ten Teije JWE 2019]

31 of 37

ADAPTIVE HI

RESPONSIBLE HI

COLLABORATIVE HI

EXPLAINABLE HI

Human? Crowds? Human in the loop?

32 of 37

Conclusions

  • Cultural knowledge is a key interface for human-machine intelligence
  • Cultural AI brings in a challenging agenda that matches HI
    • Multimodality
    • Subjectivity and multiple perspectives
    • Language, cultural interfaces (diversity)
    • Transparency and bias
  • The future: symbolic structures enabling collaboration
    • Purpose and intent representations
    • Structured models of collaboration

COLLABORATIVE HI

ADAPTIVE HI

33 of 37

Thank you

Mentors

Students (esp. Rick Meerwaldt, Nina Wilschut, Stefan Wijtsma)

34 of 37

Cited Contributions (1/2)

[Casellas, Nieto, Meroño et al. AAAI 2010] Casellas, N., Nieto, J-E., Meroño, A., Roig, A., Torralba, S., Reyes de los Mozos, M., Casanovas, P. “Ontological Semantics for Data Privacy Compliance: the NEURONA Ontology”, AAAI Spring Symposium Series Technical Reports (Intelligent Information Privacy Management), Stanford 23rd-25th of March 2010.

[Daga, Meroño et al. QuWeDa 2019] Enrico Daga, Albert Meroño-Peñuela, Enrico Motta. “Modelling and Querying Lists in RDF. A Pragmatic Study”. In: 3rd Workshop on Querying and Benchmarking the Web of Data (QuWeDa 2019), ISWC 2019, 18th International Semantic Web Conference (2019).

[Hoekstra, Meroño et al. JWS 2018] Rinke Hoekstra, Albert Meroño-Peñuela, Auke Rijpma, Richard Zijdeman, Ashkan Ashkpour, Kathrin Dentler, Ivo Zandhuis, Laurens Rietveld. “The dataLegend Ecosystem for Historical Statistics”. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, volume 50, pp. 49-61 (2018).

[Lisena, Meroño et al. ISWC 2019] Pasquale Lisena, Albert Meroño-Peñuela, Tobias Kuhn, Raphaël Troncy. “Easy Web API Development with SPARQL Transformer”. In: The Semantic Web – ISWC 2019, 18th International Semantic Web Conference. Lecture Notes in Computer Science, vol 11779, pp. 454-470 (2019).

[Lisena, Meroño et al. TISMIR 2020] Pasquale Lisena, Albert Meroño-Peñuela, Raphaël Troncy. “MIDI2vec: Learning MIDI Embeddings for Reliable Prediction of Symbolic Music Metadata”. Transactions of the International Society for Music Information Retrieval (TISMIR) (2020).

[Meerwaldt, Meroño et al. WHiSe 2017] Rick Meerwaldt, Albert Meroño-Peñuela, Stefan Schlobach. “Mixing Music as Linked Data: SPARQL-based MIDI Mashups”. In: Proceedings of the 2nd Workshop on Humanities in the SEmantic web (WHiSe 2017). ISWC 2017, October 22nd, Vienna, Austria (2017).

[Meroño et al. SWJ 2015] Albert Meroño-Peñuela, Ashkan Ashkpour, Christophe Guéret, Stefan Schlobach. “CEDAR: The Dutch Historical Censuses as Linked Open Data”. Semantic Web — Interoperability, Usability, Applicability, 8(2), pp. 297–310. IOS Press (2015).

[Meroño et al. ISWC 2017] Albert Meroño-Peñuela, Rinke Hoekstra, Aldo Gangemi, Peter Bloem, Reinier de Valk, Bas Stringer, Berit Janssen, Victor de Boer, Alo Allik, Stefan Schlobach, Kevin Page. “The MIDI Linked Data Cloud”. In: The Semantic Web – ISWC 2017, 16th International Semantic Web Conference. Lecture Notes in Computer Science, vol 10587, pp. 156-164 (2017).

35 of 37

Cited Contributions (2/2)

[Meroño et al. ISWC 2019] Albert Meroño-Peñuela, Enrico Daga. “List.MID: A MIDI-Based Benchmark for Evaluating RDF Lists”. In: The Semantic Web – ISWC 2019, 18th International Semantic Web Conference. Lecture Notes in Computer Science, vol 11779, pp. 246-260 (2019).

[Meroño et al. SAAM 2018] Albert Meroño-Peñuela, Reinier de Valk, Enrico Daga, Marilena Daquino, Anna Kent-Muller. “The Semantic Web MIDI Tape: An Interface for Interlinking MIDI and Context Metadata”. In: Workshop on Semantic Applications for Audio and Music, ISWC 2018. 9th October 2018, Monterey, California, USA (2018).

[Meroño et al. ESWC 2016] Albert Meroño-Peñuela, Rinke Hoekstra. “grlc Makes GitHub Taste Like Linked Data APIs”. The Semantic Web – ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers. LNCS 9989, pp. 342-353 (2016)

[Meroño et al. ISWC 2017] Albert Meroño-Peñuela, Rinke Hoekstra. “Automatic Query-centric API for Routine Access to Linked Data”. In: The Semantic Web – ISWC 2017, 16th International Semantic Web Conference. Lecture Notes in Computer Science, vol 10587, pp. 334-339 (2017)

[Meroño et al. ESWC 2020] Albert Meroño-Peñuela, Dayana Spagnuelo, GPT-2. “Is a Transformer Your Next Semantic Co-Author? Generating Semantic Web Paper Snippets with GPT-2”. In: The Semantic Web – ESWC 2020 Satellite Events, posters & demos (2020) (under review)

36 of 37

References

[Gil DSJ 2017] Gil, Yolanda. "Thoughtful artificial intelligence: Forging a new partnership for data science and scientific discovery." Data Science 1, no. 1-2 (2017): 119-129.

[Groth & Moreau 2013] Paul Groth, Luc Moreau. “PROV-Overview: An Overview of the PROV Family of Documents”. W3C Working Group Note 30 April 2013 https://www.w3.org/TR/prov-overview/

[van Harmelen & ten Teije JWE 2019] van Harmelen, F., ten Teije, A. “A Boxology of Design Patterns for Hybrid Learning and Reasoning Systems”. Journal of Web Engineering, 18(1-3), pp. 97-124 (2019).

[Hoekstra & Groth PAW 2014] Hoekstra, Rinke, and Paul Groth. "PROV-O-Viz-understanding the role of activities in provenance." In International Provenance and Annotation Workshop, pp. 215-220. Springer, Cham, 2014.

[Hoover et al. arXiv 2019] Benjamin Hoover, Hendrik Strobelt, Sebastian Gehrmann. “exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models”. Computation and Language (cs.CL); Machine Learning (cs.LG). arXiv:1910.05276 [cs.CL] (2019)

[Marcus arXiv 2020] Gary Marcus. “The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence”. Artificial Intelligence (cs.AI); Machine Learning (cs.LG). arXiv:2002.06177 [cs.AI] (2020)

[Marcus & Davis 2019] Marcus, Gary, and Ernest Davis. Rebooting AI: building artificial intelligence we can trust. Pantheon, 2019.

[Radford et al. OpenAI 2019] Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. "Language models are unsupervised multitask learners." OpenAI Blog 1, no. 8 (2019): 9.

37 of 37

Abstract. Many aspects of human intelligence, such as language, legal reasoning, and music understanding, are informed by rich models of our surrounding cultural world. However, current AI systems are mainly concerned with achieving high performance in narrowly designed tasks, and typically ignore these cultural models of the world. Cultural AI is the understanding of human culture and ethical values by machines, but also the empowerment of humans to address, understand and advance culture by using AI. How can we make AI systems aware of human culture and values? How does cultural knowledge impact practices in knowledge engineering, reasoning, databases and the Web? Will we be content with robots that can prepare our meals, clean the streets, and take care of our elderly? Or, beyond these, will we rather seek inspiring conversations on the fairness of Galileo’s trial, the spectrum of musical emotions, or the authenticity of historical documents? In this talk, I will share the results of my research on Cultural AI through systems that combine knowledge graph construction, knowledge graph completion, and a number of diverse interfaces using interactive learning and reasoning, social querying, and text generation. I will discuss how these can contribute to the Hybrid Intelligence program as interfaces for human-machine intelligence, and as enablers of a more Collaborative, Adaptive, Responsible and Explainable AI.
