1 of 50

2 of 50

Newspaper day 20/03/2023

3 of 50

Program

13h30-13h45: Introductions

13h45-13h55: Brecht Deseure, presentation of KBR newspaper department and CAMille project

13h55-14h05: Julie Birkholz, presentation of Digital Research Lab

14h05-14h25: Niklas Stenzel (ZOG)

14h25-14h45: Elias Degruyter (UGent)

14h45-15h05: coffee break

15h05-15h25: Vincent Ducatteeuw (UGent)

15h25-15h45: Leon Castelein (UGent)

15h45-16h05: Tess Dejaeghere (UGent)

16h05-16h25: Concluding discussion

WiFi network: KBR_Readers

Access code: KBR_2019

WiFi network: KBR_Events

Access code: Ghs442uF

4 of 50

Feedback

Green post-its

What can KBR do to improve access to the newspapers collections for researchers?

Yellow post-its

Ideas for future newspapers days.

Establish a network/mailing list?

Yearly event with open call for papers (CFP)?

Focus on newspapers or also include periodicals?

5 of 50

KBR Newspapers department

Largest and most diverse historical newspapers collection in Belgium

From the 17th century until today

Legal deposit: 1 physical issue of each Belgian newspaper deposited daily. Digital legal deposit underway

Ca. 2000 Belgian titles of the ‘grande presse’

Ca. 1000 Belgian titles of the ‘petite presse’

Ca. 500 foreign newspapers from 60 countries

Colonial newspapers

6 of 50

Special collections

Fonds Gaston Mertens: over 60,000 specimens from ca. 1,650 Belgian localities (17th century - 1948)

Fonds Philippe Vandermaelen: 2,405 specimens of foreign newspapers of the 19th century (Europe and USA)

Fonds Jacques Delmelle: complete journalistic oeuvre of one Belgian journalist, comprising over 10,000 articles/publications

7 of 50

Access

One of KBR’s most consulted and vulnerable collections

Different carriers: paper, microfilm (600 titles), digital

127 digitized and OCR-ed titles (1814-1950) = ca. 4 million pages

1950-1989 (= ca. 3.5 million pages) will be digitized within the next three years

Until 1918: freely accessible, also remotely

After 1918: only accessible on KBR computers (or remotely with special permission)

Simple search functions

8 of 50

9 of 50

10 of 50

11 of 50

CAMille FED-tWIN (KBR-ULB)

Goals

  • Writing the history of Belgian journalism
  • Digitizing Belgian newspapers and journalistic archives

Components

  • CAMille platform of digitized newspapers and archives (data level access)
  • Database of Belgian journalists
  • Research projects

12 of 50

The CAMille platform

13 of 50

Database of Belgian journalists

  • Prosopography of Belgian journalists (1830-today)
  • Paper dictionary inherited from Pierre Van den Dungen, transformed into spreadsheet
  • Enriched via data extracted from library catalogues
  • Development of relational database underway

14 of 50

Research projects

  • PhD project Alexia Vidalenche
  • Analysing discourse about journalists and journalism. Current focus: women journalists
  • Reconstructing journalistic careers, e.g. Alice Bron (with Alexia Vidalenche, Florence Le Cam, Manon Libert and Sébastien de Valériola)
  • Reconstituting editorial teams
  • Press releases of the Belga press agency
  • Developing DH solutions for newspapers corpora: signature recognition, OCR improvement, topic modelling (with Isabelle Gribomont)

15 of 50

16 of 50

Digital Research Lab

Julie M. Birkholz, Lead of KBR’s Digital Research Lab & Assistant Professor Digital Humanities, Ghent University

17 of 50

Goals

  1. support and facilitate data level access to KBR’s diverse, multilingual digitised and born-digital collections;
  2. stimulate the (re)use and research of these digital sources, data and metadata of these collections.

18 of 50

In practice

  • Supporting research on DATA-KBR-BE, BESOCIAL & BelgicaWeb;
  • Supporting interns and students doing DH research on the collections;
  • Coordinating the future FWO Junior Project on DH in Navez’s correspondence & historical networks; and the ESFRI Virtual Lab: development of a data-on-demand service

19 of 50

DATA-KBR-BE

Facilitating data-level access to KBR’s digitised and born-digital collections for digital humanities research

20 of 50

Digitised Historical Newspapers as Data

21 of 50

Interdisciplinary Research Scenarios

22 of 50

From Collections to Corpora

Which newspapers are digitised?

Importance of historical context

23 of 50

Zentrum für Ostbelgische Geschichte

24 of 50

German Language Newspapers in Belgium

25 of 50

Discursive Identity Construction in East Belgium

A Linguistic and Discourse Historical Analysis of the Patterns of Language and Communication in East Belgian Mass Media

26 of 50

Agenda

  • Research interest
  • Link to digital newspapers
  • Methodology
  • Discussion

27 of 50

Research Interest

  1. How are East Belgian identities constructed in public media discourses?
    • Which contents are objects of discourse?
    • Which communicative strategies are used to establish a certain identity?
    • How are these strategies realised in terms of language use?
  2. How is the situation of East Belgium as a border region referred to? Does the relationship with Germany or the rest of Belgium play a role?
  3. Does the construction of identities itself become subject of discourse?

28 of 50

Link to digital newspapers

  • Much of what we know about the society we live in is shaped by mass media (e.g. newspapers)
  • Digitisation makes newspapers usable for corpus linguistics

29 of 50

Methodology

  • Discourse historical approach (DHA)
    • Synchronic and diachronic linguistic discourse analysis that considers historical background information and sources
    • Three levels of analysis
      • Contents
      • Strategies
      • Forms of realisation

  • Computer aided corpus linguistics
    • Combination of qualitative and quantitative methods
    • Search engines, KWIC (keyword in context), n-gram analysis, cluster analysis, collocation/co-occurrence analysis
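The quantitative methods listed above can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the function names and the sample sentence are invented for illustration, and tokenization is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, window=2):
    """Keyword in context: (left context, keyword, right context) per hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def ngrams(tokens, n=2):
    """Count all contiguous n-grams in the token list."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def cooccurrences(tokens, keyword, window=2):
    """Count words appearing within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Invented sample sentence, not taken from any newspaper.
sample = "The border is close. Our home lies near the border."
tokens = tokenize(sample)

for left, key, right in kwic(tokens, "border"):
    print(f"{left:>12} [{key}] {right}")
```

Raw co-occurrence counts are only a starting point; corpus tools usually rank collocates with an association measure, which this sketch leaves out.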

Niklas Stenzel

Zentrum für Ostbelgische Geschichte

Kaperberg 2-4

B-4700 Eupen

www.geschichte.be

30 of 50

Methodology

  • Corpus construction
    • Identifying events likely to be associated with collective identity
    • Searching for articles that refer to the event within a certain time period around it
    • Step-by-step extension of the corpora by identifying keywords through qualitative analysis
    • Converting to plain text (.txt) and tagging for easier processing with corpus-linguistic tools
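The corpus construction steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Article` class, the `build_corpus` function, the window sizes, and all dates and texts are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Article:
    published: date
    text: str


def build_corpus(articles, event_date, keywords, days_before=7, days_after=30):
    """Keep articles inside the time window that mention any keyword."""
    start = event_date - timedelta(days=days_before)
    end = event_date + timedelta(days=days_after)
    lowered = [k.lower() for k in keywords]
    return [
        a for a in articles
        if start <= a.published <= end
        and any(k in a.text.lower() for k in lowered)
    ]


# Invented example data (hypothetical event on 1 May 2000).
articles = [
    Article(date(2000, 5, 2), "The election result was announced."),
    Article(date(2000, 5, 20), "Voters discuss the ballot."),
    Article(date(2001, 1, 1), "The election is long past."),
]
event = date(2000, 5, 1)

# First pass with an initial keyword.
first_pass = build_corpus(articles, event, ["election"])

# Extension step: a keyword found through qualitative reading is added.
second_pass = build_corpus(articles, event, ["election", "ballot"])
```

The second call mirrors the step-by-step extension: each round of qualitative reading can add keywords, and the filter is simply rerun with the longer list.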

31 of 50

Methodology

AntConc

Freeware corpus analysis toolkit for concordancing and text analysis

32 of 50

Discussion

33 of 50

Presentation Elias Degruyter

34 of 50

Presentation Vincent Ducatteeuw

35 of 50

Presentation Leon Castelein

36 of 50

Ownership of public space

  • Research of collective action in Ghent
    • Gita Deneckere: “Sire het volk mort”
    • Theory by Charles Tilly: changing of repertoires
  • Criticisms
    • Relevance of locality and territoriality
    • Martin Schoups: “Meesterschap over de straat”
    • Hypotheses

37 of 50

Data collection & newspapers

38 of 50

39 of 50

Spatial analysis

Katrina Navickas, Protest and the politics of space and place, 1789-1848, 180

40 of 50

41 of 50

Beyond Babylonian Confusion: a case-study based approach for multilingual NLP on historical literature

Author: Tess Dejaeghere

Supervisors: Julie Birkholz, Els Lefever, Christophe Verbruggen

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101004984

42 of 50

43 of 50

Personalia

Background

  • Applied Linguistics – Translation/Interpreting – Digital Humanities & Digital Text Analysis
  • PhD student @ Ghent Center for Digital Humanities:
    • Deliverables for CLS INFRA Work Package 8
      • ML Named Entity Recognition (NER) pipeline
      • ML & lexicon-based Sentiment Analysis (SA) pipeline
      • Relation Extraction (REX) pipeline
    • Personal PhD research
      • Title: “Beyond Babylonian Confusion: a case-study based approach for multilingual NLP on historical literature”

44 of 50

Ongoing Research

Natural Language Processing vs. (Digital) Humanities

(Digital) Humanities: analyze text(s) according to a hermeneutic model to make inferences about past and present.

    • Stylometry
    • Authorship attribution

Natural Language Processing: subfield of artificial intelligence and computational linguistics aimed at making natural language understandable to computers.

    • Machine translation
    • Speech recognition
    • Chatbots

45 of 50

46 of 50

Ongoing Research

Natural Language Processing vs. (Digital) Humanities

Named Entity Recognition (NER)

    • Automatically extract and classify entities from a text.
    • Commercial applications in e.g. text summarization and question answering systems.

Example: “Julie works in Ghent.” → Julie: PERSON, Ghent: LOCATION

Sentiment Analysis (SA)

    • Automatically extract sentiment from a text.
    • Commercial applications in e.g. opinion mining and social media mining.

Example: “Julie was happy with her cup of tea!” → POSITIVE
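The two slide examples can be reproduced with a toy Python sketch. It assumes a hand-made gazetteer and sentiment lexicon, both invented here, and uses simple lookup; it stands in for neither the ML pipelines mentioned earlier nor any real NLP library.

```python
import re

# Hypothetical lookup tables, invented for illustration only.
GAZETTEER = {"julie": "PERSON", "ghent": "LOCATION"}
POSITIVE = {"happy", "good", "great"}
NEGATIVE = {"sad", "bad", "terrible"}


def tokenize(text):
    """Split the text into word tokens, keeping the original casing."""
    return re.findall(r"\w+", text)


def tag_entities(text):
    """Dictionary-based NER: label every token found in the gazetteer."""
    return [
        (tok, GAZETTEER[tok.lower()])
        for tok in tokenize(text)
        if tok.lower() in GAZETTEER
    ]


def sentiment(text):
    """Lexicon-based SA: compare counts of positive and negative words."""
    words = [tok.lower() for tok in tokenize(text)]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


entities = tag_entities("Julie works in Ghent.")
label = sentiment("Julie was happy with her cup of tea!")
```

Lookup-based approaches like this break down on historical spelling variation and OCR noise, which is one reason tool evaluation on literary-historical data matters.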

47 of 50

Ongoing Research

The gaps

! Need for transparent, reproducible and durable NLP workflows which are tailored to heuristic research.

! Need for an overview of the possibilities and limitations of NLP-tools in literary-historical research settings.

! Need for practical insights regarding NLP application to data with literary-historical characteristics.

! Need for workflow communication standards.

→ Facilitate exchange of practices between NLP and DH.

48 of 50

Ongoing Research

Approach: case-study based

  • Rooted in practice and flexible

    • OCR output evaluation
    • NLP tool evaluation
    • Tool application
    • Visualization
    • Interpretation
    • Dissemination

! Suggestions regarding tool selection, tool output interpretation, workflow dissemination
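As one possible reading of the "OCR output evaluation" step, the sketch below computes a character error rate (CER): the Levenshtein edit distance between OCR output and a manually corrected ground truth, divided by the ground-truth length. The slides do not say which metric is used, so the choice of CER, the function names, and the sample strings are assumptions.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text, ground_truth):
    """Edit distance normalised by the length of the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


# Invented example: a typical OCR confusion of "w" with "vv".
cer = character_error_rate("nevvspaper", "newspaper")
```

A CER of 0.0 means the OCR output matches the ground truth exactly; values can exceed 1.0 when the OCR output is much longer than the reference.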

49 of 50

THANK YOU

Tess.Dejaeghere@ugent.be

50 of 50

Feedback

Yellow post-its

What can KBR do to improve access to the newspapers collections for researchers?

Blue post-its

Ideas for future newspapers days.

Establish a network/mailing list?

Yearly event with open call for papers (CFP)?

Focus on newspapers or also include periodicals?
