1 of 10

Overview of digital infrastructures for SSH research in Finland

Inés Matres, University of Helsinki

DARIAH-FI www.dariah.fi / @dariahfi.bsky.social

Vaasa Roadshow, 14.3.2025

2 of 10

What are Research Infrastructures?

RIs are support systems (data, tools, training or consultancy) for carrying out scholarly practices.

FORMING A DATASET

Selecting, subsetting, live-data storage

DATA GATHERING

Data sources, access points, APIs, …

WORKING WITH DATA

Data preparation, transcription, coding, analysis, visualizations

SHARING AND ARCHIVING

Publication, access rules, preservation

SYNTHESIS & REPORTING

Presenting findings, illustrating

Data

Interactions

3 of 10

CLARIN + DARIAH ➠ FIN-CLARIAH

Since 2022: FIRI National Research Infrastructure roadmap

  • The Language Bank of Finland (CLARIN): Common Language RI
  • DARIAH-FI (broader disciplinary base): RI for data-intensive humanities and social sciences
  • 7 universities + National Library of Finland + CSC

Since 2025: Lighthouse status

  • Funding 2022-2029: Research Council of Finland (~2 million yearly)

DARIAH-FI

FIN-CLARIN

Support services

Kielipankki

FIN-CLARIAH

CSC, NLF, FINNA, other national RIs

4 of 10

Principles of infrastructure development

Scaling infrastructure for SSH research

We empower researchers to scale operations beyond their local horizons by developing tools that enable big-data processing and by promoting collaboration.

Knowledge network

Find interdisciplinary expertise on the use of historical and contemporary digital data, computational methods and the study of digital culture.

Spreading digital competence

We organize training and provide information about education to promote digital scholarship in every Finnish university.

5 of 10

Why do we need digital and data-intensive infrastructures?

Case 1: Social media analysis at scale

“To simplify, if I start to process billions of words on my laptop, my computer slows down substantially”

#Suomi

#NATO

6 of 10

Why do we need digital and data-intensive infrastructures?

Case 2: Digital cultural heritage

“Digitized collections (meta)data is heterogeneous, making it difficult to find connections across GLAM domains”

7 of 10

Why do we need digital and data-intensive infrastructures?

Case 3: Multimodal, complex research data

“In game studies, stream interaction between video, audio and chats generate data that need to be analysed together”

Livestream: Rekkles playing LoL https://www.twitch.tv/rekkles

8 of 10

Meet the DARIAH-FI Network

Inés Matres

Prof. Eero Hyvönen

Prof. Veronika Laippala

Paula Rautionaho

Venla Posso

Prof. Sanna Kumpulainen

Marika Rauhala

Prof. Mikko Tolonen (UHEL), Director

Prof. Eetu Mäkelä (UHEL), Technical Lead

Päivi Pihlaja

Tanja Välisalo

Katri Tegel

National coordinator

DARIAH-FI is a national research infrastructure created for the needs of data-intensive social sciences and humanities (SSH) in Finland.

9 of 10

DARIAH-FI resources

CULTURAL HERITAGE DATASETS & ANALYSIS TOOLS (FENNICA, NEWSPAPERS, ARCHIVE ANNOTATION, TEXT-REUSE, SAMPOS, FINNA DATA, IN-COPYRIGHT DATA)

SOCIAL SCIENCE DATA & TOOLS (PARLIAMENT SPEECHES, MASS SURVEYS)

SOCIAL MEDIA DATA & ANALYSIS TOOLS (NORDIC TWEET STREAM, CITIZEN FORUMS)

NOISY WEB DATA (SUBSETTING DATA, MULTILINGUAL REGISTER, TOXICITY, INTERACTION, MULTIMODALITY)

STREAMED DATA ANALYSIS (TWITCH, YOUTUBE)

dariah.fi/resources (datasets, tools, video tutorials, documentation)

AVAILABLE

UPCOMING

10 of 10

COUNTDOWN TO

DIGITAL HUMANITIES HACKATHON HELSINKI

    • until 12.4. Application period
    • 14.–23.5. Hackathon in Helsinki

Themes: historical newspapers, parliamentary debate, online discussion on Earth resources, and (new!) oral history interviews

Open to MA students and doctoral researchers in the humanities, computer science and data science

Limited EU bursaries available

https://heldig.fi/dhh25