1 of 24

Sensitive qualitative data and pseudonymisation

Arin Tham Savran

OS in the Swedish Context

Göteborg, 2026-04-06

Göteborgs universitet - Chalmers tekniska högskola - Karolinska Institutet - Kungliga Tekniska högskolan - Lunds universitet - Stockholms universitet - Sveriges lantbruksuniversitet - Umeå universitet - Uppsala universitet

This work is licensed under a Creative Commons Attribution 4.0 International License.

2 of 24

Session plan

15:00 – 15:10 Welcome

15:10 – 15:40 Presentation

15:45 – 15:50 Activity 1

15:50 – 16:00 Presentation continues

16:00 – 16:10 Activity 2

16:10 – 16:20 Class discussion

16:20 – 16:30 Presentation finishes

Photo: Matthew Henry on Burst

3 of 24

Types of qualitative data collection that may involve sensitive data

In-depth interviews
Focus groups
Observations
Field notes
Diaries/journals
Audio, video, photo elicitation
Online forum or social media analysis
Open-ended questionnaires or surveys
Document analysis

Svensk nationell datatjänst

3

4 of 24

Sensitive data

Special categories of personal data: racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, health, sex life or sexual orientation, genetic data, biometric data

But also other details, e.g. criminal history and sanctions records, and financial information

Geographical locations of people, animals, or physical sites/entities

And probably some more…

5 of 24

GDPR (Dataskyddsförordningen)

Appropriate measures must be taken to protect sensitive information…

Pseudonymisation and anonymisation are important tools

All research data must be stored securely, both during the project and after it ends, using technical and organisational measures

Your university's data office, IT, and archive services can usually help with this

Remember to ask for help if you don’t know how!

6 of 24

Pseudonymisation & Anonymisation: In a nutshell

Create codes or aliases

Use a code key or log

Pseudonymised data is still subject to GDPR as long as the code key exists

If the code key is deleted, the data can count as anonymised and GDPR no longer applies. But…

7 of 24

Here’s the thing though…

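Swedish Archives Act (Arkivlagen)

At Swedish public universities, research data and code keys are generally official records, so the code key usually cannot simply be deleted: it must be kept unless a disposal (gallring) decision allows it to be destroyed.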

8 of 24

RE-IDENTIFICATION: Indirect identifiers and background variables

Even when names are removed, a combination of these details can single out an individual:

Gender
Age
Income
Occupation
Municipality of residence
And so forth…
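One way to gauge this risk, not covered on the slide, is a k-anonymity check: count how many records share each combination of indirect identifiers and flag combinations held by fewer than k people. The Python sketch below is a minimal illustration; the participants, the chosen variables and k = 2 are invented assumptions, and a real assessment also has to consider what outsiders already know.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return each combination of indirect identifiers shared by fewer than k records."""
    combos = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: count for combo, count in combos.items() if count < k}


# Hypothetical background variables for four interviewees (invented data).
participants = [
    {"id": "P001", "gender": "woman", "age": 47, "occupation": "neurosurgeon", "municipality": "Kiruna"},
    {"id": "P002", "gender": "man", "age": 35, "occupation": "teacher", "municipality": "Luleå"},
    {"id": "P003", "gender": "man", "age": 35, "occupation": "teacher", "municipality": "Luleå"},
    {"id": "P004", "gender": "woman", "age": 29, "occupation": "nurse", "municipality": "Umeå"},
]

# A combination held by only one person points straight at that person.
risky = small_groups(participants, ["gender", "age", "occupation", "municipality"], k=2)
for combo, count in risky.items():
    print(count, combo)
```

Here P001 and P004 are each unique on these four variables, while P002 and P003 share a combination. Generalising the variables (age brackets, county instead of municipality) is the usual way to enlarge such groups.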

9 of 24

1. Remove the “obvious” and generalise identifying details

Removing or masking: “Hello, my name is Anders Andersson!” → “Hello, my name is [name]!”
Generalising: “I live in Kalix” → “I live in a small town in northern Sweden”. Though perhaps masking would be more suitable?

Make sure nothing could identify individuals, whether study participants or others, including via re-identification!

Exceptions? Yes, in some cases people want to be identified. But it is ultimately up to the PI to judge whether that is appropriate!
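Parts of this step can be scripted once the researcher has listed what to remove. The sketch below assumes a hand-made list of names and a lookup of places with their generalised wording (both hypothetical); it masks names with a bracketed substitution and generalises places. It only catches what is on the lists, so a careful manual read-through is still needed.

```python
import re

# Hypothetical lists compiled by the researcher for this transcript.
NAMES_TO_MASK = ["Anders Andersson", "Anders"]
PLACES_TO_GENERALISE = {"Kalix": "a small town in northern Sweden"}


def mask_and_generalise(text):
    # Longest names first, so "Anders Andersson" is masked as a whole before "Anders".
    for name in sorted(NAMES_TO_MASK, key=len, reverse=True):
        # \b word boundaries stop "Anders" from matching inside "Andersson".
        text = re.sub(rf"\b{re.escape(name)}\b", "[name]", text)
    for place, general in PLACES_TO_GENERALISE.items():
        text = re.sub(rf"\b{re.escape(place)}\b", general, text)
    return text


print(mask_and_generalise("Hello, my name is Anders Andersson!"))  # Hello, my name is [name]!
print(mask_and_generalise("I live in Kalix"))  # I live in a small town in northern Sweden
```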

10 of 24

2. Create code/alias key

Use fake names, participant IDs, or role-based labels.

Keep the key separate from the research data, stored securely and accessible only to authorised individuals.

Example:

Participant ID	Real name	Contact details
P001	Jane Doe	Email/phone
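A minimal sketch of building such a key, assuming participants arrive as (name, contact) pairs. The column names follow the example table; the function names and the file path are hypothetical. The script only puts the key in its own file: encryption and access control must come from the secure storage your institution provides.

```python
import csv

FIELDS = ["Participant ID", "Real name", "Contact details"]


def build_code_key(participants):
    """Assign sequential IDs (P001, P002, ...) to (name, contact) pairs."""
    return [
        {"Participant ID": f"P{i:03d}", "Real name": name, "Contact details": contact}
        for i, (name, contact) in enumerate(participants, start=1)
    ]


def write_code_key(key, path):
    """Write the key to its own CSV file, to be stored apart from the research data."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(key)


key = build_code_key([("Jane Doe", "Email/phone"), ("John Smith", "Email/phone")])
print([row["Participant ID"] for row in key])  # ['P001', 'P002']
# write_code_key(key, "/secure/location/code_key.csv")  # hypothetical path
```

Only the IDs (P001, P002, …) then go into transcripts and analysis files.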

11 of 24

3. Pseudonyms for roles, organisations, geographical regions

[Teacher 1]
[Teacher 2]
[School 1]
[School 2]
[Region 3]
[University]
[major newspaper]
[municipality]
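Whatever the labels are, each real person or place should get the same label every time it appears, across all transcripts. A minimal sketch of one way to do that, assuming the researcher supplies which names belong to which category (the names and the school below are invented). The resulting mapping is itself a code key and must be stored as securely as one.

```python
class Labeller:
    """Give each distinct real entity one numbered label, reused consistently."""

    def __init__(self):
        self.labels = {}    # real value -> label, e.g. "Ms Berg" -> "[Teacher 1]"
        self.counters = {}  # category -> number of labels issued so far

    def label(self, value, category):
        if value not in self.labels:
            self.counters[category] = self.counters.get(category, 0) + 1
            self.labels[value] = f"[{category} {self.counters[category]}]"
        return self.labels[value]


# Hypothetical entities found in a transcript, mapped to their categories.
entities = {"Ms Berg": "Teacher", "Mr Lind": "Teacher", "Hillside School": "School"}
text = "Ms Berg and Mr Lind both teach at Hillside School. Ms Berg is head of year."

labeller = Labeller()  # reuse the same instance for every transcript in the study
for value, category in entities.items():
    # Plain substring replacement; nicknames and misspellings still need manual checks.
    text = text.replace(value, labeller.label(value, category))
print(text)
# [Teacher 1] and [Teacher 2] both teach at [School 1]. [Teacher 1] is head of year.
```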

12 of 24

4. Pseudonymise places, organisations/companies, and networks

Original:

“I worked at Volvo Torslanda under my foreman Anders at T3.”

Replace with something like:

“I worked for a large manufacturing company in a major Swedish city under a supervisor.”

13 of 24

5. Modify dates

Dates can be identifying, especially around incidents, complaints, hospitalisation, legal proceedings, or media-covered events.

Source: ChatGPT Edu

Original	Safer version
“on 17 September 2023”	“in autumn 2023” or “2023”
“three days after the public inquiry”	“shortly after a major public event”
“during the 2020 election campaign”	“during a national campaign period”
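Exact calendar dates can be coarsened mechanically; relative references such as “three days after the public inquiry” still need manual rewriting. A minimal sketch, assuming northern-hemisphere meteorological seasons (December to February counted as winter, labelled with the date's own year); the function name and levels are hypothetical.

```python
from datetime import date

# Assumption: meteorological seasons for the northern hemisphere.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def generalise_date(d, level="season"):
    """Coarsen an exact date to 'season year' or just 'year'."""
    if level == "year":
        return str(d.year)
    if level == "season":
        return f"{SEASONS[d.month]} {d.year}"
    raise ValueError(f"unknown level: {level}")


print(generalise_date(date(2023, 9, 17)))          # autumn 2023
print(generalise_date(date(2023, 9, 17), "year"))  # 2023
```

Because winter straddles the turn of the year, “winter 2023” can be ambiguous; falling back to the year alone avoids that.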

14 of 24

6. Generalise and categorise background information

Examples:

converting age to age brackets
recoding municipality into county
classifying income as low, medium, or high
recoding political affiliation as right or left
grouping occupation or workplace into public or private sector

Source: www.researchdata.se
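The recodings above are simple lookups and thresholds. A minimal sketch, in which the bracket width, the income limits and the small municipality-to-county table are illustrative assumptions; a real study should choose limits that suit its data and use an official, complete municipality list.

```python
def age_bracket(age, width=10):
    """Map an exact age to a bracket such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def income_category(income, low_limit, high_limit):
    """Classify income as low, medium or high using researcher-chosen limits."""
    if income < low_limit:
        return "low"
    if income < high_limit:
        return "medium"
    return "high"


# Illustrative excerpt only, not a complete mapping.
MUNICIPALITY_TO_COUNTY = {
    "Kalix": "Norrbotten County",
    "Kiruna": "Norrbotten County",
    "Göteborg": "Västra Götaland County",
}

print(age_bracket(47))                                               # 40-49
print(income_category(32_000, low_limit=25_000, high_limit=45_000))  # medium
print(MUNICIPALITY_TO_COUNTY["Kiruna"])                              # Norrbotten County
```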

15 of 24

ACTIVITY 1

3-5 min

16 of 24

ACTIVITY 1: Pseudonymised examples from ChatGPT

Original	Pseudonymised
“I live in Kiruna”
“I am a 47-year-old neurosurgeon”
“My son attends Greenfield Primary”
“I started on 12 March 2024”
“My research is on hygiene practices in post-WW2 Finland and Sweden”
“As the only Arabic-speaking pediatric nurse in town…”

17 of 24

Pseudonymised examples from ChatGPT (cont.)

Original	Pseudonymised
“I live in Kiruna”	“I live in a small town in northern Sweden”
“I am a 47-year-old neurosurgeon”	“I am a mid-career specialist doctor”
“My son attends Greenfield Primary”	“My child attends a local primary school”
“I started on 12 March 2024”	“I started in early 2024”
“My research is on hygiene practices in post-WW2 Finland and Sweden”	“My research is on public health practices in two Nordic countries during the mid-20th century”
“As the only Arabic-speaking pediatric nurse in town…”	“As one of few multilingual pediatric healthcare workers in the area…”

18 of 24

19 of 24

ACTIVITY 2

10 min

20 of 24

CLASS DISCUSSION

10 min

21 of 24

Also, to think about…

Use square [brackets] for substitutions
Double-check the masking function you use
Save transcripts in PDF/A format for long-term preservation
An editable, machine-readable version (e.g. UTF-8 TXT, DOCX/ODT, or XML) may also be relevant, depending on the type of research and the repository's requirements
Before depositing, check for identifiers outside the visible transcript text (next slide)
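To double-check a masking step, one option (an addition, not from the slides) is to search the finished transcript for every real value in the code key, ignoring case. A minimal sketch; the term list and transcript are hypothetical. An empty result is not proof of safety: nicknames, misspellings and indirect identifiers will slip through.

```python
import re


def find_leaks(text, sensitive_terms):
    """Return the real names/places from the code key that still occur in the text."""
    return [
        term
        for term in sensitive_terms
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
    ]


# Hypothetical pseudonymised transcript with two leftovers.
transcript = "[Teacher 1] said that anders would call from Kalix."
print(find_leaks(transcript, ["Anders Andersson", "Anders", "Kalix"]))  # ['Anders', 'Kalix']
```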

22 of 24

Risk area	What to check
File metadata	Author name, institution, path names, creation history
Track changes	Deleted names or places may remain recoverable
Comments	Researcher notes may contain real identities
Embedded objects	Images, audio clips, linked files, spreadsheets
File names	Avoid Interview_Anders_Andersson_2024.docx; use P01_transcript.docx
PDF redactions	Do not just draw black boxes over text; use proper redaction/export workflows
Audio/video transcripts	Check timestamps and speaker labels for identifiers
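For Word files, several of these checks can be automated, because a .docx file is a ZIP archive of XML parts (Office Open XML). The sketch below uses only Python's standard library and assumes the part names Word normally writes (docProps/core.xml, word/document.xml, word/comments.xml, word/media/, word/embeddings/); it is a rough screen, not a complete inspection, and does not cover PDFs or audio.

```python
import re
import zipfile


def docx_risks(source):
    """List hidden-content risks in a .docx (source: a path or a binary file object)."""
    risks = []
    with zipfile.ZipFile(source) as docx:
        names = set(docx.namelist())
        # File metadata: author and last editor are stored in the core properties.
        if "docProps/core.xml" in names:
            core = docx.read("docProps/core.xml").decode("utf-8")
            for tag in ("dc:creator", "cp:lastModifiedBy"):
                match = re.search(rf"<{tag}>([^<]+)</{tag}>", core)
                if match:
                    risks.append(f"metadata {tag}: {match.group(1)}")
        # Comments live in a separate part.
        if "word/comments.xml" in names:
            risks.append("document contains comments")
        # Tracked insertions/deletions appear as <w:ins ...> and <w:del ...> elements.
        if "word/document.xml" in names:
            body = docx.read("word/document.xml").decode("utf-8")
            if "<w:ins " in body or "<w:del " in body:
                risks.append("document contains tracked changes")
        # Images, audio and embedded files.
        if any(n.startswith(("word/media/", "word/embeddings/")) for n in names):
            risks.append("document contains embedded media or objects")
    return risks
```

For example, `docx_risks("P01_transcript.docx")` returns a list of findings, such as "document contains tracked changes", that should be dealt with before deposit.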

23 of 24

Questions?

Was there anything that I didn’t cover? Or didn’t cover enough?

24 of 24

Thanks for listening

Arin Tham Savran

arin.tham@snd.se

www.researchdata.se
