1 of 24

Sensitive qualitative data and pseudonymisation

Arin Tham Savran

OS in the Swedish Context

Göteborg, 2026-04-06

Göteborgs universitet - Chalmers tekniska högskola - Karolinska Institutet - Kungliga Tekniska högskolan - Lunds universitet - Stockholms universitet - Sveriges lantbruksuniversitet - Umeå universitet - Uppsala universitet

This work is licensed under a Creative Commons Attribution 4.0 International License.

2 of 24

Session plan

15:00 – 15:10 Welcome

15:10 – 15:40 Presentation

15:45 – 15:50 Activity 1

15:50 – 16:00 Presentation continues

16:00 – 16:10 Activity 2

16:10 – 16:20 Class discussion

16:20 – 16:30 Presentation finishes

3 of 24

Types of qualitative ”sensitive data collection”

  • In-depth interviews
  • Focus groups
  • Observations
  • Field notes
  • Diaries/journals
  • Audio, video, photo elicitation
  • Online forum or social media analysis
  • Open-ended questionnaires or surveys
  • Document analysis

Svensk nationell datatjänst


4 of 24

Sensitive data

  • Personal information: ethnicity, political opinions, trade union membership, religious and philosophical beliefs, health, sexual orientation, genetic data, biometric data

  • But also other details, e.g. criminal history and sanctions records, and financial information

  • Geographical location of humans, animals, a physical site/entity

  • And probably some more…


5 of 24

GDPR (Dataskyddsförordningen)

  • Appropriate measures must be taken to protect sensitive information…
    • Pseudonymising or anonymising are important tools

  • All research data must be stored securely, both during the project and after it finishes, using technical and organisational measures
    • University Data Office, IT, and Archive services usually help with this

  • Remember to ask for help if you don’t know how!


6 of 24

Pseudonymisation & Anonymisation: In a nutshell

  • Create codes or aliases

  • Use a code key or log

  • The data remain subject to GDPR as long as the code key exists

  • If the code key is deleted, the data are anonymised and GDPR no longer applies. But…


7 of 24

Here’s the thing though…

  • Swedish Archives Act


8 of 24

RE-IDENTIFICATION: Indirect identifiers and background variables

  • Gender
  • Age
  • Income
  • Occupation
  • Municipality of residence
  • And so forth…
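
As a rough sketch of why these variables matter in combination, the Python snippet below counts how many participants share each combination of indirect identifiers and flags combinations held by fewer than k people. The records, the field names, and the threshold k are hypothetical illustrations; they do not come from the slides or from any particular tool.

```python
from collections import Counter

# Hypothetical participant records holding only indirect identifiers.
participants = [
    {"gender": "F", "age": 34, "occupation": "teacher", "municipality": "Kalix"},
    {"gender": "F", "age": 34, "occupation": "teacher", "municipality": "Kalix"},
    {"gender": "M", "age": 61, "occupation": "neurosurgeon", "municipality": "Kiruna"},
]


def risky_combinations(records, keys, k=2):
    """Return the value combinations shared by fewer than k records."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return [combo for combo, n in counts.items() if n < k]


unique = risky_combinations(
    participants, ["gender", "age", "occupation", "municipality"]
)
print(unique)
```

A combination that appears only once may point to a single person even though no name is present, which is the re-identification risk this slide refers to.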


9 of 24

1. Remove the ”obvious” and generalise identifying details

  • Removing or masking: ”Hello, my name is Anders Andersson!” → ”Hello, my name is [name]!”
  • Generalising: ”I live in Kalix” → ”I live in a small town in northern Sweden”. Though perhaps masking is more suitable?

Make sure there’s nothing that could identify individuals, whether study participants or others! This includes identification via re-identification…

Exceptions? Yes, in some cases people want to be identified. But it is ultimately up to the PI to judge whether that is appropriate!
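
The masking and generalising steps above can be sketched in Python as a lookup of known identifying terms and their substitutes. This is a minimal illustration, not a complete tool: the function name `mask_terms`, the replacement table, and the bracketed substitutes are assumptions, and a real transcript still needs a manual read-through because a lookup only catches the terms it has been given.

```python
import re


def mask_terms(text, replacements):
    """Replace each known identifying term with its bracketed substitute."""
    # Handle longer terms first so a full name is matched before a first name.
    for term in sorted(replacements, key=len, reverse=True):
        substitute = replacements[term]
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
        # A function is used as the replacement so the substitute is inserted literally.
        text = pattern.sub(lambda _match, s=substitute: s, text)
    return text


# Hypothetical replacement table for one transcript.
replacements = {
    "Anders Andersson": "[name]",
    "Kalix": "[small town in northern Sweden]",
}

masked = mask_terms("Hello, my name is Anders Andersson! I live in Kalix.", replacements)
print(masked)
```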


10 of 24

2. Create code/alias key

Use fake names, participant IDs, or role-based labels.

Also, keep the key separate from the research data, stored securely, and accessible only to authorised individuals.

Example:

| Participant ID | Real name | Contact details |
| --- | --- | --- |
| P001 | Jane Doe | Email/phone |
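
A code key like the one in the example could be generated along the following lines. This is a sketch under assumptions: the function `build_code_key`, the ID format, and the second name are made up for illustration, and the key is written to an in-memory buffer here, whereas in practice it would go to a separate, access-controlled location away from the research data.

```python
import csv
import io


def build_code_key(real_names, prefix="P", width=3):
    """Map each real name to a sequential participant ID such as P001."""
    return {
        name: f"{prefix}{number:0{width}d}"
        for number, name in enumerate(real_names, start=1)
    }


# "John Smith" is a hypothetical second participant.
code_key = build_code_key(["Jane Doe", "John Smith"])

# Write the key as CSV; a real key file must be stored apart from the transcripts.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Participant ID", "Real name"])
for name, participant_id in code_key.items():
    writer.writerow([participant_id, name])

print(buffer.getvalue())
```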


11 of 24

3. Pseudonyms for roles, organisations, geographical regions

  • [Teacher 1]
  • [Teacher 2]
  • [School 1]
  • [School 2]
  • [Region 3]
  • [University]
  • [major newspaper]
  • [municipality]
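
Numbered labels such as [Teacher 1] and [Teacher 2] only work if the same person or organisation always receives the same label. One possible way to keep them consistent is sketched below; the class `LabelAssigner` and the names "Anna" and "Erik" are hypothetical. The internal mapping is in effect a code key, so it would need the same protection as any other key.

```python
from collections import defaultdict


class LabelAssigner:
    """Give each real entity a stable bracketed label such as [Teacher 1]."""

    def __init__(self):
        self._labels = {}                  # (category, real name) -> label
        self._counters = defaultdict(int)  # category -> last number used

    def label(self, category, real_name):
        key = (category, real_name)
        if key not in self._labels:
            self._counters[category] += 1
            self._labels[key] = f"[{category} {self._counters[category]}]"
        return self._labels[key]


labels = LabelAssigner()
first = labels.label("Teacher", "Anna")
school = labels.label("School", "Tullbro School")
second = labels.label("Teacher", "Erik")
repeat = labels.label("Teacher", "Anna")  # same person, same label as before
print(first, school, second, repeat)
```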


12 of 24

4. Pseudonymise places, organisations/companies, and networks

Original:

“I worked at Volvo Torslanda under my foreman Anders at T3.”

Replace with something like:

“I worked for a large manufacturing company in a major Swedish city under a senior manager.”


13 of 24

5. Modify dates

Dates can be identifying, especially around incidents, complaints, hospitalisation, legal proceedings, or media-covered events.

Source: ChatGPT Edu

| Original | Safer version |
| --- | --- |
| “on 17 September 2023” | “in autumn 2023” or “2023” |
| “three days after the public inquiry” | “shortly after a major public event” |
| “during the 2020 election campaign” | “during a national campaign period” |
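
For exact calendar dates, the first kind of generalisation can be sketched in Python as follows. The season boundaries are an assumption (meteorological seasons for the northern hemisphere), and the function name `generalise_date` is made up for illustration. Relative references such as "three days after the public inquiry" cannot be handled this way and need manual rewriting.

```python
from datetime import date

# Assumed mapping from month number to season (northern hemisphere).
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def generalise_date(d, level="season"):
    """Reduce an exact date to a season and year, or to the year alone."""
    if level == "year":
        return str(d.year)
    return f"{SEASONS[d.month]} {d.year}"


print(generalise_date(date(2023, 9, 17)))
print(generalise_date(date(2023, 9, 17), level="year"))
```

Whether season and year are coarse enough depends on the material; for a widely reported event, even the year may be identifying.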


14 of 24

6. Generalise and categorise background information

Examples:

  • converting age to age brackets
  • recoding municipality into county
  • classifying income as low, medium, or high
  • recoding political affiliation as right or left
  • grouping occupation or workplace into public or private sector

Source: www.researchdata.se
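
Three of the recodings listed above can be sketched in Python as shown below. The bracket width, the income thresholds, and the small municipality lookup are assumptions chosen for illustration; a real project would set them according to its own data and documentation.

```python
# Hypothetical lookup; a real one would cover every municipality in the data.
MUNICIPALITY_TO_COUNTY = {
    "Kalix": "Norrbotten",
    "Kiruna": "Norrbotten",
    "Göteborg": "Västra Götaland",
}


def age_bracket(age, width=10):
    """Convert an exact age to a bracket, e.g. 47 becomes '40-49'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def income_class(annual_income, low_max=250_000, high_min=500_000):
    """Classify income as low, medium, or high using assumed thresholds."""
    if annual_income < low_max:
        return "low"
    if annual_income >= high_min:
        return "high"
    return "medium"


def generalise(record):
    """Return a copy of the record with generalised background variables."""
    return {
        "age": age_bracket(record["age"]),
        "county": MUNICIPALITY_TO_COUNTY.get(record["municipality"], "[county]"),
        "income": income_class(record["income"]),
    }


recoded = generalise({"age": 47, "municipality": "Kiruna", "income": 620_000})
print(recoded)
```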


15 of 24

ACTIVITY 1

3-5 min

16 of 24

ACTIVITY 1: Pseudonymised examples from ChatGPT

| Original | Pseudonymised |
| --- | --- |
| “I live in Kiruna” | |
| “I am a 47-year-old neurosurgeon” | |
| “My son attends Greenfield Primary” | |
| “I started on 12 March 2024” | |
| “My research is on hygiene practices in post-WW2 Finland and Sweden” | |
| “As the only Arabic-speaking pediatric nurse in town…” | |


17 of 24

Pseudonymised examples from ChatGPT (cont.)

| Original | Pseudonymised |
| --- | --- |
| “I live in Kiruna” | “I live in a small town in northern Sweden” |
| “I am a 47-year-old neurosurgeon” | “I am a mid-career specialist doctor” |
| “My son attends Tullbro School” | “My child attends a local primary school” |
| “I started on 12 March 2024” | “I started in early 2024” |
| “My research is on hygiene practices in post-WW2 Finland and Sweden” | “My research is on public health practices in two Nordic countries during the mid-20th century” |
| “As the only Arabic-speaking pediatric nurse in town…” | “As one of few multilingual pediatric healthcare workers in the area…” |


18 of 24

Read more…


19 of 24

ACTIVITY 2

10 min

20 of 24

CLASS DISCUSSION

10 min

21 of 24

Also, to think about…

  • Use square [brackets] for substitutions
  • Double-check the masking function you use
  • Save in PDF/A format
  • An editable and machine-readable version, such as UTF-8 TXT, DOCX/ODT, or XML, may also be relevant, depending on the type of research and the repository’s requirements
  • Before depositing, check for identifiers outside the visible transcript text (next slide)


22 of 24

| Risk area | What to check |
| --- | --- |
| File metadata | Author name, institution, path names, creation history |
| Track changes | Deleted names or places may remain recoverable |
| Comments | Researcher notes may contain real identities |
| Embedded objects | Images, audio clips, linked files, spreadsheets |
| File names | Avoid Interview_Anders_Andersson_2024.docx; use P01_transcript.docx |
| PDF redactions | Do not just draw black boxes over text; use proper redaction/export workflows |
| Audio/video transcripts | Check timestamps and speaker labels for identifiers |
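
Some of these checks can be partly automated for DOCX files, since a DOCX file is a ZIP archive of XML parts. The sketch below looks for an author name, comments, tracked changes, and embedded objects using only the Python standard library. It is an illustration under assumptions: the function name `inspect_docx` is made up, the regular expressions are a simplification of real Word XML, and the demo archive is a stripped-down stand-in rather than a valid Word document. A clean report is no guarantee that a file is free of identifiers.

```python
import io
import re
import zipfile


def inspect_docx(path_or_file):
    """Report possible hidden identifiers in a DOCX file (a ZIP archive)."""
    findings = {}
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())

        # File metadata: the author is stored in docProps/core.xml.
        core = ""
        if "docProps/core.xml" in names:
            core = archive.read("docProps/core.xml").decode("utf-8")
        match = re.search(r"<dc:creator>(.*?)</dc:creator>", core)
        findings["author"] = match.group(1) if match else None

        # Comments live in their own part of the archive.
        findings["has_comments"] = "word/comments.xml" in names

        # Tracked insertions and deletions appear as w:ins and w:del elements.
        body = ""
        if "word/document.xml" in names:
            body = archive.read("word/document.xml").decode("utf-8")
        findings["has_tracked_changes"] = bool(re.search(r"<w:(ins|del)\b", body))

        # Embedded images and files sit under word/media/ and word/embeddings/.
        findings["embedded_files"] = sorted(
            name for name in names
            if name.startswith(("word/media/", "word/embeddings/"))
        )
    return findings


# Build a minimal in-memory stand-in for a DOCX file to demonstrate the checks.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "docProps/core.xml",
        "<cp:coreProperties><dc:creator>Anders Andersson</dc:creator></cp:coreProperties>",
    )
    archive.writestr(
        "word/document.xml",
        '<w:document><w:del w:author="A"><w:delText>Kalix</w:delText></w:del></w:document>',
    )
    archive.writestr("word/comments.xml", "<w:comments/>")
demo.seek(0)

report = inspect_docx(demo)
print(report)
```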


23 of 24

Questions?

  • Was there anything that I didn’t cover? Or didn’t cover enough?


24 of 24

Thanks for listening

Arin Tham Savran

arin.tham@snd.se

www.researchdata.se

University of Gothenburg - Chalmers University of Technology - Karolinska Institutet - KTH Royal Institute of Technology - Lund University - Stockholm University - Swedish University of Agricultural Sciences - Umeå University - Uppsala University