1 of 24

SUE-duh-NIM-ih-ZAY-shun

Use it. Don’t spell it.

2 of 24

Say What?!

3 of 24

What is Pseudonymization?

  • MORE data
  • Protection of privacy
  • Preservation of non-personal attributes
  • Richer statistics
  • Better informed decision-making
  • Cousin to anonymization

4 of 24

Anonymization vs. Pseudonymization

  • Removes data
  • Protects privacy through removal of all links to a person
  • Nothing to distinguish one patron from another

  • Adds data
  • Protects privacy by cloaking a person’s identity through a pseudonym
  • Circ activities are preserved but not identified
  • Libraries can determine which properties to preserve under a pseudonym

5 of 24

Anonymization vs. Pseudonymization

From old_issues

Name: anonymous

Date enrolled: dawn of catalogue

Category Code: System

Branch: unknown

Number of checkouts: 6 bazillion

From pseudonymized_transactions

Name: $2a$10$.ZT8nRFqqL7iCkO.

Date enrolled: Jan 6, 2022

Category Code: Resident

Branch: Main

Number of checkouts: 132
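The contrast above can be sketched in code. The sample name on the slide begins with `$2a$10$`, the prefix used by bcrypt hashes. The sketch below swaps in HMAC-SHA256 from the Python standard library as a stand-in, only to show the property that matters: the same patron always maps to the same pseudonym, so checkouts can be grouped, while the pseudonym itself does not reveal who the patron is. The function name, the key, and the borrower numbers are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(borrowernumber: int, secret_key: bytes) -> str:
    """Return a stable pseudonym for a borrower number.

    Stand-in technique: HMAC-SHA256, not the bcrypt scheme hinted at by
    the slide's sample value. The key must be stored outside the database.
    """
    message = str(borrowernumber).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key and borrower numbers, for illustration only.
key = b"hypothetical-secret-kept-outside-the-database"
first = pseudonymize(1001, key)
again = pseudonymize(1001, key)
other = pseudonymize(1002, key)

print(first == again)  # same patron, same pseudonym
print(first == other)  # different patrons, different pseudonyms
```

Because the pseudonym is stable, "Number of checkouts: 132" can be counted per pseudonym without storing a name or card number alongside it.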

6 of 24

The Ocean State Libraries Experience

7 of 24

Let’s make a plan!

Time to reinstitute data retention and patron privacy!

  1. Download historical data.
  2. Obtain board approval.
  3. Enable pseudonymization.
  4. Pseudonymize historical transactions.
  5. Edit reports.
  6. Configure the clean_up cron.

8 of 24

Download historical data.

9 of 24

Obtain board approval.

  1. Enable pseudonymization to store anonymized transactions.
  2. Completed transactions that connect patrons to books should be deleted after 16 weeks.

10 of 24

Enable Pseudonymization.

You can do it!

11 of 24

Pseudonymize historical transactions.

12 of 24

ByWater to the Rescue!

Optimize the pseudo job.

Test server hack.

13 of 24

Edit reports.

SELECT *

FROM saved_sql

WHERE savedsql LIKE '%statistics%'
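A minimal sketch of that search, run against an in-memory SQLite stand-in so it can be tried without touching a live Koha database. The table here is reduced to three columns, and the sample reports are invented for illustration. Selecting only the id and name, rather than `*`, gives a short worklist of reports to review.

```python
import sqlite3

# In-memory stand-in for Koha's saved_sql table (reduced, assumed columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE saved_sql ("
    "id INTEGER PRIMARY KEY, report_name TEXT, savedsql TEXT)"
)

# Invented sample reports, for illustration only.
conn.executemany(
    "INSERT INTO saved_sql (report_name, savedsql) VALUES (?, ?)",
    [
        ("Checkouts by branch",
         "SELECT branch, COUNT(*) FROM statistics "
         "WHERE type = 'issue' GROUP BY branch"),
        ("Patron count",
         "SELECT COUNT(*) FROM borrowers"),
        ("Renewals last month",
         "SELECT COUNT(*) FROM statistics WHERE type = 'renew'"),
    ],
)

# Same filter as the slide, returning a worklist instead of every column.
rows = conn.execute(
    "SELECT id, report_name FROM saved_sql "
    "WHERE savedsql LIKE ? ORDER BY id",
    ("%statistics%",),
).fetchall()

for report_id, name in rows:
    print(report_id, name)
```

Each report on the resulting list still needs a human decision: whether it can be pointed at pseudonymized_transactions instead, or whether it only needs recent data and can stay as it is.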

14 of 24

Configure the clean_up cron.

Borrower -/- Item after 16 weeks

  • Statistics (issue, renew)
  • Old_issues (completed checkouts)
  • Old_reserves (completed holds)
  • Message_queue (notices)
  • Other places?
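One possible reading of the 16-week rule, sketched against an in-memory SQLite stand-in. This is not the actual clean_up script; the function name, the reduced table layouts, and the choice to blank the borrower in statistics while deleting rows from old_issues are all assumptions made for illustration. Old_reserves and message_queue would follow the same pattern.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=16)  # retention window from the board policy


def sever_links(conn: sqlite3.Connection, now: datetime) -> tuple:
    """Break borrower-item links older than the retention window.

    Hypothetical policy: blank the borrower in statistics (the row stays
    for counts) and delete completed checkouts from old_issues.
    """
    cutoff = (now - RETENTION).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute(
        "UPDATE statistics SET borrowernumber = NULL "
        "WHERE datetime < ? AND borrowernumber IS NOT NULL",
        (cutoff,),
    )
    stats_changed = cur.rowcount
    cur = conn.execute("DELETE FROM old_issues WHERE returndate < ?", (cutoff,))
    issues_deleted = cur.rowcount
    conn.commit()
    return stats_changed, issues_deleted


# Reduced stand-in tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statistics (datetime TEXT, type TEXT, borrowernumber INTEGER)"
)
conn.execute(
    "CREATE TABLE old_issues "
    "(borrowernumber INTEGER, itemnumber INTEGER, returndate TEXT)"
)
conn.executemany(
    "INSERT INTO statistics VALUES (?, ?, ?)",
    [("2024-01-05 10:00:00", "issue", 42), ("2024-05-20 10:00:00", "issue", 42)],
)
conn.executemany(
    "INSERT INTO old_issues VALUES (?, ?, ?)",
    [(42, 1001, "2024-01-10 09:00:00"), (42, 1002, "2024-05-25 09:00:00")],
)

result = sever_links(conn, now=datetime(2024, 6, 1))
print(result)
```

Passing `now` in explicitly makes the cutoff easy to test before the job is scheduled. Running it once by hand on a test server is a cheap way to see how many rows a first pass would touch.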

15 of 24

Relax!

16 of 24

17 of 24

The right way to pseudonymize…

From the start!

With a worksheet to guide data retention discussions that can happen later.

18 of 24

Other thoughts on pseudonymization

19 of 24

Bettah!

Add borrower age bands -- juvenile, young adult, adult -- to align with IMLS definitions.

Divide the issue (checkout) transaction type into “browsing checkouts” and “filled hold checkouts” to allow for advanced statistical analysis.

20 of 24

Get buggy with it

This one is Steve’s fault

21 of 24

There will be no questions.

Are there any questions?

22 of 24

Thank you!

Valerie Burnett (she/her)

Data Migration Librarian

ByWater Solutions

valerie@bywatersolutions.com

Stephen Spohn (he/him)

Executive Director

Ocean State Libraries

sspohn@oslri.net

23 of 24

Presentation Description

Pseudonymization allows a Koha library to create a copy of its circulation statistics that anonymizes borrower information while retaining useful statistical details like borrower ZIP code, so the library can easily run more sophisticated queries against its circulation history. Pseudonymization also supports library efforts to protect patron privacy. Libraries that choose to keep only recent transactions can rely on “pseudonymized” statistics that may be kept in perpetuity.

Ocean State Libraries chose to implement pseudonymization almost two years post-launch. OSL was finally ready to prune historical transactions and use pseudonymization to create a simple place for its members to run custom queries. However, as you’ll learn from the presenters, doing so “post-launch” created challenges for OSL and their host, ByWater Solutions.

Presenters Valerie Burnett (ByWater Solutions) and Stephen Spohn (Ocean State Libraries) will provide an overview of pseudonymization, some pros and cons, recent improvements, current bugs, and a riveting case study of implementation in a large consortium more than two years after go-live.

24 of 24

Presentation Outline

  1. What is “PS”? - V
  2. What isn’t it? - S
    1. Not about losing data!!!
  3. The OSL Experience… or what NOT to do - V, S, V
    • One day… a ticket. -V
    • Interval at OSL - Data Retention and Patron Privacy - S
    • ByWater to the Rescue - Historical pseudonymization script enhancements. - V
    • Meanwhile, back at OSL - fresh queries - S
    • ByWater to the Rescue - Test server hack. - V
    • Pseudonymized_transactions - S
    • Clean_up.py - S
      1. ByWater to the rescue - multiple crons running - V
  4. And that was a whole lot! - S
  5. The Right Way to Do It! - V
    • Turn it on… all fields.
      • Consider patron attributes.
      • When in doubt, do it.
    • Later…
    • Data retention and patron privacy
    • When ready… reach out to your provider:
      • Discussion points for the clean_up.py