The Partnership’s Matching Algorithms Task Force Final Report
Ian Bogus & George Machovec
Matching Algorithms Task Force Charge
Began work in Q2 2022
Reported to the Infrastructure Working Group of the Partnership for Shared Book Collections (now the Shared Print Partnership)
Early in the process, commercial services were interested in the work, but most were unwilling to share their approaches to bibliographic matching, citing proprietary concerns
Task Force Members
Current Members
Former Members
David Almovodar - Pace University
Claudia Conrad (Chair) - CDL
Judy Dobry - CDL
Dana Jemison - CDL
Visitors
People from a variety of other organizations also visited, including several commercial library vendors
Why Do We Care?
Shared Print Programs
Resource Sharing
Collection Analytics
Cooperative Programs
Rationale for What Algorithms to Study
Algorithmic Diversity: Each chosen algorithm represents a fundamentally different approach
Organizational Transparency: The research prioritized algorithms from organizations willing to provide detailed insights into their approaches
Vendor algorithms were not directly included for this reason
Algorithm types compared in study
Three types of algorithms were examined:
MARC21 Data Sets Used for Analysis
English Monographs (2013-2017)
Recent non-Roman Monographs (2013-2017)
Older English Language Monographs (pre-1950)
Algorithms Overview
Gold Rush - Bibliographic Match Key Algorithm
Shared Collection Service Bus (SCSB) - Control Number Dependent Matching
MARC-AI - Machine Learning Matching Algorithms
OCLC Primary (benchmark #1) – First instance of the 035$a
OCLC Reconciled (benchmark #2) – Leveraged the WorldCat API to find merged OCLC numbers
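To make the contrast between a control-number approach and a match-key approach concrete, here is a minimal Python sketch. It is not the Gold Rush, SCSB, or OCLC implementation. The dictionary record layout, the helper names (`first_subfield`, `normalize_oclc`, `toy_match_key`), the OCLC prefix handling, and the choice of title plus year for the key are all assumptions made for illustration.

```python
import re
import unicodedata
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory record: MARC tag -> list of fields, where each field
# is a list of (subfield code, value) pairs. Real MARC21 parsing is not shown.
Record = Dict[str, List[List[Tuple[str, str]]]]


def first_subfield(record: Record, tag: str, code: str) -> Optional[str]:
    """Return the first occurrence of tag$code in record order, if any."""
    for field in record.get(tag, []):
        for sub_code, value in field:
            if sub_code == code:
                return value.strip()
    return None


def normalize_oclc(value: Optional[str]) -> Optional[str]:
    """Assumed normalization: drop the (OCoLC) prefix, ocm/ocn/on, leading zeros."""
    if value is None:
        return None
    m = re.fullmatch(r"\(OCoLC\)\s*(?:ocm|ocn|on)?0*(\d+)", value.strip())
    return m.group(1) if m else None


def toy_match_key(record: Record) -> str:
    """Toy bibliographic key: normalized 245$a plus the first 4-digit year."""
    title = first_subfield(record, "245", "a") or ""
    decomposed = unicodedata.normalize("NFKD", title)
    # Keep letters and digits only; combining marks and punctuation drop out.
    letters = "".join(ch.lower() for ch in decomposed if ch.isalnum())
    date = first_subfield(record, "264", "c") or first_subfield(record, "260", "c") or ""
    year = re.search(r"\d{4}", date)
    return f"{letters[:40]}|{year.group(0) if year else ''}"


rec_a: Record = {
    "035": [[("a", "(OCoLC)ocm00012345")]],
    "245": [[("a", "Moby Dick /"), ("c", "Herman Melville.")]],
    "260": [[("c", "1930.")]],
}
rec_b: Record = {
    "035": [[("a", "(OCoLC)12345")]],
    "245": [[("a", "Moby-Dick")]],
    "264": [[("c", "[1930]")]],
}

# "OCLC Primary"-style comparison: first 035$a only.
same_control_number = normalize_oclc(first_subfield(rec_a, "035", "a")) == \
    normalize_oclc(first_subfield(rec_b, "035", "a"))
# Match-key-style comparison: no control number needed.
same_key = toy_match_key(rec_a) == toy_match_key(rec_b)
```

A control-number comparison can only succeed when both records carry a usable number in the field it reads, whereas a key built from descriptive fields depends on how consistently those fields were cataloged.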
Methodology
[Diagram: bib records from Library 1 and Library 2 are run through each of the five approaches (Gold Rush, MARC-AI, SCSB, OCLC Primary, OCLC Reconciled), and each produces its own set of matches. The outcomes are labeled Matched All Five, None of the Five Matched, and Island of Uncertainty.]
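The three outcome categories can be sketched as set operations over each approach's matches. This is a simplified illustration, not the Task Force's code. It works on record pairs rather than match groups, and it assumes the Island of Uncertainty means pairs matched by at least one approach but not by all five, which the slides do not spell out. The function name and input shape are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (Library 1 record id, Library 2 record id)

ALGORITHMS = ("Gold Rush", "MARC-AI", "SCSB", "OCLC Primary", "OCLC Reconciled")


def partition_pairs(
    matches: Dict[str, Set[Pair]],
    candidate_pairs: Iterable[Pair],
) -> Dict[str, Set[Pair]]:
    """Split candidate pairs by how many of the five approaches matched them."""
    per_algorithm = [matches[name] for name in ALGORITHMS]
    matched_by_all = set.intersection(*per_algorithm)
    matched_by_any = set.union(*per_algorithm)
    return {
        "Matched All Five": matched_by_all,
        # Assumption: the island is where the approaches disagree.
        "Island of Uncertainty": matched_by_any - matched_by_all,
        "None of the Five Matched": set(candidate_pairs) - matched_by_any,
    }


example = {
    "Gold Rush": {("a1", "b1")},
    "MARC-AI": {("a1", "b1"), ("a2", "b2")},
    "SCSB": {("a1", "b1"), ("a2", "b2")},
    "OCLC Primary": {("a1", "b1")},
    "OCLC Reconciled": {("a1", "b1"), ("a2", "b2")},
}
result = partition_pairs(example, [("a1", "b1"), ("a2", "b2"), ("a3", "b3")])
```

Under this reading, pairs in the island are the ones worth manual review, since agreement or total disagreement among all five approaches leaves little to adjudicate.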
Summary Results
| English (2013-2017) | | Non-Roman (2013-2017) | | English (pre-1950) | |
| # Records | # Match Groups | # Records | # Match Groups | # Records | # Match Groups |
Library 1 | 62,276 | | 228,403 | | 50,655 | |
Library 2 | 54,402 | | 108,423 | | 18,706 | |
All Five Matched | 36,120 | 18,051 | 59,410 | 29,676 | 3,411 | 1,705 |
Island of Uncertainty | 6,051 | 2,967 | 90,796 | 44,886 | 8,129 | 4,021 |
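One quick way to read the table is records per match group. The sketch below only divides the figures shown above; the variable names are illustrative, and the interpretation is an assumption rather than something the slides state.

```python
# (records, match groups) copied from the Summary Results table.
summary = {
    ("English (2013-2017)", "All Five Matched"): (36120, 18051),
    ("Non-Roman (2013-2017)", "All Five Matched"): (59410, 29676),
    ("English (pre-1950)", "All Five Matched"): (3411, 1705),
    ("English (2013-2017)", "Island of Uncertainty"): (6051, 2967),
    ("Non-Roman (2013-2017)", "Island of Uncertainty"): (90796, 44886),
    ("English (pre-1950)", "Island of Uncertainty"): (8129, 4021),
}

# Average number of records in each match group, rounded to two decimals.
records_per_group = {
    key: round(records / groups, 2)
    for key, (records, groups) in summary.items()
}

for (data_set, category), ratio in records_per_group.items():
    print(f"{data_set:<24} {category:<22} {ratio:.2f}")
```

Every ratio sits at or just above 2, which would be consistent with most match groups pairing one record from each library, with slightly larger groups in the Island of Uncertainty.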
Scenario 1
Recent English-language monographs (2013-2017)
Common Record Issues
| True Positives | False Positives |
Gold Rush | 91.32% | 0.61% |
SCSB | 99.27% | 0.97% |
MARC-AI | 96.46% | 0.45% |
OCLC Primary | 95.05% | 0.10% |
OCLC Reconciled | 97.76% | 0.18% |
Scenario 2
Recent non-Roman-language monographs (2013-2017)
Common Record Issues
| True Positives | False Positives |
Gold Rush | 46.75% | 0.33% |
SCSB | 94.99% | 2.57% |
MARC-AI | 91.70% | 2.82% |
OCLC Primary | 87.16% | 1.59% |
OCLC Reconciled | 89.04% | 1.56% |
Scenario 3
English-language monographs (pre-1950)
Common Record Issues
| True Positives | False Positives |
Gold Rush | 76.05% | 10.62% |
SCSB | 93.58% | 6.13% |
MARC-AI | 87.21% | 2.67% |
OCLC Primary | 21.85% | 3.52% |
OCLC Reconciled | 91.19% | 1.72% |
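To compare the approaches across all three scenarios, the percentages from the tables above can be collected in one structure. The sketch below only rearranges the reported figures to show each approach's weakest true-positive rate and highest false-positive rate. The scenario labels and helper names are illustrative.

```python
SCENARIOS = ("English 2013-2017", "Non-Roman 2013-2017", "English pre-1950")

# (true positive %, false positive %) per scenario, copied from the tables.
results = {
    "Gold Rush": [(91.32, 0.61), (46.75, 0.33), (76.05, 10.62)],
    "SCSB": [(99.27, 0.97), (94.99, 2.57), (93.58, 6.13)],
    "MARC-AI": [(96.46, 0.45), (91.70, 2.82), (87.21, 2.67)],
    "OCLC Primary": [(95.05, 0.10), (87.16, 1.59), (21.85, 3.52)],
    "OCLC Reconciled": [(97.76, 0.18), (89.04, 1.56), (91.19, 1.72)],
}


def worst_cases(rows):
    """Return (lowest TP %, its scenario) and (highest FP %, its scenario)."""
    low_tp = min(zip(rows, SCENARIOS), key=lambda item: item[0][0])
    high_fp = max(zip(rows, SCENARIOS), key=lambda item: item[0][1])
    return (low_tp[0][0], low_tp[1]), (high_fp[0][1], high_fp[1])


summary_by_algorithm = {name: worst_cases(rows) for name, rows in results.items()}

for name, ((tp, tp_where), (fp, fp_where)) in summary_by_algorithm.items():
    print(f"{name:<16} lowest TP {tp:5.2f}% ({tp_where}); highest FP {fp:5.2f}% ({fp_where})")
```

Laid out this way, the pre-1950 and non-Roman sets account for every approach's weakest result.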
Where to find the report
https://sharedprint.org/2025/06/13/the-matching-algorithms-task-force-releases-its-final-report/