1 of 47

Recopilando métricas y estableciendo límites: Reutilización de colecciones como datos y el impacto de la Inteligencia Artificial.

Gathering Metrics and Setting Boundaries: Reusing Collections as Data and the impacts of AI

12-14 November 2024, Costa Rica and Online

RDA 23rd Plenary Meeting (RDA P23) | Sustainable Science

RDA 23rd Plenary Meeting

www.rd-alliance.org

2 of 47

Welcome new RDA members!

6 Guiding Principles are at the heart of the RDA community

OPENNESS

CONSENSUS

HARMONISATION

COMMUNITY-DRIVEN

NON-PROFIT AND TECHNOLOGY-NEUTRAL

INCLUSIVITY


3 of 47

Recopilando métricas y estableciendo límites: Reutilización de colecciones como datos y el impacto de la Inteligencia Artificial.

Gathering Metrics and Setting Boundaries: Reusing Collections as Data and the impacts of AI

4 of 47

Moderadoras de la Reunión | Session Moderators

Trabajos previos sobre colecciones como datos.

Trabajos actuales sobre inteligencia artificial responsable.

Prior work on Collections-as-Data.

Current work on Responsible AI.

Hannah Scates Kettler, Associate University Librarian for Academic Services

Iowa State University, USA

Yasmeen Shorish, Director of Scholarly Communications Strategies

James Madison University, USA

5 of 47

Grupo de interés Colecciones como datos | Collections as Data Interest Group

Este grupo está dirigido a profesionales de colecciones que desempeñan una variedad de funciones críticas: como expertos en garantizar el acceso, la preservación y la reutilización de registros, objetos, datos y colecciones digitales; como provocadores de buenas prácticas de curación de colecciones; y como defensores de la construcción de infraestructuras responsables y sostenibles para el intercambio de información.

Reconocemos que existe una presión cada vez mayor sobre las instituciones de la memoria para que establezcan modelos para el desarrollo responsable y culturalmente sensible de los recursos de datos mientras exploramos los desafíos y oportunidades que brindan las nuevas tecnologías (Declaración de Vancouver, Principios 2, 4, 9-11).

Este grupo busca proporcionar un espacio para examinar las alineaciones en valores, políticas y prácticas en el trabajo de colecciones, fomentando un intercambio de experiencia en áreas y dominios de práctica de recolección.

This group is aimed at collections professionals who serve in a range of critical roles: as experts in ensuring access, preservation, and reuse of digital records, objects, data, and collections; as provocateurs for good collections curation practices; and as advocates for the construction of responsible and sustainable infrastructures for information sharing.

We recognise that there is increasing pressure on memory institutions to establish models for responsible and culturally sensitive development of data resources while navigating the challenges and opportunities provided by new technologies (Vancouver Statement, Principles 2, 4, 9-11).

This group seeks to provide a space for examining alignments in values, policy, and practice in collections work, and encouraging an exchange of expertise across collecting areas and domains of practice.

6 of 47

Cómo unirse | How to join

Encuéntranos en https://www.rd-alliance.org/groups/collections-as-data-ig/

Find us at https://www.rd-alliance.org/groups/collections-as-data-ig/

7 of 47

Objetivos | Goals

A través de diversas presentaciones y discusiones, esta sesión tendrá como objetivo proporcionar información sobre cómo las instituciones están evaluando el uso de colecciones como datos y cómo la Inteligencia Artificial afecta los parámetros de este uso.

Through presentations and discussions, this session will aim to provide insight into how institutions are assessing use of collections as data and how AI impacts the parameters of that use.

8 of 47

Orden del Día | Agenda

Presentaciones:

IA responsable
Comunidad de International GLAM Labs
D-CRAFT
International GLAM Labs Evolución
Detección de idiomas, modelos de lengua e IA en repositorios institucionales

Preguntas o comentarios

Discusión

Presentations:

Responsible AI
International GLAM Labs Community
D-CRAFT
International GLAM Labs Evolution
Language detection, LLMs and AI in institutional repositories

Q&A

Discussion

9 of 47

Responsible AI:

Tools for values-driven AI in libraries and archives

IA responsable:

Herramientas para una IA basada en valores en bibliotecas y archivos

10 of 47

Equipo del proyecto | Project Team

Hannah Scates Kettler Iowa State University

Sara Mannheimer (PI)

Montana State University

Jason A. Clark

Montana State University

Bonnie Sheehey

Montana State University

Yasmeen Shorish

James Madison University

Doralyn Rossmann Montana State University

Scott W. H. Young

Montana State University

Natalie Bond

University of Montana

11 of 47

Objetivos del proyecto | Goals of the project

Develop practical resources that support responsible use of AI in library and archives contexts.
Help library and archives practitioners consider our values and the ethical implications as they embark on new AI projects.

Desarrollar recursos prácticos que apoyen el uso responsable de la IA en contextos de bibliotecas y archivos.
Ayudar a los profesionales de bibliotecas y archivos a considerar nuestros valores y las implicaciones éticas a medida que se embarcan en nuevos proyectos de IA.

12 of 47

https://osf.io/yue9s

13 of 47

Buscar actualizaciones: | Check for updates:

lib.montana.edu/responsible-ai/

14 of 47

Comunidad internacional GLAM Labs | International GLAM Labs Community

Una comunidad de instituciones culturales y de investigación innovadoras del mundo
84 instituciones y 354 miembros (a 7/11/2024)

A community of the world's innovative cultural and research institutions
84 institutions and 354 members (as of 7/11/2024)

GLAMLABS.IO/MEMBER-MAP/

15 of 47

Miembros | Members

Katrine Hofmann Gasser - Royal Danish Library
Mahendra Mahey - Tallinn University
Olga Holownia - International Internet Preservation Consortium
Nele Gabriëls - KU Leuven
Sally Chambers - British Library & DARIAH-EU
Milena Dobreva - University of Strathclyde
Sarah Ames - National Library of Scotland
Alba Irollo - Europeana
Abby Potter - Library of Congress
Ellen Van Keer - Meemoo
Caleb Derven - University of Limerick
Armin Straube - University of Limerick
Rania Osman - Bibliotheca Alexandrina
Liam Green-Hughes - British Library
Neil Fitzgerald - British Library
Tim Sherratt - GLAM Workbench
Gustavo Candela - University of Alicante
…

16 of 47

Objetivos | Goals

Experiment and innovate on-site and on-line with digitised and born digital collections and data
Promote the publication and responsible use of digital collections published by GLAM institutions
Share best practices and guidelines

Experimentar e innovar in situ y en línea con las colecciones digitalizadas y digitales y datos
Promover la publicación y reutilización responsable de colecciones digitales en instituciones GLAM
Compartir buenas prácticas y guías

17 of 47

Eventos | Events

SEPTIEMBRE 2018: BUILDING LIBRARY LABS, 1º EVENTO. BRITISH LIBRARY, LONDRES

MARZO 2019: BUILDING LIBRARY LABS, 2º EVENTO. ROYAL DANISH LIBRARY, COPENHAGUE

SEPTIEMBRE 2019: DIGITAL CULTURAL HERITAGE INNOVATION LABS BOOKSPRINT. DOHA, QATAR

OCTUBRE 2019 - ABRIL 2024: EN LÍNEA

MAYO 2024: TALLER GLAM LABS EN DHNB. NATIONAL AND UNIVERSITY LIBRARY OF ICELAND

SEPTEMBER 2018: BUILDING LIBRARY LABS, 1ST EVENT. BRITISH LIBRARY, LONDON

MARCH 2019: BUILDING LIBRARY LABS, 2ND EVENT. ROYAL DANISH LIBRARY, COPENHAGEN

SEPTEMBER 2019: DIGITAL CULTURAL HERITAGE INNOVATION LABS BOOKSPRINT. DOHA, QATAR

OCTOBER 2019 - APRIL 2024: ONLINE

MAY 2024: GLAM LABS WORKSHOP AT DHNB. NATIONAL AND UNIVERSITY LIBRARY OF ICELAND

18 of 47

Publicaciones | Publications

https://glamlabs.io/books/

7 idiomas (inglés, español, griego, ruso, búlgaro, serbio y árabe)

7 languages (English, Spanish, Greek, Russian, Bulgarian, Serbian and Arabic)

Talleres | Workshops

Acceso computacional | Computational access

https://glamlabs.io/checklist

https://doi.org/10.1002/asi.24835

19 of 47

Encuéntranos | Find us

https://www.jiscmail.ac.uk/GLAMLABS

https://twitter.com/GLAM_labs

https://glamlabs.io/slack-channel-signup/

20 of 47

Assessing Use and Reuse

of Digital Collections Using D-CRAFT

21 of 47

Esquema de presentación | Presentation Outline

Información de fondo
Prácticas recomendadas
Directrices éticas para la reutilización
Módulos educativos
Conclusión

Background
Recommended Practices
Ethical Guidelines for Reuse
Education Modules
Conclusion

22 of 47

Background

Información de fondo

23 of 47

Avanzando en la evaluación a través de D-CRAFT | Advancing Assessment through D-CRAFT

Digital Content Reuse Assessment Framework Toolkit (D-CRAFT)
Developed by the DLF AIG Content Reuse Working Group
IMLS National Leadership Grants for Libraries - LG-36-19-0036-19
July 2019 - June 2023
Ethical guidelines, recommended practices, and assessment training

Digital Content Reuse Assessment Framework Toolkit (D-CRAFT)
Desarrollado por el Grupo de Trabajo de Reutilización de Contenido de DLF AIG
Subvenciones IMLS de Liderazgo Nacional para Bibliotecas - LG-36-19-0036-19
Julio 2019 - Junio 2023
Directrices éticas, prácticas recomendadas y formación en evaluación.

24 of 47

Equipo D-CRAFT | D-CRAFT Team

Equipo del proyecto/Project Team

Elizabeth Joan Kelly
Ayla Stein Kenfield
Ali Shiri
Santi Thompson
Liz Woolcott

Consultores/Consultants

Joyce Chapman
Nicole Hennig
Derrick Jefferson
Ranti Junus
Myrna E. Morales

Grupo Asesor/Advisory Group

Paige Dansinger
LaToya Devezin
Genya O’Gara
Anna Naruta-Moya
Kelly Riddle
Kayla Siddell
Holly Smith

https://reuse.diglib.org/

Santi

Several groups have contributed to the project.
First are the project team members, who collaboratively designed the project and work closely with the other groups to build, test, and ultimately deploy the toolkit.
We have also benefited from the contributions of five D-CRAFT consultants, who add expertise in key areas including library assessment, privacy, diversity and equity, instructional design, and accessibility.

The D-CRAFT Advisory Group (AG) is made up of experts representing the breadth of the GLAMR community. The group provides guidance and feedback on all aspects of the D-CRAFT project, including our methods and deliverables.
Finally, we hired 9 experts to assist with specialized areas of knowledge for the grant.
The complete bios for group members are available on the D-CRAFT project website.

25 of 47

Recommended Practices

Prácticas Recomendadas

26 of 47

8 métodos + herramientas para la recopilación de datos | 8 methods + tools for data collection

Métodos de recopilación de datos / Data Collection Methods

Alert Services
Altmetrics
Citation Analysis
Focus Group/Interview
Link Analysis
Point of Use/Survey
Reverse Image Search
Web Analytics

Herramientas de recopilación de datos seleccionadas / Selected Data Collection Tools

Mention
PlumX
Google Scholar
Webometric Analyst
Qualtrics
TinEye Reverse Image Search
Matomo

27 of 47

Ethical Guidelines

Directrices éticas

28 of 47

Directrices éticas - Secciones | Ethical Guidelines - Sections

Introduction, Scope, Terminology, Core values, Further Reading
Core Values

IDEAS - Inclusion, Diversity, Equity, Accessibility, Social Justice
Privacy
Traditional Knowledge, Cultural heritage, and intellectual property
Professional Development and Training
Transparency
Awareness

Every core value:

Description of the value
Practical considerations

Introducción, Alcance, Terminología, Valores fundamentales, Lectura adicional
Valores fundamentales

IDEAS - Inclusión, Diversidad, Equidad, Accesibilidad y Justicia social
Privacidad
Conocimiento tradicional, patrimonio cultural y propiedad intelectual
Desarrollo y formación profesionales
Transparencia
Conciencia

Cada valor fundamental:

Descripción del valor
Consideraciones prácticas

29 of 47

Módulos educativos | Education Modules

30 of 47

31 of 47

Laboratorios GLAM en evolución | Evolving GLAM Labs

Mahendra Mahey
Universidad de Tallin, Museo Nacional de Estonia & Universidad de Strathclyde
Tallinn University, Estonian National Museum & University of Strathclyde

1. Dirigió los Laboratorios de la Biblioteca Británica (2013-2021), casi nueve años
2. Primer Laboratorio de Becas Digitales en una Biblioteca Nacional - para reutilizar y remezclar colecciones y datos digitales - ahora «servicio habitual».
3. Ejecutó los primeros experimentos de aprendizaje automático / IA con datos de 2013 en adelante…
4. Apoyo y presentación de más de 450 proyectos con investigadores, artistas, empresarios, educadores, comunidades y personal
5. Concursos (residencias), proyectos, premios, exposiciones, programas educativos, laboratorios de ideas, exposiciones itinerantes y talleres de datos, Algorave, stand up, creación de redes y puesta en común.
6. Contribución a la creación de la comunidad GLAM Labs
7. Lanzamiento del portal «colecciones como datos» (2015) con DOIs
8. Asesor principal de investigación y desarrollo en la Universidad de Tallin, redactor de proyectos, gestor de proyectos y red de proyectos

1. Ran British Library Labs (2013 - 2021), nearly nine years
2. First Digital Scholarship Lab in a National Library - for reusing and remixing digital collections and data - now ‘business as usual service’
3. Ran early machine learning / AI experiments on data from 2013 onwards…
4. Supported/showcased over 450 projects with researchers, artists, entrepreneurs, educators, communities and staff
5. Competitions (Residences), Projects, Awards, Exhibitions, Educational Programmes, Ideas Labs, Roadshows, Data Workshops, Algorave, Stand up, Networking and Sharing
6. Helped establish the GLAM Labs community
7. Launched ‘collections as data’ portal (2015) with DOIs
8. Senior Research & Development Adviser at Tallinn University, project writer, project manager and project network

Estonia

32 of 47

Doctorado - Laboratorios GLAM en evolución | PhD - Evolving GLAM Labs

Doctorando (oct. 2024 - sep. 2027)
Supervisores: Dra. Milena Dobreva y Profesor Ian Ruthven
Departamento de Informática y Ciencias de la Información

PhD candidate (Oct 2024 - Sep 2027)
Supervisors: Dr Milena Dobreva and Professor Ian Ruthven
Department of Computer and Information Science

¿Qué lecciones podemos aprender de la creación, el impacto y la evolución de los Laboratorios de Galerías, Bibliotecas, Archivos y Museos (GLAM)?

What lessons can be learned from the establishment, impact and evolution of Gallery, Library, Archive and Museum (GLAM) Labs?

1. Relatos personales, entrevistas, estudios de casos, encuestas, investigación y análisis
2. Realizadas 26 entrevistas hasta la fecha con aquellos cuyos proyectos reutilizaban y remezclaban el patrimonio digital y personas que prestaban servicios a los usuarios en entornos GLAM Lab (desde 2013).
3. Analizar y buscar patrones, impacto a largo plazo, sostenibilidad, ciclos de vida, evolución, uso e impacto de la IA
4. Proporcionar recomendaciones para que los GLAM aumenten el uso del patrimonio digital de forma impactante y hagan crecer nuevos GLAM Labs
5. Póngase en contacto con: mahendra.mahey@strath.ac.uk

1. Personal accounts, interviews, case studies, surveys, research and analysis
2. Conducted 26 interviews so far with those whose projects reused and remixed digital heritage and people providing services for users in GLAM Lab settings (since 2013)
3. Analyse and look for patterns, long term impact, sustainability, life cycles, evolution, use of and impact of AI
4. Provide recommendations for GLAMs to increase use of digital heritage impactfully and to grow new GLAM Labs
5. Contact: mahendra.mahey@strath.ac.uk

Glasgow, Scotland, United Kingdom

33 of 47

El patrimonio cultural digital como recurso social | Digital cultural heritage as a social resource

Investigador Junior (Ene 2024 - Dic 2026)
Investigador principal Pille Runnel

Junior Researcher (Jan 2024 - Dec 2026)
Principal Investigator Pille Runnel

El patrimonio cultural digital como recurso social

Digital cultural heritage as a social resource

1. Trazar un mapa de las políticas actuales y las experiencias de los usuarios en relación con el uso del patrimonio cultural digital. Entrevistas/encuestas para comprender la situación actual e identificar posibles cuellos de botella en los que se necesite apoyo y nuevas soluciones para aumentar la aceptación del patrimonio digital como recurso social.
2. Desarrollar la capacidad del sector del patrimonio para utilizar el patrimonio digital y ampliar el uso de métodos de cocreación para dar sentido al compromiso con el patrimonio digital y transferir conocimientos sobre el patrimonio.
3. Si conoce algún uso del patrimonio cultural digital que haya tenido repercusión social, póngase en contacto conmigo: mahendra.mahey@strath.ac.uk

1. Map current policies and user experiences related to the use of digital cultural heritage. Interviews / surveys to understand current situation and identify possible bottlenecks where support and new solutions are needed to increase the uptake of digital heritage as a social resource
2. Develop the heritage sector's capacity to use digital heritage and to expand the use of co-creation methods to make sense of engaging with digital heritage and to transfer knowledge about heritage
3. If you know any uses of digital cultural heritage that have led to social impact, please contact me: mahendra.mahey@strath.ac.uk

34 of 47

Detección de idiomas para repositorios institucionales: inteligencia artificial y modelos de lenguaje

Language detection for institutional repositories: artificial intelligence and language models

35 of 47

Presentación del problema / Issue

El enorme volumen de recursos almacenados en los repositorios digitales representa una gran dificultad a la hora de supervisar y corregir errores o mejorar la calidad de los metadatos. Nos enfocamos en la corrección del metadato idioma en los registros de resúmenes del repositorio institucional SEDICI.

The huge volume of resources stored in digital repositories represents a great difficulty when it comes to monitoring and correcting errors or improving the quality of metadata. We focused on correcting the metadata for language in the abstract records of the SEDICI institutional repository.

36 of 47

Datos y metodología / Data and methodology

A partir de un dataset exportado del repositorio de unos 126.081 ítems se planificó una tarea de detección automática de idiomas utilizando diferentes librerías existentes compatibles con el método zero-shot (langdetect, CLD3, fastText, Polyglot, langid y TextCat). Luego se compararon los resultados obtenidos con los datos de los idiomas registrados por el personal de catalogación del repositorio. Para tratar de mejorar aún más la detección de idiomas se entrenó un modelo mBERT multilenguaje y se comparó su desempeño con el conjunto más pequeño de ítems cuya clasificación por idiomas era diferente entre humanos y la biblioteca Polyglot.

We worked with a dataset of about 126,081 items exported from the repository. An automatic language detection task was planned using several existing libraries compatible with the zero-shot method (langdetect, CLD3, fastText, Polyglot, langid and TextCat). The results were compared with the language data recorded by the repository's cataloguing staff. To further improve language detection, a multilingual mBERT model was trained, and its performance was evaluated on the smaller set of items whose language classification differed between the human cataloguers and the Polyglot library.
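
The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the SEDICI team's code: the record shape, the function names and the stopword-based `toy_detect` are all assumptions made for the example. `toy_detect` only stands in for a real detector so that the sketch runs without extra dependencies; a library function such as `langdetect.detect` could be passed in its place.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical record shape: (item_id, abstract_text, catalogued_language_code)
Record = Tuple[str, str, str]

# Toy stand-in detector (an assumption for this sketch, NOT one of the
# libraries named on the slide): it counts a few common stopwords per language.
STOPWORDS = {
    "es": {"el", "la", "de", "que", "los", "en", "y", "se"},
    "en": {"the", "of", "and", "in", "to", "is", "that", "was"},
    "pt": {"o", "os", "do", "da", "em", "que", "não", "uma"},
}


def toy_detect(text: str) -> str:
    """Return the language whose stopwords appear most often in the text."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)


def compare(records: Iterable[Record], detect: Callable[[str], str]):
    """Return the agreement rate and the list of disagreeing items."""
    agree = 0
    total = 0
    disagreements: List[Tuple[str, str, str]] = []
    for item_id, text, catalogued in records:
        detected = detect(text)
        total += 1
        if detected == catalogued:
            agree += 1
        else:
            # Keep both labels so a human can decide which one is right.
            disagreements.append((item_id, catalogued, detected))
    rate = agree / total if total else 0.0
    return rate, disagreements


# Invented sample records, for illustration only.
records = [
    ("a1", "el estudio de los repositorios en la universidad", "es"),
    ("a2", "the quality of the metadata in the repository", "en"),
    ("a3", "the impact of open access", "es"),  # catalogued language looks wrong
    ("a4", "o uso do repositório em uma universidade", "pt"),
]
rate, disagreements = compare(records, toy_detect)
```

Keeping the disagreeing items, rather than only the rate, matters here because a mismatch can come from either side: the cataloguer or the detector.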

37 of 47

Resultados/Results

En general, todas las librerías de detección de idiomas mostraron alrededor de un 95% de coincidencia con los idiomas identificados y catalogados por los humanos. En el caso de los modelos mBERT entrenados las coincidencias obtenidas son bajas tanto para los idiomas detectados automáticamente por Polyglot como los catalogados por humanos (37,2% y 18% respectivamente). Se encontraron errores de catalogación atribuibles a humanos pero también errores de las bibliotecas o de los modelos de lenguaje en la tarea de detección.

Overall, all of the language detection libraries showed around 95% agreement with the languages identified and catalogued by humans. For the trained mBERT models, agreement was low both with the languages automatically detected by Polyglot and with those catalogued by humans (37.2% and 18% respectively). We found cataloguing errors attributable to humans, as well as errors made by the libraries or language models in the detection task.

38 of 47

Necesidad de supervisión humana/Human supervising needed

Muchas de las librerías han demostrado fallar en la detección, inclusive de los idiomas mayoritarios, cuando el texto del resumen está compuesto por un listado de palabras o frases.
Una tarea importante que resta realizar pero que requerirá la intervención de etiquetadores humanos es la de reetiquetar el porcentaje de resúmenes que no cuentan con el campo de idioma y definir cuál es la opción correcta en los casos en los que las bibliotecas y modelos no coincidieron con el idioma catalogado.
Este trabajo se presentó en BIREDIAL-ISTEC 2024, XIII Conferencia Internacional sobre Bibliotecas y Repositorios Digitales, del 22 al 24 de octubre de 2024. Es muy inicial…

Many of the libraries failed to detect even the majority languages when the abstract text consists of a list of words or phrases.
An important task that remains to be done, and that will require human taggers, is to re-label the abstracts that lack the language field and to decide on the correct option in cases where the libraries and models did not match the catalogued language.
This work was presented at BIREDIAL-ISTEC 2024, the XIII International Conference on Digital Libraries and Repositories, held from 22 to 24 October 2024. More research needs to be done…
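
One way to route the cases mentioned above to human taggers is sketched below. It is only an illustration under stated assumptions: the function names, the separator-based heuristic and the threshold of three words per segment are invented for this example and are not taken from the presented work.

```python
import re
from typing import Optional


def looks_like_term_list(text: str, max_words_per_segment: float = 3.0) -> bool:
    """Heuristic: True when the text splits into several short segments
    separated by commas, semicolons or line breaks."""
    segments = [s.strip() for s in re.split(r"[,;\n]", text) if s.strip()]
    if len(segments) < 3:
        return False
    avg_words = sum(len(s.split()) for s in segments) / len(segments)
    return avg_words <= max_words_per_segment


def needs_human_review(text: str, catalogued: Optional[str], detected: str) -> Optional[str]:
    """Return a reason when the record should go to a human tagger, else None."""
    if not catalogued:
        return "missing language field"
    if looks_like_term_list(text):
        return "abstract looks like a list of terms"
    if catalogued != detected:
        return "catalogued and detected languages differ"
    return None
```

For instance, `needs_human_review("repositorios; metadatos; acceso abierto; calidad", "es", "en")` flags the record as a term list before the language mismatch is even considered, since the detector is least reliable on that kind of text.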

39 of 47

En conclusión/To conclude

Los modelos de lengua necesitan siempre de la revisión humana: será necesario desarrollar una herramienta de interacción con catalogadores (probablemente se requiera de más de un humano para controlar los datos) que permita volver a clasificar alrededor del 5% de los ejemplos que conforman el subconjunto de datos en los que la catalogación y la detección no coincidieron. Solo una vez que se tenga la etiqueta de idioma correcta en todos los resúmenes se podrá evaluar con total certeza el desempeño de las herramientas utilizadas.
Antes de usar estas herramientas para el público en bibliotecas es mejor probarlas en proyectos internos, ya que aún hay mucho por hacer. Este proyecto nos invita a ser más cuidadosos en la aplicación de estas tecnologías frente a nuestro público.

Language models always need human review: a tool for interaction with cataloguers (probably requiring more than one human to check the data) will need to be developed to allow reclassification of about 5% of the examples that make up the subset of data where cataloging and detection did not match. Only once the correct language tag is available for all abstracts can the performance of the tools used be evaluated with complete certainty.
Before using these tools for the public in libraries, it is best to test them in internal projects, as there is still much to do. This project invites us to be more careful in the application of these technologies in our open and public resources.
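
The point that more than one human may be needed to check each record can be sketched as a simple vote. This is a hypothetical illustration: `resolve_label`, the `min_agreement` parameter and the rule of escalating ties are assumptions, not a description of the tool the presenters plan to build.

```python
from collections import Counter
from typing import Optional, Sequence


def resolve_label(votes: Sequence[str], min_agreement: int = 2) -> Optional[str]:
    """Return the language code chosen by the reviewers, or None when the
    record still needs another reviewer (too few votes or a tie)."""
    if not votes:
        return None
    counts = Counter(votes).most_common()
    top_label, top_count = counts[0]
    # A tie for first place means the reviewers have not settled the question.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None
    return top_label if top_count >= min_agreement else None
```

With the default setting, `resolve_label(["es", "es", "pt"])` returns `"es"`, while a single vote or a split such as `["es", "pt"]` returns `None` and stays in the review queue.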

40 of 47

¡Gracias! | Thank you!

41 of 47

Q&A

42 of 47

Discusión | Discussion

¿Existen métricas diferentes para las colecciones digitalizadas y las colecciones como datos?

Are there different metrics for digitised collections vs collections-as-data?

43 of 47

Discusión | Discussion

¿Cómo influye el uso de la IA en las políticas y prácticas institucionales?

How does AI usage inform institutional policies and practices?

44 of 47

Discusión | Discussion

¿Por qué debemos aún seguir explorando internamente estas métricas antes de abrirlas al público?

Why should we be internally exploring these metrics instead of making them available to the public?

45 of 47

Discusión | Discussion

¿Qué métricas se utilizan en vuestra institución para medir la reutilización de las colecciones como datos y el impacto de la IA?

What metrics are used in your institution to measure the reuse of collections as data and the impact of AI?

46 of 47

Discusión | Discussion

¿Cuáles son las iniciativas o buenas prácticas que usáis para promocionar el uso responsable de las colecciones digitales?

What are the initiatives and best practices that you use to foster the responsible use of the digital collections?

47 of 47

THANK YOU
