1 of 17

Introduction to Databases

Types (Hierarchical, Relational, Non-Relational) and Indexes

Tamara Marnell, Orbis Cascade Alliance

2 of 17

In This Lesson

https://github.orbiscascade.org/systems-librarianship/databases

Database Types

Hierarchical
Relational
Non-Relational

Indexes

3 of 17

Database Types

4 of 17

Hierarchical

Oldest and least flexible type of database
Trees of parents and children, like the file systems in Windows, Linux, and older Apple devices
Accommodates one-to-many relationships only

5 of 17

6 of 17

Relational

Developed at IBM in the 1970s, now popularly taught in library school classes and professional workshops
Data is stored in tables with predefined columns with allowed formats (text, integers, dates, sets, etc.) and individual records in rows
Use Structured Query Language (SQL) to manipulate and retrieve data
SQL statements can use unique values in the records to join tables together in complex combinations

This allows many-to-many relationships

7 of 17

8 of 17

Example queries

Look up information in one table based off a value in another:

SELECT repos.name, repos.url

-> FROM repos JOIN arks ON repos.id=arks.repo_id

-> WHERE arks.ark="80444/xv605573";

+--------------------------------------------+-------------------------------------+

| name | url |

+--------------------------------------------+-------------------------------------+

| Oregon Historical Society Research Library | http://ohs.org/research-and-library |

+--------------------------------------------+-------------------------------------+

9 of 17

Example queries

Get data from tables without a direct link between them:

SELECT repos.name, COUNT(DISTINCT views.ark) as arks, SUM(views.count) as total

-> FROM (repos JOIN arks ON repos.id=arks.repo_id)

-> JOIN views ON arks.ark=views.ark

-> WHERE repos.name LIKE "University of Washington %"

-> AND views.date BETWEEN "2024-10-01" AND "2024-10-31"

-> GROUP BY repos.name;

+---------------------------------------------------------+------+-------+

| name | arks | total |

+---------------------------------------------------------+------+-------+

| University of Washington Ethnomusicology Archives | 1062 | 1723 |

| University of Washington Libraries, Media Archive | 4 | 14 |

| University of Washington Libraries, Special Collections | 2848 | 14204 |

+---------------------------------------------------------+------+-------+

10 of 17

Non-Relational

Not all kinds of data can be efficiently stored in tables, like bibliographic and archival records with dozens of optional fields
Non-relational or NoSQL database models associate keys with objects
Key-value stores have “opaque” values. They could be JSON, XML, multimedia, etc.
Document databases have “transparent” structured data. The system knows what they can contain and can query them.

Ideal for library applications that make long, irregular records searchable!

11 of 17

Document Database Programs

Apache Solr is utilized by open-source Discovery projects like Blacklight, Aspen, and ArchivesSpace
BaseX is the XML-specific document database utilized by Archives West to index EAD finding aids and make them full-text searchable

Users can fetch the contents of documents with the functional programming language XQuery

MongoDB is another popular program for creating databases of JSON-like documents

12 of 17

Indexes

13 of 17

What are indexes?

Like many other technical terms (client, server, password), “index” was simply plucked from colloquial English
Long before computers, textbooks and other reference materials contained back-of-the-book indexes: alphabetical lists of subject headings with associated page numbers
Similarly, a database index is a condensed reference an application can use to quickly deliver requested information, instead of combing through large amounts of data

14 of 17

academic libraries, 42–44, 67, 109

access

alternate strategies of, 101–103

definition of, 8–9

establishing, as core competency, 7, 10, 24–25, 43–44, 103, 112–113

ownership and, 44–46

potential for, 76–77

strategic assemblages of, 43–51

vendors and, 28–37

acquisitions

archives and, 103–106

as an assemblage, 7, 12–13, 50–51, 57–58, 80, 99–103, 110–116

changing nature of, 21–24, 75–81, 103, 114–116

core competencies in, 7, 10, 24–25, 103, 112–113

definition of, 6–9

linearity and, 24, 26–27, 29, 48, 54–55, 113–115

professional standards for, 14–15, 29–30, 34–37, 77, 95

radicalization of, 15–17, 110–116

rhizomatic approach to, 115–116

role of, 15–17, 24–27, 49–51, 54–58, 72–75, 95–97, 104–105

15 of 17

Indexing in Archives West

The document databases of Archives West contain over 40K EAD finding aids, some of which would be hundreds or thousands of pages long if printed
Without indexes, searching all of those for submitted keywords, facets, etc. would take an excruciating amount of time
Instead, the application builds indexes:

A full text index that stores tokenized words or phrases and pointers to the documents they appear in
A brief result index that contains only the titles, abstracts, and upload dates associated with each document identifier (ARK)
Multiple facet indexes, which are custom document databases of subjects, persons, geographical locations, etc. and the ARKs of the documents that contain them

16 of 17

Example of a subject facet index

</term>

</term>

</term>

</term>

</term>

(...)

</terms>

17 of 17

Real-world troubleshooting

Any platform with a search interface likely queries indexes rather than the source records in MARC, Dublin Core, EAD, etc.

Examples of issues that may come up:

Catalogers created new bibliographic records, but they're not appearing in test searches of your Discovery layer
Your library changed the name of a location, but the old name is still appearing in the facets/filters section of your Discovery layer
Your colleagues expect known records with a certain local field to come up in a keyword search, but they don't