1 of 17

Introduction to Databases

Types (Hierarchical, Relational, Non-Relational) and Indexes

Tamara Marnell, Orbis Cascade Alliance

2 of 17

In This Lesson

https://github.orbiscascade.org/systems-librarianship/databases

  1. Database Types
    1. Hierarchical
    2. Relational
    3. Non-Relational
  2. Indexes

3 of 17

Database Types

4 of 17

Hierarchical

  • Oldest and least flexible type of database
  • Trees of parents and children, like the file systems in Windows, Linux, and older Apple devices
  • Accommodates one-to-many relationships only

5 of 17

6 of 17

Relational

  • Developed at IBM in the 1970s, now popularly taught in library school classes and professional workshops
  • Data is stored in tables with predefined columns with allowed formats (text, integers, dates, sets, etc.) and individual records in rows
  • Use Structured Query Language (SQL) to manipulate and retrieve data
  • SQL statements can use unique values in the records to join tables together in complex combinations
    • This allows many-to-many relationships

7 of 17

8 of 17

Example queries

Look up information in one table based off a value in another:

SELECT repos.name, repos.url

-> FROM repos JOIN arks ON repos.id=arks.repo_id

-> WHERE arks.ark="80444/xv605573";

+--------------------------------------------+-------------------------------------+

| name | url |

+--------------------------------------------+-------------------------------------+

| Oregon Historical Society Research Library | http://ohs.org/research-and-library |

+--------------------------------------------+-------------------------------------+

9 of 17

Example queries

Get data from tables without a direct link between them:

SELECT repos.name, COUNT(DISTINCT views.ark) as arks, SUM(views.count) as total

-> FROM (repos JOIN arks ON repos.id=arks.repo_id)

-> JOIN views ON arks.ark=views.ark

-> WHERE repos.name LIKE "University of Washington %"

-> AND views.date BETWEEN "2024-10-01" AND "2024-10-31"

-> GROUP BY repos.name;

+---------------------------------------------------------+------+-------+

| name | arks | total |

+---------------------------------------------------------+------+-------+

| University of Washington Ethnomusicology Archives | 1062 | 1723 |

| University of Washington Libraries, Media Archive | 4 | 14 |

| University of Washington Libraries, Special Collections | 2848 | 14204 |

+---------------------------------------------------------+------+-------+

10 of 17

Non-Relational

  • Not all kinds of data can be efficiently stored in tables, like bibliographic and archival records with dozens of optional fields
  • Non-relational or NoSQL database models associate keys with objects
  • Key-value stores have “opaque” values. They could be JSON, XML, multimedia, etc.
  • Document databases have “transparent” structured data. The system knows what they can contain and can query them.
    • Ideal for library applications that make long, irregular records searchable!

11 of 17

Document Database Programs

  • Apache Solr is utilized by open-source Discovery projects like Blacklight, Aspen, and ArchivesSpace
  • BaseX is the XML-specific document database utilized by Archives West to index EAD finding aids and make them full-text searchable
    • Users can fetch the contents of documents with the functional programming language XQuery
  • MongoDB is another popular program for creating databases of JSON-like documents

12 of 17

Indexes

13 of 17

What are indexes?

  • Like many other technical terms (client, server, password), “index” was simply plucked from colloquial English
  • Long before computers, textbooks and other reference materials contained back-of-the-book indexes: alphabetical lists of subject headings with associated page numbers
  • Similarly, a database index is a condensed reference an application can use to quickly deliver requested information, instead of combing through large amounts of data

14 of 17

academic libraries, 42–44, 67, 109

access

alternate strategies of, 101–103

definition of, 8–9

establishing, as core competency, 7, 10, 24–25, 43–44, 103, 112–113

ownership and, 44–46

potential for, 76–77

strategic assemblages of, 43–51

vendors and, 28–37

acquisitions

archives and, 103–106

as an assemblage, 7, 12–13, 50–51, 57–58, 80, 99–103, 110–116

changing nature of, 21–24, 75–81, 103, 114–116

core competencies in, 7, 10, 24–25, 103, 112–113

definition of, 6–9

linearity and, 24, 26–27, 29, 48, 54–55, 113–115

professional standards for, 14–15, 29–30, 34–37, 77, 95

radicalization of, 15–17, 110–116

rhizomatic approach to, 115–116

role of, 15–17, 24–27, 49–51, 54–58, 72–75, 95–97, 104–105

15 of 17

Indexing in Archives West

  • The document databases of Archives West contain over 40K EAD finding aids, some of which would be hundreds or thousands of pages long if printed
  • Without indexes, searching all of those for submitted keywords, facets, etc. would take an excruciating amount of time
  • Instead, the application builds indexes:
    • A full text index that stores tokenized words or phrases and pointers to the documents they appear in
    • A brief result index that contains only the titles, abstracts, and upload dates associated with each document identifier (ARK)
    • Multiple facet indexes, which are custom document databases of subjects, persons, geographical locations, etc. and the ARKs of the documents that contain them

16 of 17

Example of a subject facet index

<terms db="18">

<term text="Advertising and Marketing">

<ark>80444/xv44537</ark>

<ark>80444/xv33883</ark>

</term>

<term text="Agriculture">

<ark>80444/xv32205</ark>

</term>

<term text="Arab Americans">

<ark>80444/xv87927</ark>

</term>

<term text="Artifacts">

<ark>80444/xv76818</ark>

</term>

<term text="Astoria">

<ark>80444/xv83142</ark>

</term>

(...)

</terms>

17 of 17

Real-world troubleshooting

Any platform with a search interface likely queries indexes rather than the source records in MARC, Dublin Core, EAD, etc.

Examples of issues that may come up:

  • Catalogers created new bibliographic records, but they're not appearing in test searches of your Discovery layer
  • Your library changed the name of a location, but the old name is still appearing in the facets/filters section of your Discovery layer
  • Your colleagues expect known records with a certain local field to come up in a keyword search, but they don't