1 of 57

Smarter Searches, Deeper Discovery: Integrating Natural Language Search in Primo

Matthew Hartman, Timothy Kohn

October 17th, 2025

2 of 57

Who We Are

Matt Hartman

Senior Lead, Delivery Services & Library Applications

Tim Kohn

Resource Sharing Manager


3 of 57

Stony Brook Libraries, AI


4 of 57

Stony Brook Libraries, AI Team


Content Services and Management

  • Associate Dean
  • Director, CSM
  • Director, Special Collections
  • Director, HSL
  • Preservation Librarian
  • Assistant Special Collections
  • Acquisitions Manager
  • Department Head
  • Library Data Scientist
  • Content Apps Analyst
  • Acquisitions Specialist
  • Technician
  • Technician
  • Technician
  • Technician

Access and User Services

  • Associate Dean
  • Director, AUS
  • Head, Applications and Resource Sharing
  • Head, AUS
  • Head, Operations
  • Student Employment Manager
  • Resource Sharing Manager
  • NRR Supervisor
  • Main Stacks Supervisor
  • Weekend Supervisor
  • AUS Specialist
  • RS Associate
  • Evening Supervisor
  • Clerk
  • Clerk

Digital Services

  • Associate Dean
  • Director, AI
  • Director, Innovation
  • Senior Programmer
  • Supervising Programmer
  • Digital Projects Librarian
  • Senior Front-end Dev
  • Fellow, Critical AI

Academic Engagement

  • Associate Dean
  • Director, AE
  • Director, Global Initiatives
  • Head, Music Library
  • Head, Science and Engineering
  • Data Literacies Lead
  • Math & Physics Librarian
  • Health Sciences Librarian
  • Undergraduate Success Librarian
  • Academic Engagement Librarian

5 of 57

Stony Brook Libraries, AI Team


6 of 57

7 of 57

8 of 57

Primo Analytics

7.5% of searches return no results

Around half of sessions have more than one search

Facet usage is startlingly low

Growing number of natural language-style questions

9 of 57

Does the World Cup promote world unity or increase nationalistic tensions?

How does expenditure affect the percentage of votes for a candidate?

During the American Revolutionary War did local militia service records show that families with more property were more likely to enlist?

How did the penalties of witchcraft get worse over time?


10 of 57

11 of 57

12 of 57

13 of 57

14 of 57

15 of 57

16 of 57

17 of 57

18 of 57

(From ENUG 2024) What do users need from our systems?


Quick, reliable discovery of resources

Quick, reliable delivery of resources

Support for information literacy

Illustration of and support for skills that can be used in other platforms

Compelling incentives to use library tools to discover & access materials purchased/licensed for their research

Low stress, high reward, navigation through the research process

Webb, H., & Babb, N. (2024, October 18). Artificial Intelligence Across the Discovery Landscape [Conference session]. Ex Libris Northeast User Group (ENUG) 2024, New Brunswick, NJ, United States.

19 of 57


Varnum, K., Kessler, R., Zhu, J., Hazen, T., Patham, B., & Holloway, J. (2025). Generative Artificial Intelligence and Web-Scale Discovery: A report from the NISO Open Discovery Initiative Standing Committee

20 of 57


Varnum, K., Kessler, R., Zhu, J., Hazen, T., Patham, B., & Holloway, J. (2025). Generative Artificial Intelligence and Web-Scale Discovery: A report from the NISO Open Discovery Initiative Standing Committee

21 of 57


Varnum, K., Kessler, R., Zhu, J., Hazen, T., Patham, B., & Holloway, J. (2025). Generative Artificial Intelligence and Web-Scale Discovery: A report from the NISO Open Discovery Initiative Standing Committee

22 of 57

Painting the picture

Focus more on…

  • Quick delivery of resources
  • Enhanced discovery of related materials
  • Support research skills & literacy
  • Incentivize using library tools
  • Librarian oversight

Focus less on…

  • Summarization
  • Circumventing the research experience
  • Separating AI from the Discovery system
  • Convoluted search process


We can do that!

23 of 57


Asks Question

Rewrites in Boolean string

Top 30 Search Results

Top 5 for user query

Knowledge Base

Summarization

4o mini

4o mini

24 of 57


25 of 57

Primo Research Assistant (RA) search: 925,163 results, of which only the top 5 usable CDI results are used

(factors contributing to global warming) OR (causes of global warming) OR (global warming contributors) OR ((global warming) OR (climate change causes)) OR ((human activities) AND (global warming factors)) OR ("factors contributing to global warming") OR (greenhouse gases climate change global warming causes) OR (((anthropogenic) OR (human-induced) AND (global warming) OR (climate change) AND factors)) OR (carbon emissions deforestation industrialization global warming) OR (climate drivers)

OR (What factors contribute to global warming?)


26 of 57

27 of 57

28 of 57

29 of 57

30 of 57

31 of 57

library.edu/discovery/search?query=any,contains, [your query here] & [your default settings here]
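The URL pattern above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the host `library.edu` comes from the slide, while the function name and the `vid`, `tab`, and `search_scope` values are placeholders standing in for "your default settings".

```python
from urllib.parse import quote, urlencode


def build_search_url(user_query, defaults, host="library.edu"):
    """Return a Primo-style search URL for a free-text query."""
    # Commas separate field, operator, and term in the query parameter,
    # so the term itself is percent-encoded in full.
    query_param = "query=any,contains," + quote(user_query, safe="")
    settings = urlencode(defaults)
    return f"https://{host}/discovery/search?{query_param}&{settings}"


# Placeholder settings (assumptions); substitute the values your own view uses.
example_defaults = {
    "vid": "EXAMPLE_INST:VIEW",
    "tab": "Everything",
    "search_scope": "MyInst_and_CI",
}
url = build_search_url("global warming causes", example_defaults)
print(url)
```

Keeping the default settings in one dictionary means the same builder can serve both the search bar and a homepage widget.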

32 of 57

Issues

  • Material information was always added to the search string
    • Material type (books, articles, etc.)
    • Creation dates (... OR “1942” OR “1943” OR …)
    • “Peer-reviewed”
    • “Held in our library”
    • “Available online”


33 of 57


&facet=tlevel,include,peer_reviewed

&mfacet=tlevel,include,online_resources,1

&mfacet=tlevel,include,available_p,1

&mfacet=searchcreationdate,include,2020%7C,%7C2025,1
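One way to assemble these parameters from parsed flags is sketched below. The parameter strings are copied from the slide; the function name, its arguments, and the mapping of "held in our library" to `available_p` and "available online" to `online_resources` are assumptions made for illustration.

```python
from urllib.parse import quote


def facet_params(peer_reviewed=False, online=False, held_locally=False, years=None):
    """Return the facet portion of a search URL as a single string."""
    parts = []
    if peer_reviewed:
        parts.append("facet=tlevel,include,peer_reviewed")
    if online:
        parts.append("mfacet=tlevel,include,online_resources,1")
    if held_locally:
        parts.append("mfacet=tlevel,include,available_p,1")
    if years is not None:
        start, end = years
        # The date range is "start|,|end" with each pipe encoded as %7C.
        date_range = quote(f"{start}|,|{end}", safe=",")
        parts.append(f"mfacet=searchcreationdate,include,{date_range},1")
    return "".join("&" + part for part in parts)


print(facet_params(peer_reviewed=True, years=(2020, 2025)))
```

Moving these limits into URL parameters keeps them out of the Boolean string, which is the issue the previous slide describes.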

34 of 57

Query

The user's natural language query is passed to an AI agent that parses it.

Boolean Creation

An AI agent uses its training data to create a Boolean string that best represents the topic of interest.

Apply Facets

Multiple AI agents work concurrently to apply relevant facets based on the input, including creation date, material type, peer-reviewed, local holdings, and online-only facets.

Load Results

All of this information is brought together as a search results page using the Everything Search scope in our catalog.

Query

Topic

Date

Facets
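The concurrent step can be sketched with Python's `asyncio`. This is only an outline of the pattern under stated assumptions: the three agent functions are hypothetical stand-ins that use simple keyword rules, whereas the real system sends each sub-task to a language model.

```python
import asyncio
import re


async def topic_agent(query):
    # Stand-in for the Boolean-creation agent: quotes the query as a phrase.
    cleaned = query.strip().rstrip("?")
    return {"topic": '"' + cleaned + '"'}


async def date_agent(query):
    # Stand-in for the date agent: picks up an explicit "YYYY-YYYY" range.
    match = re.search(r"\b(\d{4})\s*-\s*(\d{4})\b", query)
    years = (int(match.group(1)), int(match.group(2))) if match else None
    return {"years": years}


async def peer_review_agent(query):
    # Stand-in for the peer-review agent: looks for the phrase itself.
    lowered = query.lower()
    return {"peer_reviewed": "peer reviewed" in lowered or "peer-reviewed" in lowered}


async def parse_query(query):
    """Run every agent concurrently and merge their partial results."""
    results = await asyncio.gather(
        topic_agent(query), date_agent(query), peer_review_agent(query)
    )
    merged = {}
    for partial in results:
        merged.update(partial)
    return merged


parsed = asyncio.run(parse_query("peer reviewed articles on climate change 2020-2025"))
print(parsed)
```

Because each agent returns its own small dictionary, adding a facet means adding one more function to the `gather` call.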

35 of 57


Topic

(factor* OR cause* OR contributor* OR driver*) AND ("global warming" OR "climate change" OR "greenhouse effect")
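The structure of this string (synonyms joined with OR inside each concept, concepts joined with AND) can be reproduced with a short helper. The sketch below shows only the shape of the output; choosing the terms is the AI agent's job, and the function name is an assumption for illustration.

```python
def boolean_string(concept_groups):
    """OR the terms within each group, then AND the groups together."""
    clauses = []
    for terms in concept_groups:
        # Multi-word terms are quoted so they are searched as phrases.
        rendered = [f'"{term}"' if " " in term else term for term in terms]
        clauses.append("(" + " OR ".join(rendered) + ")")
    return " AND ".join(clauses)


groups = [
    ["factor*", "cause*", "contributor*", "driver*"],
    ["global warming", "climate change", "greenhouse effect"],
]
print(boolean_string(groups))
```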

36 of 57


Date

2020-2025

Recent

Format

Book

Local

37 of 57


library.edu / search = BOOLEAN & date = RECENT & format = LOCAL BOOKS

38 of 57

Benefits of Agentic Model

  • Narrower focus makes the process faster and less expensive
  • Reduced guardrails on controversial topics
  • Librarian control
  • Flexibility


39 of 57

User Interface Considerations

  • Integrated into the existing search experience
  • Working in Primo takes patience
    • Some elements are easier to manipulate than others
  • Overlaying of search bar
    • Toggle replaces the entire search bar


40 of 57


User Interface Considerations

41 of 57


User Interface Considerations

42 of 57

Homepage Widget


43 of 57


44 of 57

Codebase


Primo

OpenAI

Web Server

45 of 57

Timing

Element                 Runtime (s)
Non-API code runtime    0.05
API calls               2-6
Web server response     1-2
Total                   3-8

90th-percentile (p90) latency: 90% of all queries load in under 5.5 s, indicating predictable system speed for nearly all users.
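For readers who want to compute the same statistic from their own logs, here is a minimal nearest-rank percentile in Python. The sample load times are invented for illustration and are not measurements from this project.

```python
def percentile(values, pct):
    """Nearest-rank percentile for a whole-number pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical load times in seconds (assumed data).
sample_latencies = [3.1, 3.4, 3.6, 3.9, 4.2, 4.4, 4.8, 5.1, 5.4, 7.6]
p90 = percentile(sample_latencies, 90)
print(f"p90 latency: {p90} s")
```

A p90 figure is less sensitive to a single slow request than the maximum, which is why it is a common way to describe typical worst-case speed.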

46 of 57

OpenAI vs AI Server

OpenAI version (GPT-4.1 mini)

    • Model trained on Boolean creation.
    • High success rate: correctly identifies facets, topics, and time periods in natural language queries and formats them as Boolean expressions.
    • 1 cent per ~90 searches

AI Server (Llama 3.1)

    • Untrained model.
    • Reliability and repeatability aren’t at a level suitable for research use.
    • No API costs.
    • Local power costs.

47 of 57

Require Sign In?

Capture more data at the cost of limiting usage.


48 of 57

Data

  • What do we have access to?
    • JSON capture | Processing speed | Comparative Primo analytics
  • What do we want access to / How do we evaluate this tool?
    • Based on feedback…

  • How did you feel about the process?
  • Do you want to use AI in this context?
  • Did you get results you needed?
  • Would you use this again?
  • Number of results (potentially misleading)
  • Session data, i.e., how many searches before reviewing results?

We are retaining no personal information. No gathered data is used in training.

49 of 57

API Costs

0.04 cents per search

4.5 cents per 100 uses

~ a dollar a month

Listed Pricing

Model               Input    Output
4.1-mini            $0.40    $1.60
4.1-mini-sbubool    $0.80    $3.20

Actuals

Model               Input    Output
4.1-mini            $0.41    $0.68
4.1-mini-sbubool    $0.66    $1.21
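A per-search estimate can be worked out from listed prices as sketched below. The sketch assumes the listed figures are dollars per one million tokens, which is how OpenAI publishes its pricing; the token counts per search are hypothetical, so the result is illustrative rather than a reproduction of the figures on this slide.

```python
def cost_per_search_usd(input_tokens, output_tokens, input_price, output_price):
    """Cost of one search in dollars, with prices given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Listed 4.1-mini-sbubool prices from the table; token counts are assumptions.
cost = cost_per_search_usd(
    input_tokens=400, output_tokens=100, input_price=0.80, output_price=3.20
)
cents = cost * 100
print(f"{cents:.3f} cents per search")
```

Output tokens cost four times as much as input tokens at these list prices, so keeping the generated Boolean string short matters more than trimming the prompt.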

50 of 57

Usage (10/1 - 10/14)


1245 Advanced

421 SEARCH AI

51 of 57

Usage (10/1 - 10/14)


Informational

topic-based or reference-seeking queries like:

“AI bias and facial recognition” or

“Hamlet by Shakespeare”

Exploratory

multiconcept or relational searches like:

“give me peer reviewed articles about the future of creative writing and its relationship with AI”

Creative

open-ended or speculative queries:

“How does a college student's mental health affect their grades. Has it changed since the pandemic?”

52 of 57

Revisiting the goal

Focus more on…

  • Quick delivery of resources
  • Enhanced discovery of related materials
  • Support research skills & literacy
  • Incentivize using library tools
  • Librarian oversight

Focus less on…

  • Summarization
  • Circumventing the research experience
  • Separating AI from the Discovery system
  • Convoluted search process


53 of 57

Limitations and User Feedback

  • No discussion or memory
  • No guidance on what prompts to use or what is possible
    • “... appears to rely on knowledge that novice researchers might not have”
  • Limited training data (in development)
  • Exact title matching
  • Some may not adjust the Boolean string to fit their needs
    • “... concerned that users would think that this is an exhaustive search and not delve deeper”
  • Time to completion could be faster


54 of 57

What did we learn

  • Underutilized search techniques
  • Sensitive relationship between training data and system prompting
  • Building systems, not just “solutions”
  • Benefits of full control over this system

Future Directions

  • More facets, more agents
  • Confidence scores and reinforcement
  • Chat-like instructive interface
  • Updating the UI
  • Using this system elsewhere
    • Bento search, librarian contacts, Interlibrary Loan

55 of 57

search.library.stonybrook.edu

56 of 57

https://github.com/Stony-Brook-Libraries-Digital-Services

Learn more about this project and others!

57 of 57

THANK YOU! ANY QUESTIONS?

We’re looking forward to hearing from you!

matthew.hartman@stonybrook.edu

timothy.kohn@stonybrook.edu