1 of 99

AI: Where You Can Use It and Where You Shouldn't

Elizabeth Bartmess

elizabeth.bartmess@gmail.com

Indexing Society of Canada / Société canadienne d’indexation

2025 Vancouver Conference

2 of 99

AI-Generated “Indexes” Are Crap, and We Need to Say That to Authors, Publishers, and Prospective Indexers

My new title

3 of 99

What I’ll cover

  • How AI is bad at book indexing
  • Why AI is bad at book indexing
  • Our actual problem with AI: potential client misconceptions
  • What we can do about client misconceptions
  • Postponed: Using AI for macros, scripts, and regular expressions

4 of 99

What I mean by AI

  • Generative AI, specifically chatbots that use Large Language Models (LLMs)
    • ChatGPT, Claude, Copilot, etc.
    • I will try to use the term LLMs to be specific
  • “AI” is a broader category than this
  • Older technologies once called “AI” tend to get reclassified as “not AI anymore”

5 of 99

How LLMs are bad at indexing

6 of 99

I will:

  • Briefly recap Tanya Izzard’s (2024) findings
  • Present my results with full-length out-of-copyright books

7 of 99

A quick ethics note

  • Authors are often wary of LLMs
  • Some publishers stipulate that we should not use AI
  • If experimenting:
    • Don’t upload a client’s work to an LLM without the explicit permission of the author and publisher
    • Use out-of-copyright works, e.g. from archive.org

8 of 99

Tanya Izzard (2024)*

  • Various indexing-related tasks:
    • Text summary
    • Keyword extraction
    • Proper-name extraction
    • Draft index creation
    • Research tasks
  • The History of England (an ~11-page work by Jane Austen)
  • Two LLMs: Claude and Adobe AI Assistant
  • Prompt for draft index creation: "You are a book indexer. Please create an index with thirty entries, showing correct page numbers, for this text."

*Izzard, T. "Generative Artificial Intelligence (AI) and Its Performance at Indexing Tasks." The Indexer 42, no. 4 (2024). https://doi.org/10.3828/index.2024.24.

9 of 99

Izzard (2024): draft index creation

  • Adobe AI Assistant: supplied a names list
  • Claude:
    • Incorrectly claimed document lacked page numbers
    • Made up page numbers
    • Overly specific headings (“Cromwell, Oliver, as villain”)
    • Displayed page numbers as, e.g., “[p.8]”
    • Used subheadings needlessly
    • No cross-references (but it was an ~11-page document)

10 of 99

My attempts

  • Full-length, public-domain books with existing indexes
    • PDFs with OCR (Optical Character Recognition, i.e., scanned books with a searchable text layer)
  • ChatGPT:
    • The Behavior of Organisms, by B.F. Skinner
    • How to Stop Worrying and Start Living, by Dale Carnegie
  • Perplexity.AI with Deep Research Mode:
    • The Psychology of Day-Dreams, by J. Varendonck
  • Claude with Opus:
    • The Psychology of Day-Dreams, by J. Varendonck, minus the last ~60 pages
  • General prompt: “Please create a professional-quality index for the attached book, using the same techniques that a professional indexer would use (e.g., including cross-references, subheadings, etc). Ignore the book's existing index; instead, create your own.”

11 of 99

For Perplexity and Claude

  • Perplexity.AI’s Deep Research Mode: recommended by a friend who trains AI to work with medical records
  • Claude with Opus: at the time, Claude’s most expensive model, intended for tasks that require putting together disparate pieces of information and making extended inferences
  • Used a different book than for ChatGPT due to tighter file size limits
  • Included a copy of the Criteria for the ASI Excellence in Indexing Award
  • Prompt: The same as for ChatGPT, plus “Please ensure that the completed index adheres to the guidelines in the attached file.”
  • Had to cut off last ~60 pages with Claude, so omitted line about ignoring the existing index

12 of 99

What I will and won’t talk about

  • Will talk about big issues:
    • Number of main headings (access points)
    • Structure: metatopic/supermain/main headings
    • Structure: Cross-references
    • Hallucinations
  • Won’t talk about relatively easy fixes:
    • Formatting
    • Alphabetization
    • Failure to detect page numbers

13 of 99

Main headings

  • A good index typically has a main heading (access point) for every piece of indexable material that can stand on its own
    • Subject to length constraints
  • Names of people and titles of works
    • For scholarly works especially, often indexed fairly inclusively
    • We generally do not omit nearly all names and titles

14 of 99

The Behavior of Organisms

- Adaptation, 14, 16

- After-discharge, 12, 14

- Algebraic summation, 30

- Analytical units in behavior, 9

- Animal behavior, 4, 5

- Appetite, 22, 23

Original index (A’s) vs. ChatGPT’s index (A’s)

15 of 99

How to Stop Worrying and Start Living

Acceptance, power of, 19, 20, 23

Action, importance of, 6, 7, 17, 48

Adversity, dealing with, 129-136

Alvarez, Dr. W. C., on worry and ulcers, 26

Anger, cost of, 113-122

Anxiety, overcoming, 7, 9, 16, 20, 151

Original index (A’s) vs. ChatGPT’s index (A’s)

16 of 99

The Psychology of Day-Dreams: Original index (A’s)

17 of 99

The Psychology of Day-Dreams

Absurdities

- in day-dreams

- in thought chains

- types and causes

- See also Errors in thinking

Affect

- and apperception

- and conception

- and ideation

- and intuition

- and memory

- and thinking processes

- and visualization

- role in day-dreams

- See also Emotions; Feeling processes

Apperception

- affect in relation to

- compared to perception

- in day-dream formation

- role in thought chains

Attention

- conscious

- directed vs. undirected

- fore-conscious

- in day-dream states

- relationship to awareness

Autistic thinking

- Bleuler's concept of

- characteristics

- compared to directed thinking

- comparison with day-dreams

- function in mental life

- in neurotics

- relationship to reality

- vs. logical thinking

- See also Non-directed thinking; Phantastic thinking

Awareness

- degrees of

- during day-dreams

- relationship to consciousness

- See also Consciousness

Perplexity.AI’s index (A’s)

18 of 99

The Psychology of Day-Dreams

A

affects

active in fore-conscious thinking, 4, 197-217

affinity for attention, 207

and apperception, 218-247, 252-253

and automatic ideation, 191

and conception, 248-257

and conscious ideation, 207

and forgetting, 140-153

and hallucination, 152

and ideation, 248-257

and memory, 183-217, 252

and perception, 241-242

and recollecting, 192-193, 197, 204, 211

and repression, 201, 292-303

and thought-formation, 250-251

and wish-fulfilment, 76

at genesis of day-dreams, 188-191

causing distraction, 133, 148-149

differentiation of, 247

directing concatenations, 112

effaced by evolution, 245

in artistic thinking, 216

in conception, 253-256

intensity of, 81, 256

interference with consciousness, 204

intuition of, 279-292

role in thinking, 259

spontaneous recollection, 190-191

affective thinking

artistic, 216

as seeking for satisfaction, 302

definition, 278

directing memory, 197

distinguished from directed thinking, 13-17

gaps in recall of, 66-74

genesis of, 25-53

in children, 243

primitive, 300

unsteadiness of, 115-153

analysis, method of, 27-30, 136, 178, 216, 265

apperception

abnormal, 234-235, 239, 253

and affect, 218-247

and memory, 219-221

definition, 219

fore-conscious, 230

influenced by affect, 222, 233-235

leading to day-dreams, 221-235

protracted under affect, 234

relation to association, 239-240

spontaneous additions in, 220-221

wrong, 227

association

automatic, 118

by shifting of accent, 49-51

external, 43-47, 51, 52, 72

in apperception, 239-240

outer, 44, 47, 50, 247

attention

and affect, 207, 256-268

conscious, 256-268

definition, 257

distracted by affects, 131-153, 168

fore-conscious, 256-268

awareness, 279-292

definition, 291

for affects in intuition, 281

in affective thinking, 257

lacking in certain mental processes, 291

Claude’s index (A’s)

19 of 99

Main headings: summary of issues

  • LLM-generated indexes typically have ~20-40% as many main headings as the original index
  • Names and titles of works particularly underindexed

20 of 99

Recap of metatopic and related index structure

  • See Stauber and Towery for in-depth info
  • Metatopic = topic of the book
  • Metatopic entry ideally lets you access any page in the book
    • Either through a subheading or a trail of cross-references
    • Pyramidal: metatopic points to “supermain” entries which point to main entries as needed (Towery, 2016)
  • Not all books have/require this structure but many do
  • For many books this is a key component of a navigable index

21 of 99

ChatGPT metatopic candidates

  • How to Stop Worrying and Start Living:
    • Worry, strategies to eliminate, 6-7, 9, 16, 18-21, 37-44
    • Live in the present, 7-9
  • The Behavior of Organisms:
    • Animal behavior, 4, 5
    • Behavior, definition of, 6
    • No entry beginning with “organisms”
  • No visible aggregation of all content under metatopic

22 of 99

Perplexity.AI: Psychology of Day-Dreams

Affect

- and apperception

- and conception

- and ideation

- and intuition

- and memory

- and thinking processes

- and visualization

- role in day-dreams

- See also Emotions; Feeling processes

“Accidental metatopic”: 5 of the book’s 6 chapters ended up here because their titles all include “affect.” The cross-references connect thematically related keywords; they are not used for structure.

23 of 99

Claude: Psychology of Day-Dreams

affects

active in fore-conscious thinking, 4, 197-217

affinity for attention, 207

and apperception, 218-247, 252-253

and automatic ideation, 191

and conception, 248-257

and conscious ideation, 207

and forgetting, 140-153

and hallucination, 152

and ideation, 248-257

and memory, 183-217, 252

and perception, 241-242

and recollecting, 192-193, 197, 204, 211

and repression, 201, 292-303

and thought-formation, 250-251

and wish-fulfilment, 76

at genesis of day-dreams, 188-191

causing distraction, 133, 148-149

differentiation of, 247

directing concatenations, 112

effaced by evolution, 245

in artistic thinking, 216

in conception, 253-256

intensity of, 81, 256

interference with consciousness, 204

intuition of, 279-292

role in thinking, 259

spontaneous recollection, 190-191

Metatopic kitchen sink with full chapter ranges. No use of cross-references.

24 of 99

Summary: metatopic-related issues

  • Metatopic: Either extremely sparse page references or table of contents dumped into entry
  • No use of cross-references to refer out
  • No reasonable attempt to handle the metatopic and related structure

25 of 99

Brief recap: cross-references

  • Used for multiple purposes
  • Structural, e.g. to point to more specific topics
  • Vocabulary control, e.g. “Asperger’s: see autism”
  • When the reader might be confused or want a different topic
  • See also references should point the reader to a topic that has additional information about the current topic
  • Remember, the original prompt explicitly mentioned cross-references

26 of 99

How to Stop Worrying (ChatGPT)

  • No cross-references for either book!
  • For How to Stop Worrying, I prompted: “This index doesn't have any cross-references, and most of the names and concepts in the book are missing.”
  • Result (for the A’s):

Acceptance, power of, 19, 20, 23 (see also Resilience)

Action, importance of, 6, 7, 17, 48 (see also Success, Initiative)

Adversity, dealing with, 129-136 (see also Resilience)

Alvarez, Dr. W. C., on worry and ulcers, 26 (see also Health, Stress)

Anger, cost of, 113-122 (see also Resentment, Forgiveness)

Anxiety, overcoming, 7, 9, 16, 20, 151 (see also Fear, Stress)

27 of 99

Psychology of Day-Dreams (Perplexity.AI)

  • A reasonable number of cross-references, all to entries that actually exist
  • They’re all reciprocal, e.g.:
    • absurdities, see also errors in thinking
    • errors in thinking, see also absurdities
  • Connecting thematically related terms
  • Probably not restricting see also to cases where additional information is to be found
    • i.e., no hierarchical/structural cross-references
    • and no vocabulary-control cross-references
    • (no page numbers, so can’t confirm)

28 of 99

Psychology of Day-Dreams (Claude)

  • Four cross-references to two headings
  • Two reasonable vocabulary control entries:
    • feeling. See affects
    • phantasies. See day-dreams
  • Two cross-references from small to very large entries:
    • fancy, 19. See also day-dreams
    • pleasure-pain principle, 251-252, 256, 301. See also affects

29 of 99

Summary: cross-reference issues

  • Cross-references were:
    • Absent (ChatGPT) or excessive/absurd (ChatGPT when prompted again)
    • Mutually associative and reciprocal, probably based on thematic similarity of main headings (Perplexity.AI)
    • Rare and not well-chosen (Claude)
  • Cross-references were not:
    • Used to support index structure
    • Used in any way an indexer would use them, except possibly for Claude’s two vocabulary control cross-references

30 of 99

Hallucinations

  • LLM-invented “facts” that aren’t real
  • Better described as bullshit
  • Training can help but may not resolve them entirely
  • What about in indexing?
    • They clearly happen!
    • I checked 4 subheadings for Claude and found one obvious hallucination
    • Hallucinations are not the biggest problem with these indexes
    • If LLMs produced good indexes, hallucinations might be a bigger deal

31 of 99

LLM-generated indexes: Summary of Issues

  • Underindexing
    • ~20-40% as many main headings (access points) as the original
    • Names and works especially underindexed
  • Poor/nonexistent index structure
    • No effective handling of the metatopic or of related structure
    • Incomplete (at best) understanding of cross-references
  • Hallucinations of unknown prevalence, but still a concern

32 of 99

Why are LLMs bad at indexing?

33 of 99

What I’ll cover in this section

  • The available corpus of indexes
  • Indexing education’s (relatively) nonpublic nature
  • LLMs’ nature as predictive text generators
  • More speculative: problems using context appropriately

34 of 99

The available corpus of indexes

  • LLMs learn from available bodies of work
  • Many indexes are bad!
    • Not produced by professional indexers
    • Indexer unskilled or in a rush
    • Indexing best practices have changed some over time
  • LLMs use the entire corpus, not just good indexes

35 of 99

Indexing education

  • LLMs learn from available bodies of work
  • Indexing training and education are typically gated
    • Paid courses
    • Paper books
    • Webinars and conferences
    • Mailing lists
    • Personal experience
  • Best practices are not free/public the way they are in, say, software development
  • LLMs don’t have access to much of what indexers know

36 of 99

LLMs’ nature

  • LLMs are predictive text generators
    • “Spicy autocomplete”
  • They are not indexing a book, they are autocompleting an index
    • they are predicting,
    • based on statistical frequency of words in their corpus,
    • what a response to your prompt would likely look like
    • they are not going through the book and gathering all the bits of indexable material
  • They generate text, then generate more text, linearly
  • They cannot edit previously produced text
  • It’s not surprising they don’t produce good indexes
  • It’s not what they’re designed for!

37 of 99

Indexes and context

  • Indexers are “context artists”
    • We flexibly consider many contexts
    • And make appropriate decisions about placement, phrasing, structure
  • Individual entries’ contexts include:
    • Different likely readers
    • Other chapters/sections
    • The index structure (metatopic, supermains, etc)
    • Similar index entries
    • Spatially close index entries
  • That is a lot of information to consider!

38 of 99

LLMs have context problems

  • Absence/paucity of index structure (metatopic, cross-references)
  • Can’t consider information outside their “context window”
    • (Although not a problem for these books)
  • In casual use, LLMs can sometimes fail to use information earlier in the conversation
  • Anecdotes from a tech writer:
    • LLMs sometimes “lose” project context they technically know about
    • Sometimes randomly apply spontaneous edits that go against the provided style guide

39 of 99

Why don’t LLMs use context “right”?

  • Lack of indexing training almost certainly contributes
  • Speculation: context artistry is computationally expensive
  • Indexers narrow the range of potential computations
    • We benefit from indexing-specific knowledge
    • And from extensive evolutionary and cultural pressures toward correctly anticipating other humans
  • LLMs are not shaped by those pressures like we are

40 of 99

Summary: Why LLMs make bad indexes

  • LLMs are not trained to index well
    • Trained on a corpus containing many bad indexes
    • Lack access to indexers’ knowledge, training, and resources
  • LLMs are not designed for indexing
    • They are autocompleting based on statistical likelihood
  • LLMs do not use context like an indexer would
    • And indexes are all about context

41 of 99

Could someone train an LLM to make a better index?

42 of 99

Training

  • Similar to how LLMs are trained for other tasks
  • Give it explicitly labeled examples of good and bad indexes, good and bad index entries, etc.
  • Or feed it more elaborate indexing instructions
  • Or give it lots of feedback on the indexes it generates
  • I am personally not interested in doing this

43 of 99

Could LLMs be trained to index well?

  • Formatting/alphabetization issues easy to address
    • Can do that without AI
  • Could we address underindexing?
    • Probably at least partially
  • What about index structure?
    • Unknown

44 of 99

Could we work around the structure issue?

  • Telling an LLM how to break down a problem = common technique for improving performance
    • Could we give an LLM a particular structure and tell the LLM to fill it in?
  • “Index editor’s paradox”: how do we know an index has the correct structure?
    • Typically by indexing the book
    • And then there’s no need for an LLM
  • Books vary in how predictable their structures are
    • Indexers vary in how good they are at predicting structure
    • A skilled indexer working on predictably-structured material might find a trained LLM to be helpful
    • Or might not

45 of 99

Who would train/fund training an LLM?

  • Publishers: don’t have indexing knowledge, don’t want to spend money
  • Indexers: don’t have the technical knowledge or desire to replace ourselves
  • Indexing software creators: don’t want to eliminate target market
  • Other software developers: lack indexing knowledge, few financial incentives for disrupting a small field

46 of 99

LLM-generated indexes: Summary

  • LLM-generated indexes aren’t good enough
    • May not ever be good enough
    • Autocomplete is not a good strategy for making indexes
  • Training LLMs to create indexes:
    • Unclear who would do it
    • Unknown if it could adequately address underindexing/structural issues
    • There may still be hallucinations, which are notoriously hard to address
  • Conclusion: LLMs cannot do our work
    • Definitely not right now
    • Possibly not in the future

47 of 99

Bonus slide: support tasks

48 of 99

What about support tasks?

Izzard (2024):

  • Text summary
  • Keyword extraction
  • Proper-name extraction
  • Research tasks (asking the LLM a question about the text)

  • Izzard concluded LLMs are not reliable enough at this time
  • My take as well

49 of 99

Our actual problem with LLMs

And what we can do about it

50 of 99

The invisible indexer

  • People already often believe indexing is automated
  • Now LLMs can produce plausible-looking “indexes”
  • We know their problems because I compared them to human-generated indexes
  • Authors/editors:
    • Aren’t necessarily doing a side-by-side comparison
    • Not necessarily equipped to evaluate indexes on their own

51 of 99

Our role as indexers

  • We are the experts on indexing
  • We should:
    • Evaluate LLM-generated ”indexes”
    • Issue public recommendations to authors/publishers
    • And also inform prospective indexers

52 of 99

Recommendations

  • For indexing societies
  • For individual indexers
  • For indexing educators

53 of 99

Recommendations: indexing societies

  1. Publish statements on LLMs’ shortcomings
    • Clear, easy-to-understand, phrased in author/publisher language
    • Focused on impacts on readers/authors/books/profits
    • Include examples
    • Put this info on pages aimed at prospective indexers too
  2. Consider how to respond if indexers, non-indexers, or people posing as indexers produce bad LLM indexes
  3. Reach out to other fields to see how they’re handling AI
  4. Consider assessing tech needs and publishing a wishlist
    • Tech development should be driven by assessed needs, not newness of technology

54 of 99

Recommendations: individual indexers

  • Have an AI policy on your website/in your client materials, including (for example):
    • That you don’t use generative AI to index
    • That LLMs fail to ensure readers find needed info (underindexing, lack of structure/cross-references for finding subtopics and related topics)
    • That LLMs can also insert false information (hallucinations)
    • If your indexing society issues a statement, consider linking to it
  • Know how your skills differ from LLMs’ skills
    • Develop a brief pitch in case someone asks
    • Hit the above points
  • Start conversations about AI with authors/publishers, if appropriate to situation/relationship

55 of 99

Recommendations: indexing educators

  • Teachers, writers/bloggers, mentors, etc.
  • Talk about LLMs with new and prospective indexers
  • Emphasize where LLMs fall short
    • Underindexing
    • Index structure
    • Hallucinations
  • Consider keeping indexing content gated
    • Not necessarily paid, but gated
    • E.g., behind an email address instead of on the open web

56 of 99

Summary

57 of 99

We should tell authors/editors/prospective indexers about LLMs’ shortcomings

  • Major shortcomings:
    • Severe underindexing: ~20-40% of the access points of a human-generated index
    • Absent/near-absent index structure (metatopic and cross-references), preventing effective navigation to subtopics and related topics
    • Hallucinations an issue of unknown severity
  • Ways to tell people:
    • Indexing society formal statements on LLMs
    • Indexers can post AI policies on their sites / in client materials and prepare brief pitches for interpersonal situations
    • Indexing educators can teach explicitly

58 of 99

Questions/suggestions?

Relevant to indexing/publishing and AI

59 of 99

Bonus slides I won’t have time to cover

60 of 99

How else might LLMs affect our work?

Areas to keep an eye on

61 of 99

Three potential ways

  • By changing how readers access information
  • By changing what is published
  • By changing publishing

62 of 99

By changing how readers access information

  • Less reading of full books, more LLM-generated summaries
  • Unclear if this will reduce the use of indexes
    • This could result in increased index use
    • It may affect different types of books differently
  • If readers access books through LLMs, LLMs are a potential index audience
    • Unclear if or how this would affect our work

63 of 99

By changing what is published

  • LLM-generated or LLM-influenced material may have structural differences that affect its indexability

64 of 99

By changing publishing

  • Editors may delegate work to LLMs instead of people
  • Potential reduction/elimination of entry-level jobs may create a future leadership gap
  • This could result in a decline in book quality

65 of 99

Where you can use LLMs

(If you want to)

66 of 99

If you don’t want to use LLMs

  • You can turn AI features off
    • In your laptop and phone OS
    • In MS Word
    • In your search engine settings
  • You can skip the rest of these slides

67 of 99

Using LLMs

  • Use LLMs where you can confirm their accuracy

  • Ask for help with common tasks in software programs, troubleshooting, etc.

  • Writing macros
  • Writing regular expressions (search patterns / text replacement)

68 of 99

Macros and regular expressions

69 of 99

Handouts

  • Handouts on conference webpage
  • They cover installing and setting up a macro with a regular expression
  • We won’t work through those
  • But you can download them and work through them yourself
  • And ask me if you have questions

70 of 99

Macros

  • A macro is a mini-program anyone can write
  • Bundles tasks so you can run them with a keyboard shortcut like Alt-Shift-C
  • Example: copy term and page # from PDF to indexing software

  • Many programs have macro capacities
    • Record and/or specify sets of actions to perform
    • SKY, Macrex, Cindex for Windows, Word, Acrobat, others
    • Third-party macro software

71 of 99

Regular expressions (“regexes”)

  • Formal ways of representing complex search patterns
    • “a sequence of characters that specifies a match pattern in text” –Wikipedia
    • Example: ^(.*)\s(\w+)$
  • Regexes do text transformations
    • Change text to other text based on a set of rules
    • Invert a name
    • Change Dickens’s Great Expectations to Great Expectations (Dickens)
    • Change cats [dp felines] → felines
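
For example (a minimal Python sketch using the standard re module; the exact patterns are illustrative choices, not ones from this talk), the last two transformations above could be written as:

    import re

    # "Dickens's Great Expectations" -> "Great Expectations (Dickens)"
    print(re.sub(r"^(\w+)'s\s(.*)$", r"\2 (\1)", "Dickens's Great Expectations"))

    # "cats [dp felines]" -> "felines"
    print(re.sub(r"^.*\[dp\s+([^\]]+)\]$", r"\1", "cats [dp felines]"))

Name inversion is worked through on a later slide.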

72 of 99

Using regular expressions

  • In indexing software:
    • Cindex—“search patterns”—can use regexes directly in find/replace
    • MACREX—partial implementation of regex
    • SKY—not standard regular expressions but similar in nature
  • In third-party software
    • Windows:
      • AutoHotKey scripting language
      • Macro Express 6 can call external scripts
    • Mac: in Keyboard Maestro

73 of 99

Macros versus regular expressions

  • A regular expression is just a text-matching pattern; the replacement is specified separately
  • You apply a regular expression using a program
    • For example, a search-and-replace action in Cindex
    • Or in a macro, for example to invert a name on your clipboard

74 of 99

Introducing Regular Expressions

  • Capture groups
    • Allow you to parse your text into chunks
    • Where each chunk matches a specific pattern
    • These use parentheses ()

  • Replacements
    • What you do with the chunks
    • A separate step from the regex
    • Allow you to reorder one or more chunks, add characters, etc.

75 of 99

Regexes: simple name inversion

  • Regex: ^(.*)\s(\w+)$
  • The ^ indicates the start of a string and the $ indicates the end
  • There are two capture groups:
    • (.*) puts “a bunch of text of any kind” into a capture group
    • (\w+) puts a word of one or more characters into a capture group
    • The \s between them matches a space (actually, any whitespace character)
  • Replacement: $2, $1
    • Takes the second capture group
    • Adds a comma and space
    • Puts the first capture group after it
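
As a quick check, here is the same regex and replacement in Python (note that Python writes the backreferences in the replacement as \2 and \1, where many macro tools use $2 and $1):

    import re

    pattern = r"^(.*)\s(\w+)$"
    replacement = r"\2, \1"   # Python's spelling of the "$2, $1" replacement above

    print(re.sub(pattern, replacement, "John Allan Doe"))  # -> "Doe, John Allan"
    print(re.sub(pattern, replacement, "Jane Austen"))     # -> "Austen, Jane"
    print(re.sub(pattern, replacement, "Cher"))            # no space, so no match: left unchanged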

76 of 99

Macro for this talk

  • Files are online
    • Windows: AutoHotKey (both version 1.1 and version 2; free download)
    • Mac: Keyboard Maestro (paid or 30-day free trial)
  • Use accompanying installation/setup handout
  • What the macro does:
    • Retrieves any text on the clipboard
    • Applies a regular expression to it
    • Puts the changed text back on the clipboard
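
The same logic, sketched in Python for illustration only (the actual files for this talk are AutoHotKey and Keyboard Maestro macros; pyperclip is a third-party clipboard package, and the keyboard-shortcut trigger is something the macro software provides):

    import re
    import pyperclip  # third-party clipboard package: pip install pyperclip

    def invert_clipboard_name():
        text = pyperclip.paste()                               # retrieve any text on the clipboard
        inverted = re.sub(r"^(.*)\s(\w+)$", r"\2, \1", text)   # apply the regular expression
        pyperclip.copy(inverted)                               # put the changed text back on the clipboard

    if __name__ == "__main__":
        invert_clipboard_name()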

77 of 99

Installation/setup instructions

  • Walks you through:
    • Installing macro software
    • Opening up the macro file for this talk
    • Testing the macro
    • Making a copy of the macro
    • Replacing it with a new regular expression

78 of 99

Writing regexes with LLMs

79 of 99

General advice for using LLMs

  • Think of an LLM as the smartest intern ever on their very first day
  • They are brilliant; they know nothing
  • You must provide clear instructions and context

80 of 99

Asking an LLM to write you a regex

  • Go to an LLM site (for example chatgpt.com)
  • For light use, you may not need an account (or a paid account)
  • Type in the prompt box: “Write me a regular expression that…” and explain what you want it to do.
  • This is a “prompt”

81 of 99

Prompt writing tips

  • Be very specific and clear.
  • State the steps you want the regex to perform, in the order you want them performed.
  • Give it examples of inputs and outputs.
  • (optionally) Give the AI (and yourself) a role/persona and ask it to explain the results accordingly.
  • If it goes down a wrong track, clear the conversation and start over
  • Some back-and-forth/refinement is normal
    • You can ask it to clarify things, rewrite things, etc.

82 of 99

Pitfalls of LLM answers

  • Not always accurate
  • Can be very confident and plausible and completely wrong
  • Always verify by testing
  • If doing something that will affect your index, test on a copy
  • The more you know about regexes, the better you can fix problems
  • Sometimes a “build in small pieces” approach can help
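
One way to test: run any LLM-supplied regex against known inputs and expected outputs before using it on real data. A Python sketch (the pattern, replacement, and examples are placeholders to swap for your own):

    import re

    pattern = r"^(.*)\s(\w+)$"     # the regex the LLM gave you
    replacement = r"\2, \1"        # the replacement it suggested
    examples = {                   # inputs you choose, with the outputs you expect
        "John Allan Doe": "Doe, John Allan",
        "Jane Austen": "Austen, Jane",
    }

    for text, expected in examples.items():
        result = re.sub(pattern, replacement, text)
        status = "OK  " if result == expected else "FAIL"
        print(status, repr(text), "->", repr(result), "expected", repr(expected))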

83 of 99

regex101.com

  • Regex in top box
  • Test string(s) in bottom box
  • Explanations and capture groups shown on the right
  • Its regex flavor may not completely match the one ChatGPT’s answer assumes

84 of 99

85 of 99

Example: Prompt

  • Prompt: “Can you write me a regex that takes a string and inverts it at the last word, for example turning "John Allan Doe" into "Doe, John Allan"?”
  • Sometimes it will give you Python code
  • It really likes Python
  • Followup prompt: “Just the regex, not the Python code.”

86 of 99

Example: ChatGPT’s response

87 of 99

Example: regex101.com

88 of 99

Troubleshooting

  • Give the LLM any errors or bad results and ask it to explain why the result differed from what you expected
  • Clear the conversation, copy-paste the regex, ask it to check for issues (I do this sometimes even before I test it)
  • Sometimes what you want to do will be too complex for a regex
  • You will need a more complicated solution
    • Like a macro
    • Or multiple regexes

89 of 99

Writing macros/scripts with AI

90 of 99

Using LLM-generated macros/scripts

  • You will need to know how to create and edit macros/scripts in the software you’re using
  • Windows
    • AutoHotKey: a scripting language, meaning macros are entirely text
    • Macro Express: has an underlying script view
  • Mac
    • Keyboard Maestro has no “script” view
    • Get steps from LLM, manually implement them

91 of 99

Macro Express’s script view

  • In menu bar, with a macro open, select View → Direct Editor
  • Your macro’s actions will display as a script
  • You can copy-paste a script into here
  • Select View → Direct Editor again to return to normal editor

92 of 99

Prompt writing tips (again)

  • Be very specific and clear.
  • State the steps you want the macro/script to perform, in the order you want them performed.
  • Give it examples of inputs and outputs.
  • (optionally) Give the AI (and yourself) a role/persona and ask it to explain the results accordingly.
  • If it goes down a wrong track, clear the conversation and start over
  • Some back-and-forth/refinement is normal
    • You can ask it to clarify things, rewrite things, etc.

93 of 99

Additional prompt generation tip for macros/scripts

  • Tell it what language (and version if applicable) you want it to use.
    • “for Macro Express 6 Pro”, “for Keyboard Maestro”, “for AutoHotKey v1”
  • LLMs tend to give AutoHotKey scripts in v1.1 syntax even if you ask for v2
  • You may need to ask the LLM to correct its script

94 of 99

Example

  • The following slides show you a macro in AutoHotKey 1.1 (Windows) and Keyboard Maestro (Mac)
  • The macro takes the clipboard contents, inverts them at the last word, and replaces the original clipboard contents with the inverted version

95 of 99

Windows: AutoHotKey 1.1

Screenshot callouts: set hotkey; update the English-language description of the hotkey; the regex; the replacement

96 of 99

Mac: Keyboard Maestro

97 of 99

Mac: Keyboard Maestro

Screenshot callouts: check this; set hotkey

98 of 99

Mac: Keyboard Maestro

Screenshot callouts: the regex; the replacement

99 of 99

Questions?