1 of 17

How to Develop a

Browsertrix Behavior

Ilya Kreymer, Tessa Walsh

IIPC WAC 2025 Workshop

April 10, 2025

2 of 17

Agenda

  • Introducing Browsertrix
  • Browsertrix Behaviors: What are they? How do they work? When are they a good solution?
    • Built-in behaviors
    • Site-specific behaviors
  • First approach: Autoclick behavior
  • Second approach: *NEW* Flow recorder behavior
  • Third approach: Custom behavior

3 of 17

Browsertrix: levels of familiarity in the room?

4 of 17

Browsertrix Behaviors

  • Logic injected into the browser to perform certain operations on a page, such as scrolling, fetching additional URLs, or performing customized actions on particular sites
  • Built-in behaviors
    • Background behaviors run on every page (autoplay, autofetch, autoclick)
    • Autoscroll will run on every page that doesn’t have a site-specific behavior
  • Site-specific behaviors
    • Run on pages that match specific criteria
    • Some are included with the crawler: e.g. Facebook, Twitter, TikTok
    • Possible to extend with custom behaviors

5 of 17

So you want to archive a site that requires dynamic user actions on the page…

6 of 17

First approach: Autoclick

  • Easiest way to have the browser perform specific actions on a page
  • Clicks on all elements that match a given selector, without navigating off the page
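
The idea behind autoclick can be sketched in a few lines of JavaScript. This is an illustration only, not the crawler's implementation: the function name `clickAll` and its `root` parameter are hypothetical, and the sketch omits the guard against clicks that would navigate off the page.

```javascript
// Illustrative sketch only: clickAll and its parameters are hypothetical
// names, not part of Browsertrix.
function clickAll(root, selector) {
  // root is anything with querySelectorAll(), e.g. `document` in a browser.
  const elements = Array.from(root.querySelectorAll(selector));
  let clicked = 0;
  for (const el of elements) {
    el.click();
    clicked += 1;
  }
  // Report how many elements were clicked.
  return clicked;
}

// In a browser console: clickAll(document, "button.load-more");
```

The selector is the only input, which is why this approach is the easiest to set up and also the least flexible.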

7 of 17

Autoclick demo!

8 of 17

Second Approach: DevTools Flow Recorder!

  • Brand new approach (added in this release)
  • Uses the DevTools Recorder tab, built into Chrome since 2021!
  • Designed by Google for measuring performance and QA testing: https://developer.chrome.com/docs/devtools/recorder
  • Can be used to create a simple user-defined behavior!
    • More complex than Autoclick, but more flexible!
    • Probably not complex enough for social media

9 of 17

Second Approach: DevTools Flow Recorder!

10 of 17

Second Approach: DevTools Flow Recorder!

Steps:

    • Open the DevTools Recorder tab and start recording
    • Perform actions on the page
    • Finish recording
    • Copy or download the JSON and put it in a public Git repo or at a public URL
    • Point Browsertrix to the JSON using the new Custom Behavior support
    • Watch behavior run and inspect behavior logs in real time!
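
For orientation, a recording exported from the DevTools Recorder is a JSON object with a `title` and a list of `steps`. The sketch below uses a hand-written example (the title, URL, and selector are made up, and real exports usually carry more fields per step) and a hypothetical helper, `summarizeFlow`, for listing the steps before sharing the file.

```javascript
// Hand-written example in the shape of a DevTools Recorder JSON export.
// The title, URL, and selector are placeholders.
const flow = {
  title: "Expand comments",
  steps: [
    { type: "navigate", url: "https://example.com/article" },
    { type: "click", selectors: [["button.show-comments"]] },
  ],
};

// Hypothetical helper: produce one readable line per recorded step.
function summarizeFlow(recording) {
  return recording.steps.map((step, i) => {
    // Show the URL for navigation steps, else the first selector if present.
    const detail =
      step.url ?? (step.selectors ? step.selectors[0].join(" ") : "");
    return `${i + 1}. ${step.type} ${detail}`.trim();
  });
}

console.log(summarizeFlow(flow).join("\n"));
```

A quick pass like this can help confirm that the recording contains only the intended actions before it is published.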

11 of 17

Flow recorder examples/demo!

12 of 17

Third Approach: Custom Behavior

  • Most effortful option, but also the most flexible and powerful
  • Define a JavaScript class with certain required methods and static properties, and take advantage of built-in utilities
  • Documentation: https://crawler.docs.browsertrix.com/user-guide/behaviors/
  • Steps:
    • Develop custom behavior
    • Put custom behavior on public Git repository or URL
    • Point Browsertrix to custom behavior
    • Watch behavior run and inspect behavior logs in real time!
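
A minimal sketch of what such a class might look like, modeled on the structure described in the documentation linked above. The class name, the `id` value, the matched URL, and the `.load-more` selector are placeholders. `ctx.Lib.getState` and `ctx.Lib.sleep` are assumed here to be among the built-in utilities, so check the documentation for the exact names and signatures before relying on them.

```javascript
class LoadMoreBehavior {
  // Identifier shown in the behavior logs (placeholder value).
  static id = "Load More Example";

  // Decide whether this behavior should run on the current page.
  static isMatch() {
    return window.location.href.startsWith("https://example.com/");
  }

  // Initial state for the behavior; here a simple click counter.
  static init() {
    return { state: { clicks: 0 } };
  }

  // Main loop: an async generator that yields after each step.
  async *run(ctx) {
    const { getState, sleep } = ctx.Lib; // assumed utility names
    let button;
    // Keep clicking until the placeholder button is no longer on the page.
    while ((button = document.querySelector(".load-more"))) {
      button.click();
      yield getState(ctx, "Clicked load more", "clicks");
      await sleep(1000);
    }
  }
}
```

Yielding after each click gives the crawler a natural point to record progress, and the loop ends on its own once the button disappears.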

13 of 17

Custom Behaviors: Things to Be Aware Of

  • Running behaviors
    • First behavior whose isMatch() returns true will run on any given page
    • One workflow/crawl can use many custom behaviors that run on different pages
    • May need to increase behavior timeout in Browsertrix for long-running behaviors
  • Developing behaviors
    • Logging - log() vs. Lib.getState()
    • Iframes
    • Maintenance: behaviors may need tweaking as sites change over time
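
The first-match rule can be pictured as follows. This is a simplified illustration, not the crawler's actual code: `pickBehavior` and the two example classes are hypothetical, and how the crawler orders the behaviors it loads is not covered here.

```javascript
// Hypothetical behaviors: order matters because the first match wins.
class VideoPageBehavior {
  static id = "Video Page";
  static isMatch() {
    return window.location.pathname.startsWith("/video/");
  }
}

class WholeSiteBehavior {
  static id = "Whole Site";
  static isMatch() {
    return window.location.hostname === "example.com";
  }
}

// Simplified illustration of the rule: return the first class whose
// isMatch() is true, or null if none match.
function pickBehavior(behaviors) {
  return behaviors.find((b) => b.isMatch()) ?? null;
}

// On https://example.com/video/1 both classes match, so only the one
// listed first would run:
// pickBehavior([VideoPageBehavior, WholeSiteBehavior]);
```

When two behaviors could match the same page, keeping each `isMatch()` as narrow as possible avoids one behavior silently shadowing another.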

14 of 17

Custom behavior examples/demo!

15 of 17

In an ideal world, how would you like to create behaviors for the crawler?

Can we make this easier?

16 of 17

Looking Forward

  • We’d love to see a community develop around behaviors and encourage use and re-use of behaviors across organizations
  • Encouraging use of source control is a good first step in that direction
  • How would you like to see this aspect of Browsertrix evolve?

17 of 17

Questions?

Learn more or sign up for a free trial: https://webrecorder.net/browsertrix

Get in touch: info@webrecorder.net

Take some stickers!