1 of 19

The innards of BAILII: opening internal interfaces

Roger Burton West, BAILII

2 of 19

Objectives

  • “Ignorance of the law is no excuse”
  • Free public access to primary materials
  • Not competing with commercial services
  • Public-domain material
  • No large-scale added value

3 of 19

Deep linking

  • Part of original vision for WWW
  • Actively discouraged by many sites
  • Users must run own searches
  • BAILII encourages direct links

4 of 19

Stable URLs

  • BAILII maintains fixed URLs
  • Some people are still using URLs from 2002

5 of 19

Coverage

  • Not comprehensive, but good post-1998
  • Higher courts only
  • Full-text
  • Open formats
  • OpenLAW: significant historical cases

6 of 19

Searching

  • SINO search engine
  • Boolean capability
  • Not domain-specific
  • OpenLAW indices

7 of 19

Automated citation markup

  • HTML version only
  • Markup on demand
  • Regexp and database
  • Not 100% accurate

8 of 19

Automated citation markup

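/^(.*?)                              # text before the citation
 (\[?\d+\]?                          # year or number, optionally bracketed
   (?:\s+
     (?:
       [-A-Z_\/\d]*\d[-A-Z_\/\d]* |  # token containing at least one digit
       \d+\([A-Z][_\sA-Za-z]+\)   |  # number followed by parenthesised words
       \(?[A-Z][A-Za-z']+\)?      |  # capitalised word, optionally parenthesised
       [A-Z.]+                       # upper-case abbreviation, dots allowed
     )
   ){2,}                             # at least two such tokens
 )
 (.*)$/sx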
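
For readers who do not use Perl, the pattern above can be tried out with a rough Python port. This is only a sketch: the name `find_first_citation` and the sample sentence are invented for illustration, and the citation in it is made up. It covers the regexp step only, not the database check that follows it.

```python
import re

# Python port of the Perl pattern; re.X ignores layout whitespace and
# re.S lets "." match newlines, mirroring the Perl /sx flags.
CITATION = re.compile(r"""
    ^(.*?)                             # group 1: text before the candidate
    (\[?\d+\]?                         # group 2 starts: number, optional brackets
      (?:\s+
        (?:
          [-A-Z_/\d]*\d[-A-Z_/\d]*  |  # token containing at least one digit
          \d+\([A-Z][_\sA-Za-z]+\)  |  # number followed by parenthesised words
          \(?[A-Z][A-Za-z']+\)?     |  # capitalised word, optional parentheses
          [A-Z.]+                      # upper-case abbreviation, dots allowed
        )
      ){2,}                            # at least two such tokens
    )
    (.*)$                              # group 3: text after the candidate
""", re.X | re.S)


def find_first_citation(text):
    """Return (before, candidate, after) for the first candidate, or None."""
    match = CITATION.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    print(find_first_citation("See Smith v Jones [2004] EWCA Civ 123 at para 5."))
```

The pattern only proposes candidates. Anything that looks like a number followed by two or more capitalised or numeric tokens will match, which is presumably why the candidates are then checked against the database.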

9 of 19

find_citations.cgi

  • Same code as for in-line citation markup
  • Marks up submitted HTML documents
  • Usable from web form
  • Also usable programmatically:
      curl -o out.html -F infile=@in.html http://www.bailii.org/cgi-bin/find_citations.cgi
  • Or from any HTTP POST
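
The same upload can be sketched in Python using only the standard library. The field name `infile` and the URL come from the curl command above. The function names, the `text/html` part type and the boundary handling are assumptions for illustration. Whether the endpoint accepts exactly this request has not been checked here.

```python
import urllib.request
import uuid

FIND_CITATIONS_URL = "http://www.bailii.org/cgi-bin/find_citations.cgi"


def build_upload_request(html_bytes, filename="in.html", url=FIND_CITATIONS_URL):
    """Build a multipart/form-data POST carrying the document in 'infile'."""
    boundary = uuid.uuid4().hex
    marker = b"--" + boundary.encode("ascii")
    body = b"".join([
        marker + b"\r\n",
        b'Content-Disposition: form-data; name="infile"; filename="'
        + filename.encode("ascii") + b'"\r\n',
        b"Content-Type: text/html\r\n\r\n",  # assumed part type
        html_bytes,
        b"\r\n" + marker + b"--\r\n",
    ])
    headers = {"Content-Type": "multipart/form-data; boundary=" + boundary}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def mark_up(html_bytes):
    """Send the document and return the response body (makes a network call)."""
    with urllib.request.urlopen(build_upload_request(html_bytes)) as response:
        return response.read()
```

Building the request separately from sending it means the multipart body can be inspected without touching the network.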

10 of 19

check_citation.cgi

  • Given citation, lists URL
  • Also lists alternative citations
  • If no match as entered, checks stripped form
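
The fall-back to a stripped form might look something like the sketch below. The slide does not say what stripping actually removes, so the normalisation here (keep only letters and digits, upper-cased) is an assumption. The function names, the citation and the example.org URL are also hypothetical.

```python
import re


def strip_citation(citation):
    """Assumed normalisation: keep only letters and digits, upper-cased."""
    return re.sub(r"[^A-Za-z0-9]", "", citation).upper()


def check_citation(citation, exact_index, stripped_index):
    """Look up a citation as entered, then fall back to its stripped form."""
    if citation in exact_index:
        return exact_index[citation]
    return stripped_index.get(strip_citation(citation))


if __name__ == "__main__":
    exact = {"[2004] EWCA Civ 123": "https://example.org/cases/123.html"}
    stripped = {strip_citation(key): url for key, url in exact.items()}
    # Different brackets, case and punctuation still resolve via the fall-back.
    print(check_citation("2004 ewca civ. 123", exact, stripped))
```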

11 of 19

Citation database

  • VNCs loaded automatically
  • Other citations entered manually
  • Forward maps: citation to URL
  • Reverse maps: URL to citation
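
A minimal in-memory sketch of the two maps follows, assuming one URL per citation and possibly several citations per URL. The class, its methods and the sample data are hypothetical. The real database schema is not described on the slide.

```python
from collections import defaultdict


class CitationDatabase:
    """Forward map (citation to URL) plus reverse map (URL to citations)."""

    def __init__(self):
        self.forward = {}
        self.reverse = defaultdict(list)

    def add(self, citation, url):
        # If the citation previously pointed elsewhere, unlink it first.
        old_url = self.forward.get(citation)
        if old_url is not None and old_url != url:
            self.reverse[old_url].remove(citation)
        self.forward[citation] = url
        if citation not in self.reverse[url]:
            self.reverse[url].append(citation)

    def url_for(self, citation):
        return self.forward.get(citation)

    def alternatives(self, citation):
        """Other citations recorded for the same document."""
        url = self.forward.get(citation)
        if url is None:
            return []
        return [other for other in self.reverse[url] if other != citation]
```

Keeping both directions is what would let a lookup list alternative citations: go forward from the citation to the URL, then back from the URL to every citation recorded for it.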

12 of 19

Paragraph tags

  • Wherever official paragraph numbers exist
  • <A NAME="para1">
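
With named anchors in place, a link to a particular paragraph is simply the judgment URL plus a fragment. The helper below is a sketch that assumes later paragraphs follow the same naming as the example (para2, para3 and so on). The function name and the example.org URL are hypothetical.

```python
from urllib.parse import urldefrag


def paragraph_link(judgment_url, paragraph):
    """Return the judgment URL pointing at the anchor for one paragraph."""
    base, _old_fragment = urldefrag(judgment_url)  # drop any existing fragment
    return f"{base}#para{int(paragraph)}"


if __name__ == "__main__":
    print(paragraph_link("https://example.org/cases/123.html", 12))
```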

13 of 19

Stored searches

  • IAT lists by country
  • Can be generated for any key words
  • Full search functionality is available

14 of 19

Users

  • Academics (especially students)
  • Practising lawyers
  • Paralegal and equivalent
  • Laymen

15 of 19

Small but fast

  • Two permanent staff
  • Rapid uploading of new material
  • Largely automated (custom Perl)
  • All free software

16 of 19

Cross-LII linking

  • Protocol agreed in November 2004
  • BAILII: 128,000
  • CanLII: 2,145
  • other LIIs: <100

17 of 19

Dispersed servers

  • BAILII hardware is fully redundant
  • But dependent on a single network connection
  • Dispersed servers can serve as local caches
  • ...and take load when main servers are unavailable
  • DNS-based load balancer under development

18 of 19

External indexing

  • Should search engines index BAILII?
  • Yes: generic resource
  • No: privacy

19 of 19

What next?

  • More data
  • Other web-based services?
  • contact bailii.org