Access Live Note-Taking, Day 1

Opening Keynote: From Access to Interactivity

David Binkley Memorial Lecture:

Embodied Histories & The Weight of Data

Big Data in Libraries: Has Open Source’s Time Arrived?

Hackfest Report 1

Lightning Talks

Tātou tātou: Caring and sharing in a Free and Open source Project

Digital Preservation from Coast to Coast

Lightning talks part deux

Opening Keynote: From Access to Interactivity

Jon Beasley-Murray, Assistant Professor, University of British Columbia

- suggest ways in which we rethink access to information

- rethink the ways we think about public institutions like the university and the library, and their relationship with society

Borges story -

- 1930s Buenos Aires library - 210 pesos/month

- but it was easy work

- thought the library was the mirror of the world

- the library had everything - you could spend your life there without ever exhausting what it had to offer

- Aleph - prized possession of poet, secret, shows it to narrator

        - a private hoard rather than a public good

- the university and library are always in peril if they are treated as private


- closer than ever to the dream of the Aleph

- GBooks = library of babel?

- the very notion of keeping it private makes less and less sense with the arrival of 2.0

- value increases the more the network is extended

- the public is actively constructing the universal (?)

- expanded access to the commons allows “ordinary” people to contribute to building the common wealth of all

        - which means increasingly outrageous attempts to privatize the commons

        - primitive accumulation: enclosure of the commons (aka accumulation by dispossession)

- capital is claiming new terrain with the aim of making it profitable

        - universities and libraries are on the front line

- question of access has been recast

        - we are more than the guardians of legacy of the commons

- the public is generating the commons (the commons is active, not a result of the past)

- Access Copyright - sukkit

- everyone devoted to public knowledge, has to decide which side they are on

- mutually enabling and mutually limiting

        - youtube wouldn’t make any money without fostering creativity in the commons

        - but the more it makes money, the more it limits

- private interest can lead to the creation of the common good, but it also threatens expansion

- increasingly, creativity depends upon commonality

        - no longer feasible to keep R&D in-house

        ex. war against terror - had info been allowed to flow freely btw state agencies...

- cooperation, not competition, is most productive

- opportunity for libraries

        - our raison d’être is neither profit nor productivity

        - we are to promote the commons

- private interests are taking a hold of universities

- "The privatization of a university is the death of a university."

- economic parasitism

- why do we use closed systems like WebCT?

- no reason why every librarian shouldn’t be involved in a massive project of collaboration to expand the commons

- Arnold’s definition of culture

- libraries are responsible for the archive

        - digitization strips out contextual content (ads, etc)

        - important to know exactly where stories are placed in newspapers (below the fold, etc)

- the reading room (example @ BL)

        - some libraries don’t have a dedicated space for reading (REALLY?)

- why levy fines on faculty?

- faculty members as off-site storage        

- worse fate that threatens universities - irrelevance

        - if we no longer take an active part, participate in the public sphere, we will be lost

- not access, interactivity

- universities should not pander to society

        - its role is critique, not service

        - society without room for critique is not worth living in

- entrepreneurship should not trump everything

- DIY U book

        - edupunks + u of phoenix

        - unashamedly utilitarian view of university

                - no reflection, research, or critique

        - students can and should construct their own learning plan

- stop selling tech for customer satisfaction

        - return to an ethic of difficulty, not ease

                - studying, learning, research, thinking - all meant to be difficult

                - beware any panacea that tells you otherwise

- the price of education is being prepared to think, rethink, and to question their most treasured beliefs

        - likewise we should be prepared to be more difficult/critical/less compliant

- Borges didn’t back down - his stories insist that you think

- they say 2 things

- productivity of the commons: Borges believed in borrowing/stealing/building off of

- argued this is what we have always done

- always treason to the thing copied, but that is what we do

- meaning is necessarily interactive

- readers can never be passive consumers

- readers help produce the meanings of a text

- interactivity is not easy, it should not be easy

- it is something a university should be both practising and teaching


what about making discovery easier for students?

- the goal is not to have knowledge be an IV drip

be more ornery. some libraries still use tools that were created 50-100 years ago. don’t focus on efficiency - frees up our time to think about the work that we’re doing now - which may be more valuable in 50 years. think about the fact that we are building an infrastructure and that folks will have to build on this inheritance

- once you refuse to submit to just-in-time

there is a notion of the usefulness of knowledge shared by librarians. true knowledge vs universal knowledge. what do you think is the library’s role in terms of providing true/authentic knowledge?

- knowledge and meaning are products that we construct

- resist the notion of “usefulness” of knowledge

        - but this implies that the ultimate goal is not in question - they just want the tool to get to where they want to go

        - shouldn’t the institution be asking people to rethink their goals?

        - difficult to be completely useless

difficulty = complexity?

- Borges story - Two kings Two labyrinths (in Fictions)

- one labyrinth is incredibly complex, the other the desert, both are difficult though only one is complex

defending the commons = occupy movement. can you reformulate it to sell to the libertarian right?

- incumbent on the university to make a case for the role of the university that is for critique in society

scholarly publishing is broken. tenure & promotion is broken. how do we convince faculty to do more OA fun things?

- librarians are the most progressive on campus. it’s tricky. piece by piece.

- don’t make any apologies for faculty

David Binkley Memorial Lecture:

Embodied Histories & The Weight of Data

Jer Thorp, Data Artist in Residence, The New York Times


venn diagram - science, design, art

- interested in areas that exist between these disciplines

talk about history in a few contexts

def: reassembly of a narrative from events that have happened in the past


protip: get your mom to give you old family pix for presos

arrival of the family computer

        - impossible to imagine such a transformative change

        - one day, voila!

apple were so punk rock the rainbow in the apple wasn’t even in the right order

Mac Classic II!

let’s not talk about Steve Jobs, let’s talk about Bill Atkinson (early Apple team member)

        - GUI was his baby (as well as drop down menus, selection lasso...)

        - HyperCard!


        - make applications on your own

        - hypertext stacks to teach students

        - if Monks had Macs -

        - Brian Thomas quote - this served no commercial purpose

                - CODEPUNK

why would you deliver the world’s greatest tool to someone without giving them the tools to build on it?

        - how many people use computers but can’t program

Software Artist

- started becoming more interested in the actual system

        - ex The Colour Economy -

- how can we do things using other data

        - bring in outside info as opposed to generative art

- NYT API opened up

        - remember when “million” was a ginormous word? not so much anymore

        - graphed occurrences of million/billion/trillion in NYT data (not such a hot graph, but spurred him on to do more)

        - communism, cycles in iraq/iran

- not really interested in it as infographic, more aesthetics of data

- how can i engage with this archive better?

        - interested in ways that folks could explore news

                - similarity map - (central image, with similar images building off each other)

                - interesting concepts in how people in the future would view these

- Mark Lombardi -

        - interested in financial corruption

        - would sketch out relationships between scandals

        - used by the US govt post-9/11

        - these maps were way more successful than his paintings (which makes you wonder what is art? what is the intent of the artist?)

        - used index cards (14,000+) to assemble these maps - now we can do this using software

Jer’s work

- moved from graphs to systems

- how is life being transformed by the data we leave behind

- Just Landed -

        - program that scraped twitter looking for people who said they just landed somewhere, and plotted it vs where they moved from

- Good Morning

        - timezone adjusted twitter greetings

- the internet helps you build things that may not have an initial purpose, but others can find uses for it

        - ex. NASA Kepler project - locating other possible planets

        - tough to read the paper about it, so Jer built a visualization -
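
The Just Landed idea above can be sketched in a few lines. This is only a guess at the text-matching step, not Jer's actual code: the regular expression, the function names, and the assumption that each tweet arrives as a (home location, text) pair are all hypothetical.

```python
import re

# Hypothetical pattern: "just landed in" (any case), then a run of
# capitalised words taken to be the place name.
LANDED = re.compile(r"(?i:\bjust landed in) ([A-Z][\w'-]*(?: [A-Z][\w'-]*)*)")


def landing_place(text):
    """Return the place named after 'just landed in', or None if absent."""
    match = LANDED.search(text)
    return match.group(1) if match else None


def landings(tweets):
    """tweets: iterable of (home, text) pairs -> list of (home, destination)."""
    pairs = []
    for home, text in tweets:
        place = landing_place(text)
        if place:
            pairs.append((home, place))
    return pairs
```

Each (home, destination) pair would still need geocoding before it could be drawn as an arc on a map; that step is left out here.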

New York Times

- wanted to work with Mark Hansen (@cocteau)

        - stats prof and new media artist


- Cascade - piece of software that looks at events on the internet

        - looking at how they are related to each other

        - event types

                - browses of NYT

                - encode

                - tweets

                - decode

                - browse

        - how do we map how folks read things on the internet

        - 6500 pieces of content every month

                - large data terrain, so let’s build exploratory tools (different views - depending what you want to see like degrees of separation)

- cascade/radar/tree/story views

But Will It Make You Happy?

- a good exploratory tool also helps you find the small pictures

        - we talk about getting data in and processing it, but we don’t do a very good job at building tools to help people explore the data

        - purpose-built small tools are ideal for this


- check out Alex Beim @tangibleint

        - tasked with evolving a playground for Richmond, BC

                - needs to be accessible for wheelchairs

- lots of convos with Dan Shiffman @shiffman

        - recommended Jer to Jake Barton @jake_barton of Local Projects

                - involved in constructing histories at the local level (and literal construction)

                - names on 9/11 memorial have meaningful adjacencies

                - started out as a math problem - 2900 names, 1400 requested adjacencies - one name had 72 adjacencies - optimization problem (need the best solution, not knowing what the right one is)

                - became apparent this wasn’t just a math problem - became a design problem (these aren’t blocks, names are fluid in size)

                - which then became a human problem...

                        - built a software tool to allow architects to move names around to make better visual sense as well

                        - no other software could do this - needed a custom tool
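
A toy version of the "math problem" stage of the memorial layout, to show why it is an optimization problem. This is a one-dimensional sketch under heavy assumptions (names in a single row, every name the same size, a simple hill-climb); it is not the algorithm used for the memorial, and the function names are made up.

```python
from itertools import combinations


def satisfied(order, requests):
    """Count requested pairs that sit next to each other in `order`."""
    pos = {name: i for i, name in enumerate(order)}
    return sum(1 for a, b in requests if abs(pos[a] - pos[b]) == 1)


def improve(order, requests):
    """Hill-climb: keep any swap that raises the score, until no swap helps."""
    order = list(order)
    best = satisfied(order, requests)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            score = satisfied(order, requests)
            if score > best:
                best, improved = score, True
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
    return order, best
```

A hill-climb like this can get stuck on a layout that is good but not the best, which is one reason the real project ended up needing a tool that let people move names by hand.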

- embodied histories

        - our lives are being stored, in many cases in our pockets (smartphones!)

                - 150 people in the room have this data, but the only folks that can access it are Apple

        - check out

                - which lets you see the data stored by your iphone

                - gives this data to researchers

                - SO COOL

                - experience of reliving a year of your life is much more rewarding than you would think

- data is tethered to things in the real world... these are fragments of our lives.

News Memory Map

plug in a meaningful date for you and a timeline appears with events from the NYT

all of this awesome was built with this tool - Processing - in conjunction with MIT labs


Big Data in Libraries: Has Open Source’s Time Arrived?

MJ Suhonos, Systems Librarian/Software Engineer, Artefactual Systems

Peter Van Garderen, President/Systems Archivist, Artefactual Systems

Peter Van Garderen - Artefactual

- gives away free beer (GPL licensed code)

- who are we in the face of google ebooks itunes flickr etc...

- we’re space

- we’re trusted digital repositories

- we’re portals

- we’re code (see Eric Hellman’s talk at Code4Lib 2011)

- we’re context (as archivists)

- read: Hugh Taylor, “The Archivist, the Letter, and the Spirit”, Archivaria 43 (1997), p. 6

- big picture

        - think careers, not projects


        - pass down wisdom for the benefit of all peoples

        - we’re the 99% - we should be in charge of the public record, social networks, big data

- let’s not fight against the man, let’s fight for freedom

        - check out Free Software Foundation, Creative Commons, Electronic Frontier Foundation

- built with LAMP stack

        - Symfony (Propel & ZSL index) + Qubit + ICA-AtoM


- big data

        - in 1980 that was 20GB

        - concept is very relative

        - Moore’s law

def: datasets that grow so large that they become difficult to work with using relational databases and within a tolerable elapsed time

big data is big

- facebook - 140 billion photos, human genome - 3 billion pairs, google - 50 billion web pages, worldcat - 1.5 billion item records

- europeana - 20 million items, LoC - 1.9 million items, Canadiana - 1 million items, LAC - 3.5 million items

big data is complicated

- MARC records... still causing us grief

        - hierarchical structure

- then we moved to tables

- nb: tables are not hierarchical, the relations in our data are not the same as the tables in a database

- maybe we are using the wrong tools for our data?

        - square peg round hole

- possible new tools that manage data in non-relational way

        - redis

        - mongoDB

        - CouchDB

        - Cassandra
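
The square peg / round hole point can be shown with a tiny made-up archival hierarchy. The record, its field names, and both helper functions are illustrative assumptions, not anything from the talk: a document store can keep the nested record as is, while a table needs it flattened into rows and reassembled on the way out.

```python
import json

# Hypothetical hierarchical description (fonds > series > file).
record = {
    "id": "fonds-1",
    "title": "Example fonds",
    "children": [
        {"id": "series-1", "title": "Correspondence", "children": [
            {"id": "file-1", "title": "Letters 1930-1935", "children": []},
        ]},
        {"id": "series-2", "title": "Photographs", "children": []},
    ],
}


def flatten(node, parent_id=None):
    """Turn the nested record into (id, parent_id, title) rows, table-style."""
    rows = [(node["id"], parent_id, node["title"])]
    for child in node["children"]:
        rows.extend(flatten(child, node["id"]))
    return rows


def rebuild(rows):
    """Reassemble the tree from rows (the work joins or repeated queries do)."""
    nodes = {rid: {"id": rid, "title": title, "children": []}
             for rid, _, title in rows}
    root = None
    for rid, parent_id, _ in rows:
        if parent_id is None:
            root = nodes[rid]
        else:
            nodes[parent_id]["children"].append(nodes[rid])
    return root


# Document-style storage: one serialised blob, read back in one step.
stored = json.dumps(record)
```

Every read of the relational version pays for `rebuild`; the document version is a single `json.loads(stored)`.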

Scalability test

- using LAC data...

- ICA-AtoM + Propel + Solr + elasticsearch (document-oriented)

- write speed test

        - object-relational tools didn’t do very well (solr/qubit)

- read speed test

        - Solr rocked

- write memory

        - object-relational were memory hungry

- read memory

        - same deal

- object-document approach is significantly faster in every way


        - 4-10x faster

        - 50-90% less memory

- relational dbases scale well

        - if your data is not hierarchical

- Solr scales well

        - if you have infinite RAM

- beware the dogma of SQL

        - NOSQL is a viable option

- think sideways

        - scale out

- are the new tools better than our old tools?

- the cloud is a lie

        - they are handy if you don’t have dedicated resources but don’t believe the hype

- big data is less about size and more about freedom

        - open source tools + distributed design = new opportunities


What should universities do with their data?

- open up the APIs for folks like Jer

- stop building all Solr all the time

- can you download the entire catalogue as csv? why not?

- timeline/chronology built in to digital collections?
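
On the "download the entire catalogue as csv" question, the export itself is not the hard part. A minimal sketch, assuming catalogue records are already available as dictionaries; the function name and field names are made up:

```python
import csv
import io


def catalogue_to_csv(records, fields):
    """Serialise a list of dict records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)  # missing fields are written as empty cells
    return buf.getvalue()
```

The csv module quotes values that contain commas, so "Lastname, Firstname" headings survive the round trip.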

What are the skills that we need to pick up? Tutorials? Books (like Lucene in Action)? How do we make this happen?

- give your tech folks time to tinker

- when buying tech ask for the rights to the documentation and any tutorials

- we have to work together to increase our skillsets to be better at working with digital info

- not all librarians need to be geeks, but all need to be familiar with this stuff

- LIS schools need a dedicated IT track

Less about scaling the storage - it’s about scaling the processing. (Sure storage costs money, but the processes need fixing first.)

Hackfest Report 1

Peter Binkley - UAlberta

- planned to build a Linked Open Data set

- used Silk workbench

        - point it at SPARQL endpoints and give it some matching rules and it’ll pull out the matches

- pointed it at a UofA set and at DBpedia

        - give us all the links between people in each collection (FOAF names)

        - can use regular expressions to fix things up (ex. lastname, firstname to firstname, lastname)

        - iterative process available

        - output owl:sameAs statements
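
The matching steps above can be approximated in plain Python to show what is going on. This is not how Silk is configured or run; it is a stand-in that assumes each collection is already a dict of URI to name, normalises "lastname, firstname" with a regular expression, and links exact matches. The function names and example URIs are hypothetical.

```python
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def flip_name(name):
    """'Lastname, Firstname' -> 'Firstname Lastname'; other strings unchanged."""
    return re.sub(r"^\s*([^,]+?)\s*,\s*(.+?)\s*$", r"\2 \1", name)


def same_as_links(left, right):
    """left/right: dicts of URI -> name. Return N-Triples owl:sameAs lines."""
    by_name = {}
    for uri, name in right.items():
        by_name.setdefault(flip_name(name).casefold(), []).append(uri)
    triples = []
    for uri, name in left.items():
        for other in by_name.get(flip_name(name).casefold(), []):
            triples.append(f"<{uri}> <{OWL_SAME_AS}> <{other}> .")
    return triples
```

Exact string matching is the crudest possible rule; the iterative part of the real workflow is about loosening and tightening rules like this until the links look right.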

Cameron Metcalf - UOttawa

- SCRUM agile management

- didn’t adopt the whole church of SCRUM - just took what was most valuable

        - focused on sprints

        - looooong meetings to build the sprints

- when filling out the cards/tasks with estimated time and date, realize this is a commitment

        - can take stress off in terms of next steps

- supposed to have buffer time between sprints

- quick morning stand-up meetings looking at the board were crucial

- read Michelle Frisque’s LITA forum talk -

Rich McCue - UVictoria Law Libraries

- social media in education rant

- decided to use social media to overcome issues with useless instructor

- every night after reading the lecture students would go to youtube and look up terms - would share the resources in a facebook group

- the group did better than the others in the class

- “didn’t need the teacher after all”?

        - poor teacher = more engaged students?

- we need to look for better ways to engage students

- we aren’t in the lecture business


Lightning Talks

Cynthia Ng, SLAIS

help to build the visualizations like Jer’s

lots of quickstart guides

make dynamic graphs easily

drupal and wordpress plugins

Peter Tyrrell, Andornot

Quick and dirty guide to monster PDFs

Solr on Ubuntu

and our first mention of DnD

ex. look for dragon in monster manual


- slices & dices PDFs

- split the PDFs into single pages

fed it into Solr

now a browsable fun list
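
A rough sketch of the split-then-index idea, assuming the page text has already been extracted from the PDF (the splitting and extraction tools are not shown). The field names, the id scheme, and the local search helper are assumptions for illustration; only the general shape, a JSON array of one document per page, reflects what Solr's JSON update handler accepts.

```python
import json


def page_docs(pdf_name, pages):
    """pages: list of page texts. Build one index document per page."""
    return [
        {"id": f"{pdf_name}#page={n}", "source": pdf_name, "page": n, "text": text}
        for n, text in enumerate(pages, start=1)
    ]


def solr_update_body(docs):
    """JSON array of documents, ready to POST to a Solr update endpoint."""
    return json.dumps(docs)


def pages_mentioning(docs, term):
    """Local stand-in for a Solr query, so the sketch runs without a server."""
    return [d["page"] for d in docs if term.lower() in d["text"].lower()]
```

Indexing per page is what makes "look for dragon in monster manual" return a page number instead of one giant PDF hit.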

Tātou tātou: Caring and sharing in a Free and Open source Project

Chris Cormack, Senior Developer, Catalyst IT

[he’s speaking Maori at the moment, so i’m not giving you any notes right now - all i hear is the Haka KA MATE KA MATE KA ORA KA ORA]

talking about Koha in terms of free software, community, collaboration, friendship


0. freedom to run the program for any purpose

1. freedom to study how the program works and change it to make it do what you wish

2. freedom to redistribute copies so you can help your neighbour

3. freedom to distribute copies of your modified version to others

why free instead of open source?

- end goal of free software is freedom, not total cost of ownership, etc

- emphasis on freedom, not development model

- read this - Benjamin Mako Hill - “When Free Software Isn’t Better”

what does this mean for the library?

- freedom to check out what you want

- freedom to weed

- freedom to expand your collection

- freedom to share

are we sticking with a shitty situation because it’s easier than change?

are we at the point where we like complaining about things - assuming we have no ability to change?


- it’s a pile of code and a big community

- example of caring and sharing

        - during Queensland Floods a library lost their entire collection - needed a list of titles for insurance claims

        - within 15 mins of requesting help on the list, had 17 offers of help, script written in 35 minutes

- you are buying into a community, not a piece of software

- Trappist monk does security!

- lots of African libraries are forming consortia using Koha

- sometimes things fork... (LibLime and WALDO)

        - can cause confusion


- mature product

- bunch of developers

- new release is the work of 6 countries

- want to contribute? talk to Chris


how do you manage new code?

The release manager (Chris) decides if it’s a feature release (held) or a bug release. Bugzilla to manage it. Patches attached to bug report. Third person (not bug submitter or Chris) signs off on the patch. Git for versioning. Only 7 patches have been rejected outright.

How come there are more women in Koha?

Development field dominated by men, library field dominated by women. Because they work together, there is a better balance

Excellent story about electric fences, rural modems, and a boss that says “it’s just a big database, how hard could it be?” YOU HAD TO BE HERE.

Digital Preservation from Coast to Coast

Geoff Harder, Digital Initiatives Coordinator, University of Alberta

Mark Jordan, Head of Library Systems, Simon Fraser University

Slavko Manojlovich, Associate University Librarian Information Technology, Memorial University of Newfoundland

Bronwen Sprout, Digital Initiatives Coordinator, University of British Columbia

Geoff Harder - UAlberta

- many different types of digital assets

        - OJS, IR, research data, newspaper digitization

- what does digital preservation mean?

        - preserving more than the objects and items - what about conversations, software

- goal: 500 year commitment to long term access

- current capacity ~52TB, ~18TB currently used (will be at 300TB by 2012, 500TB by 2014)

- concerns: scalability, predictability, sustainability

- following the CDL model

        - policies

- staffing

- sustainable funding

- external partnerships (this is key!)

- microservices

- Trusted Digital Repository - the goal for preservation

        - to get a better grasp on all that’s entailed - held a multiday workshop with CRL to work on accreditation process

        - develop tiers (gold/silver/bronze)

- new position - Digital Preservation Officer - to help with policy creation

- use OAIS reference model

        - good lord they have liaison librarians helping out

- multiple preservation strategies used (many eggs in many baskets)

- preserve what matters (we can’t keep everything)

- includes...


- International Polar Year Data Archive

        - iRODS backbone

        - CANARIE fibre link between UAlberta & OCUL

        - already wrestling with governance and sustainability issues

        - messy data = messy work

- using Archive-It for collections looking at western Canada, circumpolar, gov docs, significant events

Mark Jordan, SFU

- just getting started - what does it mean to undertake a digital preservation program?

        - added to 5 year plan (currently being implemented)

        - starting with policy and practice

- the library can’t do it alone - building relationships with:

        - IT services

        - SFU Archives

        - VP Research (ooooh, the senior admin is onboard - HELLZ YA!)

- thinking strategically

        - value of institutional history

        - direct $$ value of digital content  (digitize it ourselves or pay for perpetual access)

        - risk management (esp. with regards to research data mgmt)

        - economies of scale and economies of collaboration (are we really going to invest more $$ in hardware?)

- existing activity at SFU

        - bit-level preservation (good back-up system)

        - IR (newly revived)

        - thesis management system (online submission including supplemental files, with workflow for backroom deposit)

                - this helps in general with thinking about workflows

        - LOCKSS

        - workflow integration proof of concept

        - SFU Archives’ pilot (Artefactual using Archivematica - preserving president’s email)
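
The notes only say "bit-level preservation (good back-up system)". One common companion to backups is a fixity check: record a checksum for each file, then re-verify later to catch silent changes. The sketch below assumes SHA-256 and a simple path-to-digest manifest; nothing here describes SFU's actual setup.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large video or data files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest):
    """manifest: dict of path -> expected hex digest. Return changed paths."""
    return [path for path, expected in manifest.items()
            if sha256_of(path) != expected]
```

An empty list from `verify` means every file still matches the digest recorded at ingest.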

- CARL CFI data mgmt proposal


- how to handle large video

- how to handle data

- preservation metadata

        - when do you generate it and how do you integrate it without annoying people

- digital-only theses with associated video and data

Slavko Manojlovich, Memorial University

Digital Preservation on the East Coast

- using ContentDM, ePrints, OJS, [Archivematica]

- OAIS Reference Model

- before preserving, you need to know file types and formats

        - more than file name extensions

        - are you trying to preserve the look and feel? the reusability?

        - what are the access formats you need for your users?

        - what tools do you need to normalize these formats?
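
"More than file name extensions" usually means looking at the leading bytes of the file itself. A minimal sketch with a handful of well-known signatures; the table and function name are illustrative, and real format-identification tools rely on far larger signature registries.

```python
# A few well-known leading-byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container (also DOCX, ODT, EPUB)"),
]


def sniff(data):
    """Guess a format from the first bytes of a file, ignoring its name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

A file named report.doc that sniffs as PDF is exactly the kind of surprise worth finding before ingest rather than after.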

- preservation planning involves

- monitor users

- monitor tech

- develop strategies

- package things up with a bow

- remember when wmv was the way of the future? then swf? THEN THE IPAD ARRIVED AND RUINED EVERYTHING

- mxf motion jpeg 2000 is the one being recommended by the film industry

Bronwen Sprout

Digital Preservation Strategy Pilot Program Implementation

- been working on it since January with Artefactual

        - gap analysis

        - strategy & systems architecture

        - findings will be openly accessible when complete - YAY!

Pilot projects

- Rare Books & Special Collections (born digital)

        - ingest records and upload into ICA-AtoM

        - testing legacy born digital records (~10-20 years old)

        - findings

- file failures

- accessibility issues (public access vs system only)

- appraisal issues (de-accessioning)

- arrangement issues (arranged as “digital media” instead of by creator)

- Archives (born digital)

- cIRcle repository (dspace) ~36,000 items

        - focus of Artefactual

        - receive SIPs from dspace, with preservation happening strictly in Archivematica (no DIPs created)

- digitization projects (ContentDM)

        - integrate DIPs from Archivematica

- website archiving

        - planning to use Heritrix and Wayback Machine

- data preservation

Next steps

- completion of pilot projects

- working out research data policies now

- LOCKSS PLN - how does it fit?

- TDR certification prep

Lightning talks part deux

Geri Ingram, OCLC

WorldCat Digital Collection Gateway

- uses MARC to dump fun things into WorldCat

- tech challenges

        - metadata mapping

        - mental model (how do you get folks to use self-service when OAI-PMH is passive?)

        - SEO/syndication challenges (get indexed gooder)

   - OAIster stuff is there

[notes are getting sketchy as i’m totally crashing from too much sugar at lunch and then the haagen dazs bar at the break. yum.]

- can take anything that is OAI-PMH
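
For anyone who has not seen OAI-PMH: a harvester asks a repository for records over plain HTTP and gets XML back. The sketch below builds a ListRecords request URL and pulls Dublin Core titles out of a response. The base URL and sample response are made up, and this says nothing about how the Gateway itself is implemented.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def titles(xml_text):
    """Return every dc:title found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{DC}title")]


# Made-up response, trimmed to the parts the sketch needs.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Ficciones</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

The repository just sits there until a harvester asks, which is the "passive" part of the mental-model challenge noted above.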