Access Live Note-Taking, Day 2
Open to the Public: Indigenous Collections and the Ethics of Openness
Softly, Softly, Catchee Monkey: Successful Development and Implementation
Canadiana Discovery Portal Metadata and API
Hackfest report part deux
If you ain’t failin’, you ain’t tryin’
Ask Anything: The Human Search Engine
Gardening the Wiki, Growing the Web: Transforming UBC Library’s approach to content management through UBC Wiki
Lightning talks part three
Kimberly Christen, Associate Professor, Washington State University
Alex Merrill, Digital Initiative Librarian, Washington State University
Plateau Peoples’ Web Portal - plateauportal.wsulibs.wsu.edu/
- currently involving five tribes in a tri-state region of the Pacific Northwest
- other partners: plateau center, smithsonian, northwest museum of art and culture
- Washington State has an MOU with 11 tribes of the region
- wanted access to their materials off-campus
- “digital repatriation project”
- part of a larger discourse of physical repatriation
- socially, culturally and emotionally meaningful
- equal decision-making by all stakeholders
- encoded the needs and cultural protocols into every aspect of the project
- tribes wanted access to their materials off-site, and control to provide traditional knowledge and comments to the existing records
- wanted control to add more metadata to make the voices equal
- wanted to choose what was included (and excluded) from the collection
- sensitive/sacred items were not digitized
- questioned the automatic assumption that making everything open is always de facto beneficial
- if it violates native ethnic systems
- which public do we serve?
- sought to balance needs & sovereign rights with institutional concerns of libraries
- content of portal reflects collaborative nature
- tribal catalogue record (dublin core) metadata frequently incomplete/incorrect/sometimes offensive
- tribal knowledge record written by the tribe (can be multi-author, tribes required that no one be anonymous though, so logins and time stamps used)
- 5 user profiles created
- metadata fields aren’t mandatory - you include what you need
- tribes select access levels (Washington State makes their content open, but tribal content can be restricted)
- open to all/affiliated tribes/certain tribes/only my tribe
- tribal categories can trump LC subject headings
- geotagging possible
- more concerned with relationships and circulation of knowledge than building tools to disseminate
- built with...
- LAMP stack
- iframe pdf viewer
- google maps integration
- but it’s not the tech that matters
- software (required for the baked in protocols) will be moved into Drupal 7 in the next 6-8 months
- yay more sustainable!
- blurring the line between metadata and original content
- stewards of the narrative
- what do they get out of it
- context for digital collections
- dynamic collection
- all comments from the tribe are also added to the archive
- Mukurtu - http://www.mukurtu.org/ - is the software that was used to initially build it
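The tribe-selected access levels noted above (open to all / affiliated tribes / certain tribes / only my tribe) could be modelled roughly like this. A minimal sketch only, not the portal's actual code: the `User` and `Item` classes, the `AFFILIATED_TRIBES` set and the tribe names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Access(Enum):
    OPEN = "open to all"
    AFFILIATED = "affiliated tribes"
    CERTAIN = "certain tribes"
    OWN = "only my tribe"


@dataclass
class User:
    name: str
    tribe: Optional[str] = None  # None = member of the general public


@dataclass
class Item:
    title: str
    owner_tribe: str
    access: Access = Access.OPEN
    allowed_tribes: Set[str] = field(default_factory=set)  # only used for CERTAIN


# Hypothetical list of tribes partnered with the portal.
AFFILIATED_TRIBES = {"Tribe A", "Tribe B", "Tribe C"}


def can_view(user: User, item: Item) -> bool:
    """Apply the access level chosen by the tribe that owns the item."""
    if item.access is Access.OPEN:
        return True
    if user.tribe is None:
        return False  # restricted content is never shown to the public
    if item.access is Access.AFFILIATED:
        return user.tribe in AFFILIATED_TRIBES
    if item.access is Access.CERTAIN:
        return user.tribe == item.owner_tribe or user.tribe in item.allowed_tribes
    return user.tribe == item.owner_tribe  # Access.OWN
```

The point of the sketch is that the protocol lives in the data (each item carries its own access level), so the tribe, not the host institution, decides who sees what.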
What is your identity management system? (ex. for elders)
The political control is all done by the tribes. Will be Drupal logins but at the moment it’s hacked. Currently metadata is in MySQL - will move into Mukurtu as the engine for Drupal nodes.
Repatriation/reconciliation movements & social media
The technology allows us to cede control of these collections to the people who know best.
Open Access is another tension - we need to think about how “open” is constructed. Why not think about it as a continuum between open and closed?
Sustainability of the project
Housed in the Plateau Centre for Indigenous Studies - direct line to the Provost’s office. Built institutional support and structure for the Centre. Library has agreed to host in perpetuity at no cost for the tribes.
Additional MOU - all the content added by tribes can be taken back anytime they ask for it.
Smithsonian MOU - want the context
Soft money funded at the moment. RA covered by the dept.
Matt Carlson, ILS Administrator, King County Library System
Grace Dunbar, COO, Equinox
ooooh tag-team presentation!
tips for starting out with an open source system
- test the server with a stable release
- you can use a VM if it’s a small bit o’software, but for an ILS, no go
- gap analysis (side-by-side comparison/requirements project)
- what do you want that is missing, and do you really want to spend the time/money to develop it OR can you fix this with workflow?
- are gaps major or minor?
- no acquisitions system in an ILS = major
- do receipts print out perfectly? = minor
- outsource development?
What do you really need?
- build use cases (ignore the edge cases)
- workflows (build one. fer serious.)
- focus on outcomes, not processes (don’t get caught up in minutiae)
- be specific
Now that you have the scope of work...
- engage with a community
- look locally
- write an RFI/RFP (note: open source stuff doesn’t fit well with traditional RFI/RFPs - cuz they aren’t really selling a product)
- request a quote from a vendor
- hire a consultant
- specify estimated hours, costs, ownership of work, documentation
- testing/sign off (if you find a bug the day after the contract ends, who is responsible?)
- lots of local customizations are the death of open source installations - next update, yer screwed
- scope creep (can be tense, but must be done)
- be realistic about time for testing, clarifications & feedback
- best practices
- update your project plan constantly
- build a team of subject matter experts
- real examples, use cases and mockups whenever possible
- never too soon to start thinking about your go-live timeline and identify dependencies
- multiple clients/projects competing for time
- communication (make sure frontline staff get a say, and make sure there is a meaningful way to respond)
- use cases
- best practices
- 1-to-1 project managers
- clear, shared objectives
- set priorities (things will be discovered during prep - have a top 5 things you must have to go live)
- create a test manual and use it
- engage staff & patrons to find solutions
- there is no one perfect way to test - get creative
- you need a test server. fer serious.
- have an exit strategy
- what’s your plan b?
- you’ll need to explain why the system has been down for 6 hours
Stay on target
- “don’t let perfect be the enemy of good enough” (HELLZ YES)
- “no plan survives initial contact”
- managers aren’t necessarily trainers
- set aside mandatory time
- structured feedback is critical
- have a plan for on-going training
- implement in phases
- have a fall back position if you need to rollback
- and be sure to let everyone know that you have a fall back plan!
- change is hard - be sure to celebrate it
- look for ways to celebrate along the way too
DON’T FREAK OUT
HAVE FUN - THE THING YOU ARE DOING IS COOL
HAVE A LIFE OUTSIDE THIS PROJECT
What are the challenges with the Evergreen implementation?
Things have gone well. Implementing a buncha changes. Minimal downtime.
Big picture - new system. Problems?
Catalogue issues - different experience searching (ex. checkout history). Account interaction.
William Wueppelmann, Manager, Information Systems, Canadiana.org
- successor to the Alouette portal
- metadata comes from collection holders
- they use lotsa tech, i missed it so you don’t get the list, but it’s the usual suspects
- lots of keyword stuff
- command-line searching for advanced users
- 1 million items, 3+ million pages of indexed full text, 300,000 images
- page, document, series level indexing
- any memory institution can contribute
- simplified subset of metadata
- this is a finding aid, not a union catalogue
- harvest from contributors
- convert to Canadian Metadata Repository (CMR) format
- index in Solr
- available via website or API
Ingestion & conversion
- hardest part: getting metadata from contributors
- harvested from
- Dublin Core
- weird CSV
- XML & OCR in .txt files
- Perl hackery to convert to XML
- XSLT to get it into CMR format
- more Perl hackery to normalize and validate
CMR to Solr
- CMR is not meant for direct access
- flipped to Solr schema
- updated regularly
CMR functional requirements
- map metadata from diverse sources to common set
- normalize sortable/facetable data
- simple to convert to Solr
- manage and link parent-child relationships
- control fields to manage relationships between records
- key (unique id)
- parent key
- label (in context description)
- pubdate (range of dates)
- bibliographic & full text fields
- resource fields for URLs, filenames, etc
- enable sorting
- focus on low-effort, high-utility data
- 3 steps
Identifying /normalizing pubdate
- look at samples from collection and make an educated guess
- normalizing it is tricky - cataloguers do weird things
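To give a feel for the "cataloguers do weird things" problem, here is a rough sketch of turning free-text dates into a year range. The patterns handled (`[1895?]`, `c1901`, `1890-1895`, `189-?`, `18--`) are assumptions about common cataloguing habits, and the function name is made up; this is not Canadiana's Perl code.

```python
import re
from typing import Optional, Tuple


def normalize_pubdate(raw: str) -> Optional[Tuple[int, int]]:
    """Return (first_year, last_year), or None if no year can be recovered."""
    text = raw.strip()

    # Fully specified years: "1867", "[1895?]", "c1901", "1890-1895".
    years = [int(y) for y in re.findall(r"(?<!\d)(\d{4})(?!\d)", text)]
    if years:
        return min(years), max(years)

    # Decade known, last digit unknown: "189-" or "189-?".
    decade = re.search(r"(?<!\d)(\d{3})[-?]", text)
    if decade:
        prefix = decade.group(1)
        return int(prefix + "0"), int(prefix + "9")

    # Century known only: "18--" or "18??".
    century = re.search(r"(?<!\d)(\d{2})[-?]{2}", text)
    if century:
        prefix = century.group(1)
        return int(prefix + "00"), int(prefix + "99")

    return None  # e.g. "n.d."; better no date than a wrong one
```

Storing a range rather than a single year keeps uncertain dates sortable without pretending they are exact.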
Identifying/normalizing media types
- there are lotsa different terms for functional access types (esp. text)
- long list of possible types
- based on heuristics
- not 100% accurate
- more fields are optional
- erring on the side of exclusion increases relevance, encourages better metadata
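A heuristic media-type mapper along these lines might look like the sketch below. The keyword lists and the function name are hypothetical and far shorter than the "long list of possible types" mentioned in the talk; the one design choice taken from the notes is that unknown or ambiguous values get no type at all.

```python
from typing import Optional

# Hypothetical keyword lists; a real collection would need many more terms.
MEDIA_KEYWORDS = {
    "text": ("text", "book", "monograph", "pamphlet", "newspaper", "manuscript"),
    "image": ("image", "photo", "postcard", "map", "poster"),
    "audio": ("audio", "sound", "recording"),
    "video": ("video", "motion picture"),
}


def guess_media_type(raw_type: str) -> Optional[str]:
    """Map a contributor's free-text type to one functional media type."""
    value = raw_type.lower()
    matches = {
        media
        for media, words in MEDIA_KEYWORDS.items()
        if any(word in value for word in words)
    }
    # Err on the side of exclusion: ambiguous or unknown terms get no type.
    return matches.pop() if len(matches) == 1 else None
```

Substring matching like this will misfire on some values, which fits the "not 100% accurate" caveat above.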
Indexing CMR records
- structural manipulation only - no content hacking
- separating storage and index/access formats makes changing either one much easier
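As an illustration of the storage/index split, here is a sketch of a CMR-like record being flattened into an index document. The control fields (key, parent key, label, pubdate) come from the talk; every concrete field name below (`parent_key`, `pubdate_min`, `resource_...`) and the sample keys are assumptions, not the real CMR or Solr schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CMRRecord:
    key: str                                   # unique id
    label: str                                 # in-context description
    parent_key: Optional[str] = None           # links a page to its document, etc.
    pubdate: Optional[Tuple[int, int]] = None  # range of years
    title: List[str] = field(default_factory=list)
    fulltext: List[str] = field(default_factory=list)
    resources: Dict[str, str] = field(default_factory=dict)  # URLs, filenames


def to_index_doc(record: CMRRecord) -> dict:
    """Structural manipulation only: rename and flatten, never edit content."""
    doc = {"key": record.key, "label": record.label}
    if record.parent_key:
        doc["parent_key"] = record.parent_key
    if record.pubdate:
        doc["pubdate_min"], doc["pubdate_max"] = record.pubdate
    if record.title:
        doc["title"] = list(record.title)
    if record.fulltext:
        doc["fulltext"] = list(record.fulltext)
    for name, value in record.resources.items():
        doc[f"resource_{name}"] = value
    return doc


def children_of(parent_key: str, docs: List[dict]) -> List[str]:
    """Follow the parent-child link from the index side."""
    return [d["key"] for d in docs if d.get("parent_key") == parent_key]
```

Because the index document is derived mechanically, either side can change without touching the other, which is the benefit the last bullet describes.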
Web service API
- originally built for internal use to support ajax queries
- extension of the interactive query syntax
- API calls
- get xml or json (identical info)
- json more convenient for machine parsing
- xml better for the peoples
- response message
- request status
- paging info
- result set - matching pages
- other query types
- searches can also be run at the page level or at the series level
- individual records can be retrieved
- tagged sets identified by some criteria can be used to create searchable subsets
- can be used to create custom views of the database
- example: set = bc limits searches to content contributed by BC institutions
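Roughly what calling an API like this could look like from a script. Everything concrete here is a placeholder: the base URL, the `q`, `fmt` and `page` parameter names and the response keys are invented for illustration (only `set=bc` and the status / paging / result-set shape come from the talk), so check the real documentation before using any of it.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://portal.example.org/search"  # placeholder, not the real endpoint


def build_search_url(query, fmt="json", subset=None, page=1):
    """Build a query URL; `subset` maps to the tagged-set parameter."""
    params = {"q": query, "fmt": fmt, "page": page}
    if subset:
        params["set"] = subset
    return f"{BASE_URL}?{urlencode(params)}"


# Made-up response with the three parts listed above:
# request status, paging info, and the result set.
SAMPLE_RESPONSE = json.dumps({
    "status": "ok",
    "paging": {"page": 1, "pages": 12, "hits": 117},
    "results": [
        {"key": "doc1.p7", "label": "Page 7"},
        {"key": "doc2.p3", "label": "Page 3"},
    ],
})


def summarize(raw_json):
    """Pull the status, current page and matching keys out of a response."""
    data = json.loads(raw_json)
    keys = [hit["key"] for hit in data["results"]]
    return data["status"], data["paging"]["page"], keys
```

Something like `build_search_url("fur trade", subset="bc")` would then restrict the search to BC contributors.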
Are you returning any facets? If so, what does that output look like?
Yup, all facetable fields get returned - exactly what you’d get out of Solr.
No particular standard for the output - we are keeping it simple.
Library Community Mapping
- tool to record info about local community resources for community/outreach librarians
- check out librariesincommunities.ca
- community asset mapping
- social mapping
- wanted it so that librarians could go out with a tablet to find their community
- focus on 2 types of data entry
- formal: form, exact data, done by librarians
- informal: users pinpoint a map
- basic architecture
- web form
- google fusion table
- google map
- google fusion
- edit data points from gmaps
- came out last year, still very buggy
- will continue working on it
Build a tool to create a list of materials that they could use for patron-driven acq or a wishlist
- WorldCat search API to pull metadata (OpenSearch, Atom output)
- couchDB to store the list (likes that it has a true REST API)
- PHP - Zend framework
- Zend is easier (HTTP and authentication)
- JSON handling
- feed production
- Ajax searching
- not enough time
- new to couchDB
- write operations require authentication (basic auth, API key)
- updating JSON objects challenging (nested arrays)
- XML namespace fun (Atom output has Atom, DC and OCLC fields)
- cross-site restrictions (same-origin policy) make writing AJAX actions more difficult
- JSONP for read operations
- can’t do write operations without a proxy
- <3 couchDB
- cloudant has free low usage accounts so you don’t need to install it yourself
- could be used to teach devs how to interact with REST services
- can test how service works using REST Client
- JSON can be infinitely easier to work with than XML sometimes
- using framework can speed development - but she was karaokeing last night so it’s not finished yet
- watch for a blog post on the OCLC developer network
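For anyone curious what the CouchDB write side involves, here is a small sketch (in Python rather than the PHP/Zend used at the hackfest). It builds, but does not send, an authenticated `PUT` to update a list document. The database name, document shape and credentials are invented; the parts that are standard CouchDB behaviour are `PUT /{db}/{docid}` with a JSON body and the need to include the current `_rev` when updating.

```python
import base64
import json
from urllib.request import Request


def add_wish(doc, isbn, title):
    """Return a copy of the list document with one more nested item."""
    updated = dict(doc)
    updated["items"] = list(doc.get("items", [])) + [{"isbn": isbn, "title": title}]
    return updated


def build_update_request(base_url, db, doc, user, password):
    """Prepare a basic-auth PUT for CouchDB; `doc` must carry `_id` and `_rev`."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url=f"{base_url}/{db}/{doc['_id']}",
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would actually send it. Copying the nested `items` array before appending is one way around the "updating JSON objects is challenging" problem, since the whole document is written back in one go.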
[it is hard to take notes for hackfest reports. sorry folks]
Visual Aids for Archivists
Sean Ellis - Princeton
- indexing script
- query (get “content type” facet on all files)
- pull in JSON and generate piechart based on content type facets
- data from Digital Corpora
didn’t have enough time to get it into Archivematica
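The query-then-piechart step could be sketched like this. It assumes the index is Solr and that the facet comes back in Solr's default JSON layout (a flat list alternating value, count); the notes do not say which search engine was used, and the `content_type` field name and sample counts are made up.

```python
import json


def facet_slices(solr_json, facet_field):
    """Turn a flat Solr facet list into (label, count, percent) pie slices."""
    flat = json.loads(solr_json)["facet_counts"]["facet_fields"][facet_field]
    pairs = list(zip(flat[0::2], flat[1::2]))  # [value, count, value, count, ...]
    total = sum(count for _, count in pairs)
    return [
        (label, count, round(100 * count / total, 1))
        for label, count in pairs
        if count  # skip empty facets (also avoids dividing by zero)
    ]


# Made-up response for illustration.
SAMPLE = json.dumps({
    "facet_counts": {
        "facet_fields": {
            "content_type": ["application/pdf", 6, "image/jpeg", 3, "text/plain", 1]
        }
    }
})
```

The resulting slices can be handed to whatever charting library is around; as one of the lessons learned notes, the visualization itself can be done in html and js.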
- set up a web accessible test machine with pre-installed stack
- set up a git or svn dev branch so code can be committed and shared right away
- identify roles and tasks right away
- not everyone needs the same dev environment (visualizations can be done in html and js)
- more time playing with tools!
Firefox plugin to show all other websites that pull info from a given site
Ghostery.com - to see the invisible web
vision: to make the digital appear in the real using apps
- wanted something to locate information/items, to poll users, adding subjects to books on shelves (cuz call numbers are useless for patrons)
Layar software - layar.com
- GPS location based
- vision: image recognition
- drupal module (centralized dbase)
Argon software - argon.georgiatech.edu
- open source
- uses KML on back end
- put it on any webserver
- still in development, and only available on ios products, not centralized
- oooooh floating Access logo, i feel so augmented
- Layar demo for Access fun!
TIME FOR SUSHI
Since Amy is presenting, I (Krista Godfrey) will be filling in for the next session.
Amy Buckland, eScholarship, ePublishing & Digitization Coordinator, McGill University Library
Scott Hargrove, Director of Information Technology and Support Services at Fraser Valley Regional Library
Declan Fleming, Director of IT, University of California, San Diego Libraries.
Nick Ruest, Digital Preservation Librarian, Repository Architect and Digitization Coordinator, McMaster University
- idea came from talking with colleague - both tried same project and both failed - stop reinventing the wheel. need to talk about what doesn’t work, come up with solutions together, or kill project
- 3 projects to talk about
1. 5 years ago, youtube was new. idea to circulate downloadable movies and had a few days to trial it. a couple machines had problems
- all the movies were old
- implement a month later, with new version which didn’t work on the computers
- less content now, the good stuff gone
- copyright issues - company didn’t think about it
- turns out a 3 year contract. with legal advice, got out of contract
- lesson: bleeding edge can be tough. have clauses to get out of bad contracts
2. built own self-checks - how hard can it be?
- too many hours put into it but get 1 running. 3 more later, it’s getting harder. if purchased, would have saved half the money
- lesson: not always best to do it yourself
3. telephone system
- went with VoIP. spent time doing rfp, etc.
- get installed. software help stays but leaves shortly. then start thinking about tweaking. call and number is out of service. website gone. email doesn’t work.
- company closed the day they launched
- lesson: no matter how much you plan, there are always things you can not control
- 1st spoke about this 3 years ago. it failed
- Peace and War 20th Century - digitization project. No infrastructure, workflow, policies. 12 months to do and make it 2.0 social
- took a month and prototyped. went with drupal 5. launched a year later
- another grant for another thematic collection on Canadian publishing
- then came time for upgrade - too much customization.
- lesson: don’t make thematic websites. not scalable, not enough people.
- pulled data out of drupal and have since written policies and guidelines. going with fedora/Islandora
1 - fell going up escalator wrong way
2 - old discs - put data in one place and return discs to data owner. Not sure how the person got the discs. Discs wouldn’t mount (3 usb drives used). discs had been on a carrier ship that had been exposed to weather. finally managed to get all the data copied over and returned discs
- returned discs and people said these worked when we gave them to you and these are the originals from the last 10 years.
- IT demanded never get original stuff again - better processes made
- Tech outpaces process.
- couldn’t create digital library solos across the libraries but this helped define roles, workflows
- identify failpoints and move past them
- be more politically astute and be more aware of originals
3 - machine room 17 racks. no central IT at his institution. server room has cooling - 10 ton water chilled. Backup on roof (freon) but these freeze and then they can melt. This is actually not the fail. The pan overflowed. Fix was to put nipple in pan and lead hose to garbage can. Eventually unit moved.
Panel over. Open to audience
Tom Eadie - Mt. A - relates to Trent time. Never trust a techie. Never trust anyone. Verify Verify Verify. Be friends with admin - take luck as it comes. DRA customer was moving to new version (which didn’t happen). Needed an alpha machine but for short transition and didn’t want to buy for short period. Made deal with Toronto. Assured there was high speed connection. Always test things. Circulation transactions were timing out -upset staff, patrons. Had to move back to own server. Used the experience to get high speed connection of their own.
Dan Suchy - UCSD - laptop classroom (new 6 years ago). passionate librarian drove project and then left. All sorts of money allocated - person taking over realized that project wasn’t quite right for the institution but didn’t speak up. It failed and had to decommission 30 laptops. Failed to speak up, failed by losing champion, pinning on one individual
Randy - University of Guelph - working on computers in student rooms. always check and see if computer is working. put in required hardware and computer wouldn’t turn on. nothing he could have done to prevent (nut fell and hit video card). can’t prepare for all fails
Amy Buckland - Second Life was the future. It’s not.
Dan - pc management system. synchronize clocks for all. calls from the desk, something wrong with the clock - it’s off. turn the clock to match pc. grant money - synchronize all clocks, analog wireless clock. batteries drain really fast, so need cord. costs go up. sent them all back. still turning clocks. there isn’t always a technological solution.
Peter Binkley - minifail. University of Leeds, digitization major bibliography. he was to help clean up subject headings of geographical place names. 4 pc in office and devised system of moving things on floppies. button for floppy drive is close to button to turn off without saving, losing all data just created.
Cameron Metcalf - University of Ottawa - server room 2 floors below ground in old systems office. Next to central IT computer room. Interrupted power causes tension as library is outside room. Lots of building on campus - so 2 parties to talk to beside the library. Lots of fails in communication and things being shut down. Servers were going down, talked and got fixed, went down again next day. bunch of electricians and could see one, looking to lights saying nope, that wasn’t it, as if changing fuses in the kitchen.
keep talking about failure. visit failbrary.org (moved to drupal this morning). post anonymously if you want. big project failure coming soon with documentation to help in future.
and now back to your regularly scheduled live note taker, Amy
Brian Owen, Associate University Librarian for Processing and Systems, Simon Fraser University
please not too many Star Trek questions!
but otherwise ask any Qs
Ideal question turns into a lightning session, or speed-dating
Overhyped library tech trends?
- tag clouds
- the cloud
- qr codes
- mobile websites
- check Eric Lease Morgan’s blog
- recent DPLA submission
Serial prediction patterns for journals in Evergreen - where do we start? there’s no open standard
Need to communicate biz hours to biz community - any hours management applications that exist? (Bldg hours, ref hours, biz hours)
- Stanford built a web service
- WorldCat registry has this and in November this will be a web service - can break it up by branches
- UVic built a php thinger
Anyone implemented course reserve mgmt systems? if so, how are you getting instant updates on student info?
- Washington State has hourly updates through Active Directory
Anyone ever build a system to help staff to the detriment of users?
Linked open data: if you have a dspace instance, and have been issuing handles, and now you’re looking at it from LOD and want URIs, are handles usable for that? and if not - then what?
- build URIs that do content negotiation, with appropriate mime-type to the handle (302?)
- UCSD is using ARK (NOID redirector - this points to that)
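A sketch of the content negotiation idea: one URI per handle, with the `Accept` header deciding which representation the client is redirected to. The URL patterns and function names are hypothetical. The notes float a 302; this sketch uses 303 See Other, which is the redirect commonly recommended in linked data practice, but treat that choice as an assumption too.

```python
# Hypothetical URL patterns for each representation of a handle.
REPRESENTATIONS = {
    "text/html": "https://repository.example.org/handle/{handle}",
    "application/rdf+xml": "https://repository.example.org/rdf/{handle}.rdf",
    "text/turtle": "https://repository.example.org/rdf/{handle}.ttl",
}
DEFAULT_TYPE = "text/html"


def parse_accept(header):
    """Return acceptable media types, most preferred first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    ranked = sorted(prefs, key=lambda pref: -pref[1])  # stable for ties
    return [media for media, q in ranked if q > 0]


def negotiate(handle, accept_header):
    """Return (status, location) for a request to the handle's URI."""
    for media in parse_accept(accept_header):
        if media == "*/*":
            media = DEFAULT_TYPE
        if media in REPRESENTATIONS:
            return 303, REPRESENTATIONS[media].format(handle=handle)
    return 406, None  # nothing we can serve
```

A browser asking for `text/html` lands on the usual item page, while an RDF client asking for `text/turtle` gets sent to the data.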
Are libraries afraid of the demise of paper books?
- if librarians don’t know who does?
- Randy Oldham - looking at use of tech by millennials - has some good data - he’s at Guelph
Does anyone know a script/algorithm that takes the tape counter markers and adds timestamps when you digitize them?
Paul Joseph, Systems Librarian, University of British Columbia
Will Engle, Wiki Administrator, University of British Columbia
Julie Mitchell, Learning Commons Coordinator, University of British Columbia
Paul is a PPT geek
- file - save as - slideshow
Will may have to bolt for possible baby arrival
What is a wiki?
- created in 1995 in Oregon
- people who work together trust each other
- pushing stuff to the web is a wicked slow process
- used wiki instead of “quick web”
Overview & origins of UBC Wiki & CMS (wordpress)
- campus-wide wiki
- can embed wiki pages in a buncha different places (WebCT, wordpress)
- migrated Learning Commons from MovableType to Wordpress in 2007
- focused on teaching and learning needs
- blogs (any campus member can get one)
- sites (any campus member can get one)
- wikis (one wiki to rule them all)
- partnerships: CTLT + Public Affairs + IT
- migrated ubc library site from static html
- cms concerns: time, resources, workflow
- 3rd party solution sought - and found on campus!
System in action
- campus-wide authentication
- app-based authorization
- common look and feel (global templating)
- open source software
- committed team
- wiki pages contextualized through namespaces
- main: open to all
- course: should be invited to contribute
- documentation: should be invited to contribute
- sandbox: personal space
- some quasi-private that you need to be invited to edit
- content contribution from multiple authors
- interoperability between units
- break down the roles between writers and editors
- challenges aren’t technical, they are building partnerships
- accuracy of content: wiki watchlists
- build trust in colleagues
- branding: wiki embed
- branding to represent collaboration or debranding
- control & ownership: philosophical ownership through organization and namespace; watchlists
- shift away from silos
- collaboration reduces duplication, saves $
- control over destination of content
- TRUST YO COLLEAGUES
- philosophical shift
- build relationships
- emphasize reduction of workload and increased accuracy of content
- holistic systems thinking
Parsing value out of raw OCR
[screen full of code here kids... i ain’t typing it here]
[looks good though]
[talk to Peter Binkley at UAlberta]
Decision Support Tools
Peter Murray - Lyrasis
Self-guided Decision Tools for Open Source
- control vs responsibility
- question for parent IT org
- tech skills matrix
- software selection
- costs of open source software
Open Source Software Registry
- library-specific projects
- package is at the center
- relationships with others
- institutional users
Better use of Google Analytics for digital collections stats
Weiwei Shi - UAlberta
- can use google maps to track where searches are coming from
- use it to track downloads in IR
Digitizing current materials
Bohyun Kim - Florida Intl University
- Bohyun’s notes are here - http://www.bohyunkim.net/blog/archives/1569
Bess Sadler - Stanford University
- using ForensicsToolkit (FTK)
- designed to create PDFs that are submissible for court
- outputs superdense XML objects
- working on “reverse engineering” them into Fedora & Bagit objects
- includes html derivative
- metadata - need to expand MODS
- future work
- separate out FTK processing
- other languages (this is ruby)
- available on github under project hydra
tableausoftware.com - check it out