Open research data: fun, important, and in need of librarians

Heather Piwowar, DataONE Postdoc, UBC, NESCent, and Dryad

- Open research data - important for academic librarians, but for public librarians too

- Standing on the shoulders of giants. We build on the work of the people before us.

- research is available in a number of ways - papers, lab notes, etc - making shoulders broader and helping others more

- data often doesn’t make it into repositories for others to use

- in order to share, researchers must find, organize, document, de-identify, format, ask, submit

then, worry about mistakes, misinterpretation, etc - not motivating to share data

- is it important? more than ½ of papers contain errors, up to 10% of mistakes will change conclusions

- sharing done by request now. often don’t get it despite request - ⅓ don’t reply, ⅓ say will and don’t, ⅓ say no.

- why no? publish more, exclusive use, data confidentiality, control, cost of prep

- researchers want to know what their data is being used for.

data access is self-selecting. not efficient, not fair - held back from young and productive researchers

- slows down research, education

- research sharing needs help, librarians are good at helping, open data needs you!

- need librarians as data host. urls stop working. put in repository. lots of types of repositories - data, journal, institutional, disciplinary, etc. which is right? probably many of them. some are obvious - DNA goes in GenBank

- institutional repositories are important, but must commit to do it well. We know this stuff!

- we need to educate!

- educate repositories - doing it well is hard.

- educate researchers - researchers know data, we know long term preservation, let’s talk. non-threatening message from the library: we’re not trying to take over your research

- educate researchers about the benefits of data repositories. carrots? citations - data sharing can mean more citations. evidence starting to show datasets are reused - longtail to data

- educate researchers about best practices. Selection, format, location, metadata

- educate researchers about scientific contribution. Her study to track data use and impact - 10 x 100 x 1000 (10 repositories, 100 , 1000 datasets) study using GScholar and Web of Science to get idea of data use. prelim - lots of use, varies by repository. are the creators of data also reusers of data?

- educate researchers about getting recognition. put on CV, letters, biosketches

- tool emerging from a hackathon - Total Impact - Use open API to find reuse of artifacts. One stop shop to measure impact (PLoS, Mendeley, PubMed, etc). works for articles and datasets

- educate admin 

- of institutional benefits

- what others are doing - it’s not obvious, so someone needs to take the lead in talking about it

- educate the public  

- links for nonfiction with datasets? city data? data journalist

- librarians need to advocate!

- open data about use - repository stats, reference lists.

- policies - data availability, data citation

- more evidence

host, educate, advocate!!!


- CARL looking at doing this type of thing. There are silos. How do we break them down?

 - publish in open access journals, publish in areas where decision makers reading, high impact areas

- Format is a huge issue - there is a gap between opening the data and making it reusable. What can librarians do about it?

        - there is debate. do you start with standards or take the data and open it? She thinks make it open first, standards second. Many will work to get the data they need and deal with the bugs. Librarians are good at building things that scale - don’t overemphasize this - perfect shouldn’t get in the way of good.

- Open source community is good example for open data community.

        - reputation is a big player - researchers embarrassed by data (not clean enough)

Evergreen ILS undressed

Ben Hyman, Managing Director, BC Libraries Cooperative

Jed Moffitt, Director of Information Technology Services, King County Library System (KCLS)

Steven Chan, Sitka Project Team

James Fournie, Librarian/Developer, Sitka Project Team

Matt Carlson, ILS Administrator, King County Library System

Grace Dunbar, COO, Equinox


- shared consortial initiative (BCLibraries Cooperative). now 54 in over 100 communities in BC and Manitoba

- consortia can leverage metrics, save $.

- strategically develop infrastructure.

- knew as soon as started, there would be pieces to put on top

- selection process multifaceted.

- conference in 2013 will be in Vancouver


- community is a powerful thing

- we’re all trying to figure out our direction, our future, our profession. big question is who do you want to be with?

- we want to be in control of what we do, make changes, be part of the community


- taking advantage of the open source nature of the ILS

- centralized process. push out standard version of evergreen to all branches

- use feedback and can have quick turn around for changes. helps with buy in

- doing things with data from system. graphic that mapped usage of system. could pull data easily.

- be creative and see what we can do


- Evergreen reflects community of people using, great story behind it. Georgia needed something and group decided to go open source and make it work.

- over 1000 libraries using it

- vendor in partnership, on the bus with you, part of the community

- development really strong over the last year. working towards timed releases.  

- demo of some of the new things coming in 2.2 (see screenshots from pres: MARC stuff, Authority control sets)


- mobile opac for Evergreen with open web services API

- Evergreen comes with SlimPac but no account features (no holds, etc)

- started with wireframe version (see slide for graphic) - focus on control logic, interactivity, function, not display/aesthetics

- wireframe now in production use.

- used jQuery mobile to make look pretty, work with number of devices

- cover images expandable when clicked

- reduced function set of full opac

- data shim used to clean up code, fit with evergreen ajax api

- code is open source - issue tracker turned on, under MIT license

- what’s next? view added content, related subjects, authors, series, browse by call number, place copy-level holds, suspend holds by date, and too many more for my fingers to type

- see slides for tons of urls to resources
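The "data shim" idea mentioned above can be sketched roughly as follows. This is a hypothetical illustration in Python, not the Sitka team's code and not the Evergreen API: the positional payload layout, the field names in `RECORD_FIELDS`, and the `RecordSummary` type are all assumptions made up for the example. The point is only that a thin translation layer lets the mobile UI code work with one clean shape regardless of what the backend returns.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# ASSUMPTION: a made-up field order for a positional record payload.
# A real backend would define its own layout.
RECORD_FIELDS = ("title", "author", "call_number", "copies_available")


@dataclass(frozen=True)
class RecordSummary:
    """Clean shape that the (hypothetical) mobile UI code works with."""
    title: str
    author: str
    call_number: str
    copies_available: int


def shim_record(raw: list[Any]) -> RecordSummary:
    """Translate one raw positional payload into a RecordSummary."""
    if len(raw) != len(RECORD_FIELDS):
        raise ValueError(f"expected {len(RECORD_FIELDS)} fields, got {len(raw)}")
    named = dict(zip(RECORD_FIELDS, raw))
    return RecordSummary(
        title=(named["title"] or "").strip(),
        author=(named["author"] or "Unknown").strip(),
        call_number=(named["call_number"] or "").strip(),
        copies_available=int(named["copies_available"] or 0),
    )


def shim_results(raw_results: list[list[Any]]) -> list[RecordSummary]:
    """Translate a whole result list, one record at a time."""
    return [shim_record(raw) for raw in raw_results]
```

With something like this in place, display code never touches raw payloads, so a change in the backend response format only has to be handled in one function.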

All together now: creating software ecosystems from open, interoperable components

Bess Sadler, Software Engineer, Stanford University

Marty Tarle, VP Engineering, Bibliocommons

Marty Tarle

How do proprietary vendors fit into the open source community?

- typical library software ecosystem

- some open source, some proprietary - but it all has to work together

Perception of proprietary vendors

- closed and inflexible

- lack APIs, difficult to integrate with

- loooooong development cycles

Focus is often on the wrong things

- open sourcing

        - inefficient and costly without vendor buy-in

- standards support

        - standards are out of date and limited

- direct access to data

        - tremendous duplication of algorithms, infrastructure and operations

Focus should be on vendor cooperation

- interoperability is a two-way street

- vendors need to

        - proactively enable integrations

        - proactively integrate other solutions into theirs

Vendor development models

- agility is critical

- scrum and lean are now the norm

- long development cycles are unacceptable

Vendor delivery models

- SaaS - rapid deployment of new functionality

- cloud - rapid scaling of hardware

- industry trend is towards “continuous deployment”

Vendor culture

- openness

        - part of the company’s DNA

- integration

        - core organizational capability

- openness

        - proactive, continuous effort

What to ask your vendors

- pace of innovation

        - how many releases

        - release notes

        - development model

        - delivery model


        - public

        - scalable

        - flexible

- white label

- cloud/SOA

Adopting OS technology too soon

- impacts service delivery

- increases in-house IT costs

- increases operational costs organization-wide

Best time to adopt open source

- most projects are in mature/commodity categories

        - operating systems

        - browsers

        - databases

        - CMS

Best time to adopt proprietary

- beginning of the cycle

- commodity/mature products still charge high tech prices

- provide the best value earlier in the cycle

Open source as complementary and additional solutions

Bess Sadler

“Makers make what the market won’t or can’t produce.”

- that’s what we are

- responding to user needs trumps profitability

- “purpose-suited small tools” are important

Project Hydra -

- a community (8 institutions, 28 code contributors)

- goals

        - sustainability/supportability

        - innovation and rapid development

        - responsiveness to local user needs

        - adaptability to local idiosyncrasies

        - shared work and responsibility across institutions

- built with very little grant-funding

- let’s not build software in isolation of the user

- code is on github

Plugin architecture (purpose-suited small tools)

- clean separation between shared code and local modifications

- clean upgrade path (if you have local tests)

- plugin modules for new functionality

        - including competing models

- local innovation doesn’t have to wait on the larger community

        - allow it to thrive at local institutions
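A plugin architecture of the kind described above can be sketched in a few lines. This is a generic Python illustration under stated assumptions, not Hydra's actual design (the notes don't describe Hydra's implementation): the `register` decorator, `run_pipeline` function, and the two example plugins are hypothetical names invented here. The shared code owns only the registry and the pipeline; local modifications live entirely in plugins.

```python
from typing import Callable, Dict, List

# Shared code: the registry and pipeline are the only parts every site runs.
_PLUGINS: Dict[str, Callable[[dict], dict]] = {}


def register(name: str):
    """Decorator that adds a plugin function to the shared registry."""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        _PLUGINS[name] = func
        return func
    return decorator


def run_pipeline(record: dict, enabled: List[str]) -> dict:
    """Apply the enabled plugins in order; unknown names fail loudly."""
    for name in enabled:
        if name not in _PLUGINS:
            raise KeyError(f"no plugin registered as {name!r}")
        # Pass a copy so a plugin cannot mutate the caller's record.
        record = _PLUGINS[name](dict(record))
    return record


# Local modifications: each institution ships its own plugins (hypothetical).
@register("normalize_title")
def normalize_title(record: dict) -> dict:
    record["title"] = record.get("title", "").strip().title()
    return record


@register("local_branch_label")
def local_branch_label(record: dict) -> dict:
    record["branch"] = "Main"
    return record
```

Each site chooses which plugins to enable, e.g. `run_pipeline(record, ["normalize_title", "local_branch_label"])`, so upgrading the shared code does not overwrite local behaviour, and competing plugins can coexist under different names.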

Good development practice builds trust - ironically because i don’t have to trust you that much


- Producing Open Source Software - free download at

- Practices of an Agile Developer

No code without tests

- stop spending time on tracking bugs

- tests run automatically (trust, but verify)

- everyone knows when something changes

- everyone knows when something breaks        

- docs are consistent, public, and automatic

        - put in some initial investment to come up with a system to automagically create them

- (some) local code is public


        - don’t be afraid

- planned work is public

- collaborative storyboarding and design
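As a rough illustration of "no code without tests", here is a minimal sketch using Python's standard `unittest` module. The function `normalize_isbn` and its behaviour are invented for the example (it does not validate check digits) and come from neither talk; the idea is just that the test ships alongside the code and can be run automatically, so everyone finds out when something breaks.

```python
import unittest


def normalize_isbn(raw: str) -> str:
    """Strip hyphens and spaces from an ISBN-like string and upper-case it.

    Hypothetical example function; it checks length only, not check digits.
    """
    cleaned = raw.replace("-", "").replace(" ", "").upper()
    if len(cleaned) not in (10, 13):
        raise ValueError(f"unexpected ISBN length: {len(cleaned)}")
    return cleaned


class NormalizeIsbnTest(unittest.TestCase):
    def test_strips_hyphens_and_uppercases(self):
        self.assertEqual(normalize_isbn("0-596-00759-x"), "059600759X")

    def test_accepts_thirteen_digits(self):
        self.assertEqual(normalize_isbn("978-0-596-00759-7"), "9780596007597")

    def test_rejects_wrong_length(self):
        with self.assertRaises(ValueError):
            normalize_isbn("12345")


def run_tests() -> bool:
    """Run the suite programmatically, as a CI job or commit hook might."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeIsbnTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Hooking `run_tests()` (or simply `python -m unittest`) into whatever runs on every commit is what turns "trust, but verify" into something automatic.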

Make time for face-to-face meetings

Closing Keynote – Open Data Policy

Andrea Reimer, Councillor, City of Vancouver

- grew up dreaming of moving democracy forward

- people-powered democracy

        - all people are created equally

        - all people have the ability to reason

- the qualities of decisions made are only as good as the quality of the information given

- read: David Eaves (blogger) and Tim Wilson (GIS guru)

- what if you created a city that thinks like the web

Open3: involved, active, empowered

1. open and accessible data

- share the greatest amount of data possible while respecting privacy and security concerns

- no data map in the City though - no one knows where all the data lives

- built a framework so that folks could routinely release data without getting permissions


        - 5 iterations so far

- fear that if you release the data people will find mistakes - don’t you want to find these errors?

2. open standards

- adopt prevailing open standards for data, documents, maps, and other formats

- -

        - trash pickup sked

        - 20% of 311 calls were about trash

3. open source software

- when considering new applications, place open source software on equal footing with commercial systems during procurement cycles

- software support is key for the city so it plays a big role in procurement grading

        - but what’s easier? calling microsoft or checking a wiki?

- licensing is also an issue

- if you use Vancouver open data you must make the findings freely available (not req for all groups - ex. list of safe houses for crisis centres)

- tricky to pass this during a recession - if we can make money off this data, why aren’t we selling it?

- opening up the data provided leadership for others to do it too

Bonus points

- Google StreetView launched in Vancouver (instead of Toronto) because of the open data movement

- Pixar settled its first non-California office in Vancouver because they felt the open data initiative meant that the govt understood their needs

        - actions speak louder than words

- hosted largest creative economy conference, SigGraph

- the city won the most innovative biz award from BC Business Magazine in 2011

What’s missing?

- people to actually distribute and use the data

        - people need to build the apps

        - crowdsource info

        - help people use the data

- challenge isn’t building the apps, it’s getting people to use them

        - check out

What does it mean for Vancouver?

- citizens can be involved, active and empowered

- asking people to comment on the budget, but they don’t have any idea what the baseline is

        - now there’s an app to help people understand capital budgets

- meaningful feedback is based on robust information

- government can build things, but can’t force people to come to them

- citizens have their own policy ideas

        - what if you could just do it? if you never had to call city hall?

- sometimes it feels like we haven’t done a lot...

        - “we totally overestimate what we can get done in a year, but completely underestimate what we can get done in 5 years”

- be patiently relentless

- speak to elected officials and the public, not to each other

- check out


Smaller communities - any thoughts?

Smaller cities are totally kicking butt at this.

But everyone has the same problems with making this happen.

Open data hackathons - what about getting non-profit folks involved instead of just geeks?        

Read Open is Dead - this only speaks to geeks and policy wonks, let’s find new terms for this concept.

Libraries are the original open data.

We need to get someone who is a geek in non-profit.

Question about partisanship - how do you deal with this?

Not religious, but if she were, her religion would be the Age of Reason. Doesn’t dislike Republicans; dislikes people who undermine the Age of Reason (hullo Dubya).

We need to use this data to increase reason in society.

Frequently heard: is it really worth the money we are spending? (Between firefighters and libraries, most choose firefighters.) So how do you measure the value in doing this?

There are concrete examples (earned media from Google StreetView, SigGraph), the 311 trash calls.

~45,000 downloads of data sets so far.


Check out the history of Access