Day 1 Liveblog (Friday, Oct 19)
Bagel run on Sunday! Exact change only. Talk to Sarah Severson!
“We were Otaku before it was cool” (Aaron Straup Cope)
- Been away from Montreal for almost 8 years, but it is still “chez moi” (home)
- Photo of local graffiti from 2003; the site is now a condo
- No thesis statement...yet. Too soon for that. Making an argument and showing examples that SOMETHING IS HAPPENING, and that the future approaches quickly
- Otaku: Japanese term for a subculture devoted to low-brow, unimportant stuff in massive quantities. Repurposed here to mean any group of amateur collectors, “whether we want to be or not” (the Henry Ford Museum’s soap collection, for example): passionate, obsessive, more concerned with the accumulation of data than of objects. One of the keys to understanding the web; post-national; we are all curators
- “communities of authority”
- Galleries were introduced on Flickr in 2009 as a way to think about the site differently: it already had 5 billion photos, and Flickr was not curating them. There were two rules: no more than 18 photos, and none of them could be your own. People flexed their curatorial muscles (curating is now a dirty word, but it is still about the act of choosing)
- “the confidence of difficult”: Is a map useful or pretty? Art and craft and design. In the mid-90s, the debate around art and craft was vicious and brutal. It is still contentious, but back then we could feel the ground shifting beneath us, and no one knew how or why. The economic underpinnings have now collapsed. Design studios routinely get called out for making art that has no use, while craftspeople are making terrific money on the web (Etsy).
- Cartography is about time: why would you spend so much time making something that is not useful? Again, about economics. Our current confidence about judging the worth of maps has become confused.
- “robot eyes”: Where do robots fit into the cultural spectrum? Engineers are the new folk art
- The distinctions between museums and archives (and, by extension, libraries) are collapsing, if they ever existed. Are archives the basement of museums? I mean their roles and their relations to one another: they pass through each other simultaneously, like art and craft and design. Self-identification is challenging. Consider Rhizome’s Art Base (an online archive of digital art), which is always on. Is it an archive, an index or a showcase? As we continue to digitize all kinds of media, what does that mean?
- “random access memory”: Amazon bought Kiva Systems earlier this year (after buying Zappos) for its warehouse robots. Amazing pieces of technology, figuring out how to pack the most things into the smallest space and then get them out as fast as possible. Terrifying and amazing. They provide access to, and delivery of, all kinds of trivial things (cat litter); now we can try to make them work for important things
- “the ‘explore’ problem”: Want to put the Impressionist works away for 100 years; our unwillingness to deal with their inertia is crowding other works out of storage space. Why are we keeping it all? The mechanics of presenting and displaying a collection are really hard work. This was a problem at Flickr too (a million sunsets). If not laser-guided robots, then what?
- Smithsonian has recently launched a rebranding campaign. I believe in the Smithsonian, its mandate and that it is a professional organization. But there are problems with the rebranding. Meme culture makes it easy to parody: the hip! the cool! the mommy bloggers! -- when we could actually be talking about the collection. All we need to say is “We have spaceships!” or “All UR dead bird ARE belong to us SRSLY” (and in the image...is it an archive? a library? everything about this photo is pure gold)
- By preserving these things, we have an open narrative space to return to them
- THE ART IS A LIE (the number of photos uploaded to Flickr each day, depicted in an installation): photographs as a sea of images in overwhelming piles
- New gov.uk website launched recently: TRUST. USERS. DELIVERY. There is no design (no final design), only reckoning. We are afforded the luxury of taking care of these materials. We have studied and trained and we are trusted and responsible.
- Trying to make Cooper-Hewitt native to the web through first-class objects: reimagining and rethinking the entire collection as URLs, and adding a “random” button scoped to objects with photos (see the Pinterest Random Button board). We are all burdened with stories of hockey-stick growth. Sometimes things take time. Patience and confidence in what you are doing are as important as a splashy debut. NO DESIGN, JUST RECKONING
- SelfAwarROOMBA Twitter feed -- Look at it now!
- Fan fiction: another kind of random access for collections (Tintin books, for example)
- “history has always been lossy”
- “debate is history”: Every event is an opportunity for apocalypse. The goal of the museum should be to get people to see the things, because the digital proxies are not the same. If we are not going to share this stuff, why would we keep it? But the proxies have affordances that the delicate and precious things do not have. They are how we rotate our collections (or a way to plan it), a way for people to see them from a distance. We need to pledge to ensure that the resources are present and accounted for on the network.
- Jilly Ballistic piece: error dialog boxes on ads; larger questions of what it means to be more inclusive. The unit of measure for whether something is important is no longer inclusion. Starting to provide an avenue for a zone of safekeeping: thinking about preservation well in advance.
- Parallel Flickr is a side project: Creating a backup of the Flickr photos as a living, breathing thing or “shadow copy” on the internet. What if Flickr went dark?
- Potential solution using Flickr Commons: start there, track one or two degrees out, and archive those people’s photographs to capture the fuzzy edges around the idea of the Commons. It is a bit creepy to back up people’s photos without asking them, but what if the L of C did this? It has to be a trusted institution, preserving within a trust/permissions model.
- Ultimately think about how we share what we do with people. Back to the robots: making things possible. And, by extension, letting people work with our stuff, make fictions with them and then saying that we will keep them safe without canonizing them
- @thisisaaronland (exploding unicorn picture)
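Starting from the Flickr Commons and tracking “one or two degrees out” is, mechanically, a depth-limited breadth-first search. The sketch below assumes the contact graph has already been fetched into a plain dictionary. The account names and the `contacts` mapping are hypothetical, and no Flickr API calls are shown. The function only decides whose photos would fall inside the fuzzy edge; it does not archive anything.

```python
from collections import deque


def accounts_within_degrees(seeds, contacts, max_degree=2):
    """Breadth-first search outward from the seed accounts.

    contacts maps an account to the accounts it links to (a hypothetical,
    pre-fetched graph). Returns {account: degree}, where seeds are degree 0
    and nothing beyond max_degree hops is included.
    """
    degree = {seed: 0 for seed in seeds}
    queue = deque(degree)  # iterating the dict gives de-duplicated seeds
    while queue:
        account = queue.popleft()
        if degree[account] == max_degree:
            continue  # the fuzzy edge: record this account, but don't expand it
        for neighbour in contacts.get(account, ()):
            if neighbour not in degree:
                degree[neighbour] = degree[account] + 1
                queue.append(neighbour)
    return degree


# Made-up example graph: two Commons accounts and their contacts.
contacts = {
    "commons_a": ["alice", "bob"],
    "commons_b": ["bob"],
    "alice": ["carol"],
    "carol": ["dave"],
}
print(accounts_within_degrees(["commons_a", "commons_b"], contacts))
# {'commons_a': 0, 'commons_b': 0, 'alice': 1, 'bob': 1, 'carol': 2}
```

Raising `max_degree` widens the edge quickly. On a real social graph, each extra hop can multiply the number of accounts (and photos) to back up.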
Questions for Aaron:
What is the design museum trying to accomplish and enable by putting the collection online?
- Exploration, short-term. Want people to link to those objects. It is our mandate to be that source. Great time to think about what it means to be collecting design as it becomes more and more ephemeral. War on terror as an example of service design. We should have good records as a cultural heritage institution.
Is it time for Wikipedia to stop deleting pages if the cost of storing pages is not an issue? Should they stop editing?
- Wikipedia does not have a responsibility to be all things to all people. Must be clear about what they are. Spinny Bar Society’s page keeps getting deleted. Taggapedia (good for disambiguation). Yes. I want everyone else to start putting things on the web. Machine Tag Project via Flickr as an example.
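Machine tags on Flickr add structure to ordinary tags using the form namespace:predicate=value (e.g. geo:lat=45.5017). Below is a small parser sketch. The character rules in the regex are an assumption and should be treated as an approximation, not Flickr's exact specification. The helper names are hypothetical.

```python
import re

# Assumed character rules: namespace and predicate start with a letter and use
# only ASCII letters, digits and underscores; the value is everything after "=".
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_machine_tag(tag):
    """Split 'namespace:predicate=value' into a tuple; None for a plain tag."""
    match = MACHINE_TAG.match(tag.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    return namespace, predicate, value.strip('"')  # drop optional quotes


def machine_tags_by_namespace(tags):
    """Group the machine tags in a list by namespace, skipping plain tags."""
    grouped = {}
    for tag in tags:
        parsed = parse_machine_tag(tag)
        if parsed:
            namespace, predicate, value = parsed
            grouped.setdefault(namespace, {})[predicate] = value
    return grouped


tags = ["sunset", "geo:lat=45.5017", "geo:lon=-73.5673", 'dc:title="Mont Royal"']
print(machine_tags_by_namespace(tags))
# {'geo': {'lat': '45.5017', 'lon': '-73.5673'}, 'dc': {'title': 'Mont Royal'}}
```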
Social Feed Manager (Daniel Chudnov)
- Collecting data from social feeds (Twitter, in particular)
- How do researchers study social media?
- By hand. Feeds in Google Reader, assign students collection tasks in Excel and then discipline-specific databases...with whatever help that they can get. Lots of work.
- Options for data collection? Licensing e-resources: there are Twitter-licensed data providers (DataSift, Gnip, Topsy). What has been learned: friendly, not cheap, more than what we need, expensive, and you still need tools to collect, process, etc.
- Twitter API: user timeline (up to 3,200 tweets, plus new ones); public streams (follow, track, geo; up to 3,000/minute); search
- Will L of C still archive all Twitter feeds? They do have a historic archive, but they still do not have the resources to make it available.
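Because the user timeline has a 3,200-tweet ceiling, a collector pages backwards from the newest tweet until the API stops returning older ones. The sketch below shows that loop. It assumes max_id-style paging and pages of up to 200 tweets. `fetch_page` and `fake_fetch` are hypothetical stand-ins so the logic can run offline; they are not Twitter client calls.

```python
USER_TIMELINE_CAP = 3200  # limit on the user timeline noted in the talk
PAGE_SIZE = 200           # assumption: maximum tweets returned per request


def collect_timeline(fetch_page, cap=USER_TIMELINE_CAP, page_size=PAGE_SIZE):
    """Page backwards through a user timeline, newest tweet first.

    fetch_page(max_id, count) is a hypothetical stand-in for the API call:
    it returns up to `count` tweets (dicts with an integer "id"), newest
    first, all with id <= max_id (or the newest tweets if max_id is None).
    """
    collected = []
    max_id = None
    while len(collected) < cap:
        page = fetch_page(max_id, min(page_size, cap - len(collected)))
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Ask for tweets strictly older than the oldest one seen so far.
        max_id = page[-1]["id"] - 1
    return collected


def fake_fetch(max_id, count, newest=5000):
    """Simulated timeline with ids 1..newest, for trying the pager offline."""
    top = newest if max_id is None else min(max_id, newest)
    return [{"id": i} for i in range(top, max(top - count, 0), -1)]


tweets = collect_timeline(fake_fetch)
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])  # 3200 5000 1801
```

Anything older than the cap is simply out of reach this way. That is one reason researchers turn to the licensed data providers listed above.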
Discovering the hard to find: new media in traditional journals (Heather Cunningham)
- Discovery: much of what we do as reference librarians
- Medical research is heavily dependent on journals, and it is not as easy to find/connect to journals
- Multimedia content is especially difficult, which is counter-intuitive in our high-tech culture
- Studied A/V in Scopus, PubMed (search Case Reports for images)
- Native interfaces versus those of the aggregators
Locked in the cloud: What lies beyond the peak of inflated expectations? (John Durno & Corey Davis)
- Evaluating cloud-hosted library platforms
- Not going to talk about: application, platform, infrastructure
- Will discuss: vendor lock-in, why we should care and what we should do about it
- Gartner Group’s annual report on the hype cycle (hence “peak of inflated expectations”)
- To answer the question: Disillusionment
- Cloud computing can be the demise of some advances in interoperability, encourage a return to a vertically integrated model, can lock us in in ways we are not actually comfortable with
- What systems? Vendor-managed, multi-tenant, SaaS, subscription model
- Broad strokes in the automation marketplace
- Combined print and electronic workflows, collaborative metadata, technology managed by the vendor (as opposed to the library), provide a “one stop” solution (ILS never really succeeded)
- Information lock-in: Choices in the future depend on your choices today
- Different kinds of lock-in that can occur over two periods: data, software, API, institutional inertia/bias
- 1998: Google happened (electronic texts came earlier), and systems did not keep up with these changes
- We cannot lock into single large systems from single large vendors because we lose flexibility
- Resistance is not futile: Manage technology (which is a core part of our business), consider alternative architectures (must consider swapping out multiple systems simultaneously, which is a concern)
- Standards versus lock-in
- Thinking about migrating data between systems well in advance of when you do is good practice (and would not want to repopulate using MARC records)
- Have an exit strategy with any system before you purchase it
- APIs, Lock-in and Lock-out: limited functionality, limited access to data, can be changed or deprecated
- Once your vendor has you locked in, what incentive do they have to make improvements?
- Caveat emptor: High switching costs, escalating subscription costs, interoperability issues, dwindling innovation, limited choice
Questions (and comments) for John and Corey:
Are you talking about archiving the function of the system or the data?
- Not just data. Strong library standards, like MARC, around data interchange between systems, and we know what they need to do. Several competing systems can be used if ingested with different standards into the discovery system.
Productivity is an important consideration in terms of staff time, but it must be calculated.
- Important to consider in relation to flexibility with systems
Did UVic talk about the systems costs?
- Yes. Managing technology is well handled (good data centre), and there was not much cost or savings around the hardware piece, so it was more about the staff workflow, which was great. In retrospect, though, it is easy to move towards a software system as a solution, but harder to get out of it.
Adventures in Linked Data: Building a Connected Research Environment
Lisa Goddard, Memorial University
CWRC Canadian Writing Research Collaboratory
Funded by the Canadian Government
Like most digital humanities projects, CWRC is very collaborative
“We need to stop talking about the issue of single-author monograph as the benchmark for excellence.” (DH Manifesto 2004)
- Why Linked Data for Humanities
- We want to ask questions we can’t ask a database
- How can we come up with new ways to browse data?
- Hope that it will allow data from the sciences and government to come together.
- The linked data vision assumes that you want data from other projects and that you want to share your data with others
- This model enforces collaboration
- Big Data: no longer something only the sciences have to deal with. Text stores in the digital humanities are huge.
- Text data is MESSY. It doesn’t fit into databases and spreadsheets. And in most cases humanities scholars want or need text data
- The container for text data must be flexible – to layer interpretations over work
- How does this happen:
- Needed seed data: used data from the Orlando Project (an XML data set of biographies of British women writers)
- The Orlando textbase was built around document segments
- Free the entities from their documents
- ID the main classes of entities
- Finally, agree on creating representations of events, places, people, organizations, works (books, poems, docs, etc.) and annotations
- Mint URIs for Entities -- each entity must be individually addressable through their own URI
- Used to make direct connections and visualize the indirect connections
- New ways of browsing through this model
- Due to the complexity and scale, must agree on a template to mint URIs
- VIP: they are SUPPOSED to last forever. If you break them, you ruin it for everyone. Think archival, not short-term project
- Server name in URI = UNCOOL
- PHP in URI = this ain’t forever, so it’s also UNCOOL
- Canonical URIs: used by humans and machines
- You’re minting at least 3 URIs per object
- URI Patterns: some are awesome and human readable, but numbers are easier to deal with
- The relationships between entities:
- RDF statements
- subject --> predicate --> object
- Machine readable definitions
- if this is a book: it had an author
- if this film is based on a book: the book was published before the film released
- Model relationships in a way so the machine can reason in the same way the human brain does (lofty goal, but ontologies help)
- Ontologies: model entities in a way that machines can better understand them
- Provide definitions
- impose rules for machine reasoning
- machine readable
- An ontology is for life (ideally....).
- once it’s published, it needs to be available long after your project is over
- short term project cycles are problematic for this best practice
- Ontology Dowsing is an example of a search engine for ontology vocabularies
- the rules of ontologies are what are most interesting. Especially the disjoint (a person is not a project)
- FRBR: Work cannot be an expression or manifestation. Work is also an endeavor.
- Assigning Entities to Classes: [visual]
- Classes and predicates work together to support reasoning: Woolf -> creator -> Mrs. Dalloway
- If you give it the right information it can infer new knowledge
- Linked data is using the controlled vocabularies librarians have been developing over time
- The data model is integrated into CWRC through CWRC-Writer, an open source editor for online digital scholarship
- Looks like an online text editor
- Easy to add links (i.e. results from authority file to choose from OR you can add your own)
- Over time, will need to focus on faster ways to search the options, to reduce time
- Behind the scenes: web stores for data queries
- DBpedia RESTful API (if you know how to write SPARQL)
- You can also make your own authority file to make it available in the first round
- Once you choose, XML tags are created to link the name to the chosen authority file
- The add relationship button will allow you to connect the subject to an object with a predicate (e.g. Bertrand Russell is the parent of John)
- There needs to be a way to link annotations (scholars love their documents)
- this helps reduce the overlap in the XML doc
- CWRC Entity Management system: everything is stored in Fedora (entity storage) [link to presentation]
- Q:“Entities are for life, but how do you manage versions?”
- A: If you split in 2, you keep old and new terms. Lots of stability issues that have not been figured out yet.
- Comment: “Be consistent in your predicates and URIs. URIs cannot be broken, but they can be renamed”
- Comment: “URIs are not unique, just big strings. [...]
- Best Practice: URI should resolve.
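The URI and ontology advice above can be tried out without a triple store. The sketch below uses plain Python (no RDF library), a hypothetical base URI and a made-up two-predicate ontology. It does four things:

- mints opaque numbered URIs;
- stores subject/predicate/object triples;
- infers types from each predicate's domain and range, in the spirit of rdfs:domain and rdfs:range;
- flags entities that land in disjoint classes, as owl:disjointWith would.

The predicate name creatorOf and the class names are illustrative, not CWRC's vocabulary.

```python
import itertools

# Hypothetical base URI: one stable domain, no machine-specific server name,
# and no ".php" or other implementation detail that will not last.
BASE = "http://example.org/cwrc/"
RDF_TYPE = "rdf:type"

# Hypothetical mini-ontology: predicate -> (domain class, range class).
SIGNATURES = {"creatorOf": ("Person", "Work"), "parentOf": ("Person", "Person")}
# Pairs of classes no entity may belong to at once (cf. owl:disjointWith).
DISJOINT = [("Person", "Project"), ("Person", "Work")]


class Minter:
    """Mint opaque, numbered URIs such as http://example.org/cwrc/person/1."""

    def __init__(self, base=BASE):
        self.base = base
        self._next = itertools.count(1)

    def mint(self, kind):
        return f"{self.base}{kind}/{next(self._next)}"


def infer_types(triples):
    """Add rdf:type triples implied by each predicate's domain and range."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in SIGNATURES:
            domain, range_ = SIGNATURES[predicate]
            inferred.add((subject, RDF_TYPE, domain))
            inferred.add((obj, RDF_TYPE, range_))
    return inferred


def disjointness_violations(triples):
    """List (entity, class_a, class_b) wherever an entity has two disjoint types."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types.setdefault(subject, set()).add(obj)
    return sorted(
        (entity, a, b)
        for entity, classes in types.items()
        for a, b in DISJOINT
        if a in classes and b in classes
    )


minter = Minter()
woolf = minter.mint("person")   # http://example.org/cwrc/person/1
dalloway = minter.mint("work")  # http://example.org/cwrc/work/2
knowledge = infer_types({(woolf, "creatorOf", dalloway)})
print((woolf, RDF_TYPE, "Person") in knowledge)  # True: inferred, never stated
print(disjointness_violations(knowledge | {(woolf, RDF_TYPE, "Project")}))
# [('http://example.org/cwrc/person/1', 'Person', 'Project')]
```

The numeric URIs follow the "numbers are easier to deal with" point from the talk. A production system would also need those URIs to resolve, which this sketch does not attempt.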
Big Data, Answers, and Civil Rights
Alistair Croll, Lean Analytics
- “I am in a room of semantic idealists”
- Croll works on big data with people who prefer algorithms to causality
- Three presentation themes: Big Data; Problems with Answers; and the associated Civil Rights issues
- Big Data: nothin’ new (the duck definition)
- Large amts of info; public and private; easily linked and collected; stored just because we can (we’re keeping it in hopes that the future machines can analyze it)
- Big data is more and more applied to business; usable by everyone; fed back into the system
- what you found useful changes the way it’s filtered for future use
- Universal truth about data storage: pick any two: 1)Volume 2)Variety 3)Velocity [the hallmarks of state storage and retrieval]
- this is NOT cheap --too expensive to get all 3
- result: IAs prioritize which 2 they need to get optimal performance
- This has been the constraint on IA work for the past 2 decades
- What’s new is that constant dropped to nearly zero
- Moore’s law
- 20 sq KM data warehouse in China
- big, fast, and varied got CHEAP (every time we use Google, we use big data)
- Why invent when you can perfect
- Shouldn’t more efficiency mean less consumption??
- efficiency means lower costs, which means new uses, more demand and more consumption
NUMBER ONE POINT: Big data is about abundance
- The number of ways we can use information in large quantities is growing
- we didn’t evolve to find truth, we evolved to seek peer approval
- Dunbar Limit: you can’t keep track of more than 150 people
- We all look for confirmation rather than understanding
- WEIRD [Western, educated, industrialized, rich, democratic] is a legit psychological term
- Google shows us that we are looking for confirmation (e.g. global warming; global warming proof; global warming hoax)
- a problem with the big-data web, as opposed to a semantic, curated web
- M and W agree on more issues than Republican and Democrat (incredible polarization b/w the two political parties)
- groups see what they want to see (e.g. tag cloud related to open gov’t initiative)
- Google fusion tables -- graph edits
- Reddit: comments are hilarious at times and also flatten the web (e.g. U.S. rower’s bronze medal)
- What if you could install a “controversial topic” plug in as a way to
- Adopt a Po mindset; we have to change how we govern society in a data-driven world
- Prediction is close to prejudice
- Big data is both good and bad
- e.g. OkCupid: people use certain words to describe themselves, and you can filter on them (e.g. baseball, Van Halen and The Big Lebowski are the terms used most often by self-identified white males)
- Redlining was a practice banks used to deny loans in certain areas (outlawed by the US federal government in the 1960s)
- Personalization looks a lot like prejudice --can look like civil rights issues if we aren't careful
- Po = possibility. No is science; Yes is faith. No is skepticism (show me the data, prove it’s true); Yes is a leap of faith (prove I’m wrong). No -- Po -- Yes
- Intellect is brainy [a really great diagram is up and it cannot be duplicated in words]
- Creativity is zany
4 big concerns of big data:
- we give up our privacy for free programs
- the vanishing cost of finding things out
- puts our digital crumbs back into a loaf
- correlation is not causality
- machines are not predictable
- they are currently mistaking predictions for fact
- assembling our crumbs
- lack of ambiguity that digital tools give us
- laws and moral codes created to live amongst each other. We need to make new rules
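Croll’s OkCupid example (“prediction is close to prejudice”) can be made concrete by scoring each self-description word by how over-represented it is in one group’s profiles. This sketch uses made-up toy data and an arbitrary threshold. It is not OkCupid’s method. It only illustrates how innocuous words become proxies for group membership, so that filtering on them filters people.

```python
from collections import Counter


def overrepresented_terms(profiles, group, min_ratio=2.0):
    """Return {word: ratio} for words far more common in `group`'s profiles.

    profiles is a list of (group_label, words) pairs. The ratio compares the
    share of in-group profiles using a word with the share of other profiles
    using it (add-one smoothed so unseen words do not divide by zero).
    """
    in_counts, out_counts = Counter(), Counter()
    n_in = n_out = 0
    for label, words in profiles:
        if label == group:
            in_counts.update(set(words))
            n_in += 1
        else:
            out_counts.update(set(words))
            n_out += 1
    flagged = {}
    for word, count in in_counts.items():
        ratio = (count / n_in) / ((out_counts[word] + 1) / (n_out + 1))
        if ratio >= min_ratio:
            flagged[word] = round(ratio, 2)
    return flagged


# Toy, made-up profiles: nothing here is real OkCupid data.
profiles = [
    ("A", {"hiking", "jazz"}),
    ("A", {"hiking", "chess"}),
    ("B", {"jazz"}),
    ("B", {"chess", "jazz"}),
]
print(overrepresented_terms(profiles, "A"))  # {'hiking': 3.0}
```

Nothing in the code mentions group A directly once the flagged words are known. A filter on "hiking" would still sort people by group, which is the personalization-as-prejudice worry.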
Questions: have you thought about putting programmers with bookworms? (referring to chart about creative vs. brainy)
How do you keep yourself open to seeing this?
Q: [missed it] How did we not see big stupid coming our way?
Q: Civil Rights and Big Data: Bitch Magazine published a column by someone in a women’s studies department at an urban university who asks students to search for themselves on Google. Students are shocked. The professor says Google needs to fix this; Google says it provides the links people search for. What are your thoughts?
A: Debate between curator and the data carrier (e.g. AT&T and phone system). Google does provide what is searched for and offers safe search, etc.
Jonathan Haidt: humans need to learn the difference b/w “must” and “can”
- We change the bias of info based on our preconceived notions
Bibliobox: A Library in a Box
David Fiander, Western University
- The project started with PirateBox, created by David Darts at NYU: a stand-alone device, made of a wireless router and a hard drive, that creates an access point in the classroom for downloading content. It provided a way for people to share files anonymously in a shared space.
- community develops around this media sharing
- users can upload and download files; chat function; router does not log information; no trace of people
- not connected to internet.
- Bibliobox strips out the chat and upload functions
- no pirate function
- provides content for users (e.g. English texts available to kids in a park via their cellphones, without government snooping)
- Goal: Bibliobox to be an Open Publication Distribution System (OPDS) compatible ebook catalogue interface.
- cataloguing API that provides direct integration into reading devices
- easy for the operator to add books to the collection
- Unix-based file server (about a 2 sq inch grey box; 32MB RAM + other numbers. Basic fact: it’s very small and very impressive)
- provides wireless thru cell phone
- USB drive; when you’re connected to the box, you’re not connected to the internet
- standard format (fancy RSS feed)
- Navigation stream and acquisition stream (contains entries that are actual books)
- [note: the speaker is explaining navigation feeds, but this live blogger is a bit confused. It looks like XML script]
- SQLite (open source, run SQL queries to retrieve configuration data)
- SQL APIs
- Bottle: a framework for developing web-based applications in Python
- build the application in pieces and have handlers with different URLs (noted by the “@route” functions in the script)
- @ signs are function “decorators”
- query the database to get an author list; passed on to the catrender
- Note: there is a great YouTube video that makes a compelling case for Bottle. link to come.
- Mako: used to make templates in Python
- takes advantage of Python’s dynamic code generation for content
- % is Mako’s way of introducing Python code
- Adding books is much more complicated than serving the books
- An EPUB file is a collection of metadata files, CSS, HTML, etc., all bundled together and zipped
- ideally: feed the Epub file to the program to read the metadata
- will have to write his own code because what exists is not good enough
- [showing screenshots of the navigation feed]
- Why do this?
- The idea of having a local, non-networked, non-traceable device, that is stand alone
- Fits in a box, runs for 15 hours, not connected to infrastructure
- Provides a way to share ebooks without sharing entire libraries
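The “adding books” step feeds the EPUB file to the program so it can read the metadata. An EPUB is a zip archive whose META-INF/container.xml points to a package (.opf) file carrying Dublin Core metadata. Below is a minimal standard-library sketch; it is not the speaker’s code. It assumes a single rootfile and reads only title, creators and language. `make_sample_epub` is a hypothetical helper that builds a metadata-only test file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf",
          "dc": "http://purl.org/dc/elements/1.1/"}


def read_epub_metadata(epub_file):
    """Return the title, creators and language of an EPUB (path or file object)."""
    with zipfile.ZipFile(epub_file) as epub:
        # META-INF/container.xml names the package (.opf) file holding the metadata.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        package = ET.fromstring(epub.read(rootfile.get("full-path")))
    metadata = package.find("opf:metadata", OPF_NS)

    def values(tag):
        return [el.text.strip() for el in metadata.findall(tag, OPF_NS) if el.text]

    titles, languages = values("dc:title"), values("dc:language")
    return {
        "title": titles[0] if titles else None,
        "creators": values("dc:creator"),
        "language": languages[0] if languages else None,
    }


def make_sample_epub(title, creator, language="en"):
    """Build a minimal, metadata-only EPUB in memory for trying the reader."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        epub.writestr("mimetype", "application/epub+zip")
        epub.writestr(
            "META-INF/container.xml",
            '<container version="1.0" '
            'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
            '<rootfiles><rootfile full-path="OEBPS/content.opf" '
            'media-type="application/oebps-package+xml"/></rootfiles></container>')
        epub.writestr(
            "OEBPS/content.opf",
            '<package version="2.0" xmlns="http://www.idpf.org/2007/opf">'
            '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
            f"<dc:title>{escape(title)}</dc:title>"
            f"<dc:creator>{escape(creator)}</dc:creator>"
            f"<dc:language>{escape(language)}</dc:language>"
            "</metadata></package>")
    buffer.seek(0)
    return buffer


print(read_epub_metadata(make_sample_epub("Mrs Dalloway", "Virginia Woolf")))
# {'title': 'Mrs Dalloway', 'creators': ['Virginia Woolf'], 'language': 'en'}
```

Real-world files are messier (several creators with roles, EPUB 3 refinements, missing fields). That may be part of why existing code was “not good enough”.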
Q: Do you have problems optimizing memory? I had crashes over and over when doing this
A: Everything was small. So I used a small web server, python, SQLite. It’s about core technologies
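The serving side uses a small web server, Python and SQLite, with handlers registered by @route decorators. It can be sketched with the standard library alone, under these assumptions:

- `route` here is a toy decorator that only records handlers in a dict; it is not Bottle.
- The SQLite schema and book list are hypothetical.
- The feed is trimmed: a conforming Atom/OPDS feed also needs elements such as `updated`.

OPDS catalogues are Atom feeds (the “fancy RSS feed” above). The link relation http://opds-spec.org/acquisition is the one OPDS uses for downloadable entries.

```python
import sqlite3
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ACQUISITION = "application/atom+xml;profile=opds-catalog;kind=acquisition"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace

ROUTES = {}


def route(path):
    """Toy @route decorator: record the handler as the function for `path`.
    Bottle's real @route does much more; this only shows the decorator idea."""
    def register(handler):
        ROUTES[path] = handler
        return handler
    return register


def make_catalogue(books):
    """In-memory SQLite catalogue; the schema is a hypothetical example."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT, file TEXT)")
    db.executemany("INSERT INTO books (title, author, file) VALUES (?, ?, ?)", books)
    return db


@route("/authors")
def author_list(db):
    rows = db.execute("SELECT DISTINCT author FROM books ORDER BY author")
    return [author for (author,) in rows]


@route("/books")
def acquisition_feed(db):
    """Render every book as an entry in an OPDS acquisition feed (Atom XML)."""
    def atom(tag):
        return f"{{{ATOM}}}{tag}"

    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("title")).text = "Bibliobox catalogue"
    ET.SubElement(feed, atom("link"), {"rel": "self", "href": "/books", "type": ACQUISITION})
    for book_id, title, author, filename in db.execute(
            "SELECT id, title, author, file FROM books ORDER BY title"):
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = f"urn:bibliobox:book:{book_id}"
        ET.SubElement(entry, atom("title")).text = title
        ET.SubElement(ET.SubElement(entry, atom("author")), atom("name")).text = author
        # The acquisition link is what lets a reading app download the book.
        ET.SubElement(entry, atom("link"), {
            "rel": "http://opds-spec.org/acquisition",
            "href": f"/download/{filename}",
            "type": "application/epub+zip",
        })
    return ET.tostring(feed, encoding="unicode")


db = make_catalogue([
    ("Mrs Dalloway", "Virginia Woolf", "dalloway.epub"),
    ("Emma", "Jane Austen", "emma.epub"),
    ("Persuasion", "Jane Austen", "persuasion.epub"),
])
print(ROUTES["/authors"](db))  # ['Jane Austen', 'Virginia Woolf']
print(ROUTES["/books"](db))
```

Keeping everything in one SQLite file and rendering feeds on request fits the “everything was small” answer. There is no separate database server to fit into 32MB of RAM.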
Question Answering, Serendipity, and the Research Process of Scholars in the Humanities
Kim Martin, Victoria L. Rubin and Anabel Quan-Haase, Faculty of Information and Media Studies, University of Western Ontario
- Research on Humanists and their information seeking patterns
- Historical research process, stages 1-5: preliminary idea; literature search & hypothesis; data collection; analysis and interpretation of data; presentation of findings
- Serendipity: the faculty of making happy and unexpected discoveries by accident. Also, the fact or an instance of such a discovery (OED)
- Investigation of information encountering in the controlled research environment (Erdelez, 2004)
- study was in a controlled environment (downside)
- Facets of Serendipity in Everyday Chance Encounters (Rubin, Burkell & Quan-Haase, 2011)
- facet a: prepared mind (prior concern + previous experience)
- act of noticing (observations; attention)
- chance (accidental nature; perceived lack of control)
- Coming Across Information Serendipitously: Part 1 - A Process Model (Makri & Blandford, 2012)
- libraries are where these discoveries happen
- Why does Serendipity matter?
- can lead to discovery
- thinking outside the box
- trait of creative search
- original thinking
- link to distractions
- Do some tools encourage Serendipity?
- “Planned Chaos” (Hoeflich, 2007)
- Serendipity and the (Digital?) Library
- Digital Public Library of America
- Evergreen (Leddy Librarian)
- physical library does not need to be mirrored, or even represented, online
- produces a unique, query-based representation of full-text digital library collections
- provide a capability to automatically extract answers relevant to users’ specific in-depth interests
- allow users to personalize system
- encourage browsing
- natural language processing and question answering
- A QA system has to determine two things: what type of information it is looking for, and where to look for the answers
- there are 3 main modules of QA
- question processing
- document processing
- answer extraction and formulation
- input is parsed
- e.g. When was the Magna Carta signed
- Question semantic form: When (EA type) + signed + Magna Carta
- add-on for virtual library catalogs that integrates a users find
Bohemian Bookshelf: doesn’t work with text
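The question-processing step in the Magna Carta example picks out the expected answer type and keeps the content words. It can be approximated with a few rules. This is a toy sketch that assumes a hypothetical cue-word table and stopword list. Real QA systems use parsers and trained classifiers, and the document-processing and answer-extraction modules are not shown.

```python
import re

# Hypothetical cue-word table: question opening -> expected answer type (EAT).
ANSWER_TYPES = {"when": "DATE", "where": "LOCATION", "who": "PERSON",
                "how many": "NUMBER", "what": "THING"}
STOPWORDS = {"was", "is", "the", "a", "an", "of", "did", "does", "were", "are"}


def analyse_question(question):
    """Return (expected answer type, content keywords) for a simple question."""
    text = question.lower().strip().rstrip("?")
    answer_type = "UNKNOWN"
    # Try longer cues first so "how many" wins over any shorter match.
    for cue in sorted(ANSWER_TYPES, key=len, reverse=True):
        if text.startswith(cue + " "):
            answer_type = ANSWER_TYPES[cue]
            text = text[len(cue):]
            break
    keywords = [w for w in re.findall(r"[a-z0-9']+", text) if w not in STOPWORDS]
    return answer_type, keywords


print(analyse_question("When was the Magna Carta signed?"))
# ('DATE', ['magna', 'carta', 'signed'])
```

The output mirrors the semantic form in the notes: When (expected answer type) + signed + Magna Carta. A retrieval step would then look for passages mentioning those keywords that contain something date-like.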
Q: Regarding serendipity, there are 2 different avenues for that; which is more fruitful: randomness or relationships?
randomness is not connected beforehand. Usually there is a relationship first. Go in search of something and then something
Q: Degrees of separation? What if we could bring things that are 2 or more degrees of separation?
A: are they related or random? hovers somewhere between?
Q: How do you prevent serendipity from becoming Clippy (the MS paperclip...)
A: [live blogger lost internet connection and unfortunately forgot the answer]
Q: Serendipity and Openness?
A: The way we search now illustrates the problems of serendipity and tracking. Searching the web, via Google, often leads us to these serendipitous moments of information retrieval