On adding visual resources to BHL

Doc link: http://bit.do/BHLvisualresources 

BHL is considering adding scientifically relevant visual resources to its collection. The addition of visual resources broadens the understanding of the concept of literature in the BHL’s mission... 

“improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.”

...to a broader interpretation. Literature is generally defined as written works.[1] On the other hand, it is often said that a picture is worth a thousand words.[2]

A number of questions have been raised by the Collections Committee and Technical Team. Fundamental to these questions are the following key factors:

Working “Visual Resource” Definition

Visual resources need to be defined in terms of a rubric that can help with future decision making. Visual resources in the context of the BHL collection should be defined as:

Discussion Points

Points in favor / Points to address

In favor: There are many examples of visual resources, such as botanical and zoological illustrations, that are relevant to BHL’s core subject-matter scope as well as its primary audience.[5]

To address: “[Art] doesn’t serve the research needs of BHL’s primary audience, and there’s not enough staff, funds, etc. to expand beyond the scientific literature yet. BHL defines itself in terms of the literature… Stick to the main mission until it’s accomplished. Upgrade, improve, enhance, yes, as with copy-specific info, but don’t divert the focus when there’s still so much to be done.” - Leslie Overstreet

In favor: Many BHL consortium partners hold visual resources in their physical collections alongside the books, journals, and gray literature they have already digitized for the online collection.

To address: It would be useful to ask BHL users whether visual resources in the collection would improve their research methodology and what priority they place on including these resources. “What information do our users (writ broadly) want, and where can we provide this information that will maximize its reach?” - Keri Thompson

In favor: The potential to expand access to these materials alongside the collection’s publications provides an opportunity to grow the BHL program in new ways.

To address: “BHL was built for books.” Incorporating non-book materials into the collection will require resources to implement changes. Many are doubtful that BHL has the resources, at this time, to address the technical, metadata, and digitization workflow developments required to resolve how visual resources will be “ingested (stored), indexed (search), presented (user interface), exposed (APIs, exports), and reported on (admin site, statistics),” etc. - Mike Lichtenberg

In favor: Some see scientifically relevant visual images (particularly related to taxonomy) as part of a broader understanding of the term “literature”.

To address: Others have expressed concerns that the addition of visual resources changes the mission of the BHL and that this change will move work away from the mission.

In order to deliver services based on this expanded role, BHL must determine how to execute the integration of stand-alone visual resources as a new content type within its collection. BHL’s current infrastructure and design revolve around digitized books (or “book-like items”). Any digital content that does not fit within the “book-like” model must be configured to fit, or the model must change. The prospect of developing BHL’s model provides opportunities to pursue a range of new funding sources to support the technical and collection development needs of incorporating additional visual resources.

The purpose of this document is to summarize the discussions of the BHL Collections Committee regarding 5 possible options for how to handle visual resources. In addition, the document outlines a series of issues that would need to be considered in order to fully realize options 1-4.

Table of Contents

Discussion Points

Options for Handling Visual Resources

Appendix I: Questions to Address

Collections

Metadata

Digitization Workflow

Technical Development

Usability

Appendix II: Mockup for Option #1

Appendix III: Feedback

Options for Handling Visual Resources

The BHL Collections Committee has come up with the following 5 options regarding “visual resources” as a potential new content type in the collection. Each option includes a sample file or mock-up, as well as pros and cons.

Regarding “visual resources,” BHL could...

  1. ...reconfigure its “book-like” item model to include visual resources as a new content type

Sample

  • [provide mock up of possible UI]

Pros

  • Would allow BHL to accurately store and present visual resources within its collection in the most user-friendly manner
  • Metadata specifications for visual resources could be modeled around best practices for existing schemas in the art community, esp. via BHL’s Art of Life work
  • Need to consider existing schema(s) that fit within the Internet Archive data model
  • S Lynch: “I like the idea of providing a gallery of images where clicking on an image would open the BHL bib or item record.” [screen.png]
  • S Lynch: “What if clicking on an ‘Images’ tab in BHL brought up a gallery of thumbnail images and clicking on one of those images opened the bib or item record?”

Cons

  • Anticipated technical development time commitment would be significant
  • Diffuses already limited technical and collection development resources or requires pursuing additional resources

  2. ...maintain its “book-like” item model and configure visual resources to fit within this

Sample

  • BHL Singapore: http://beta.biodiversitylibrary.org/bibliography/61095
  • AMNH: http://beta.biodiversitylibrary.org/bibliography/61096    

Pros

  • Technical Team time commitment would not be as large as required for option 1, but would still require time
  • Metadata specifications for visual resources could be modeled around BHL’s existing requirements
  • Details view shows more detailed information about the content

Cons

  • Visual resources will be treated as books: visual resource titles will act as book titles and artists will be treated as authors. As M Lichtenberg describes, “They will be ingested (stored), indexed (search), presented (UI), exposed (APIs, exports), and reported on (admin site, statistics) as if they were books.”
  • No way to differentiate between books and visual resources in the BHL backend database or the user interface; likely to cause confusion for users; likely to have a significant impact on metadata export functionality such as DOI creation and the OAI feed to DPLA, as well as functionality planned for discovery layer tools
  • Scientific names would need to be manually entered since the likelihood of OCR text being created is very (very) low ⇒ current status: the manual scientific name data entry process has been suspended until further development is possible
  • Further illuminates lack of name authority control.
  • S Lynch: “Genre: Book is ugly. This is currently determined by byte 07 in the leader. I don’t see any options that would work better for visual resource.”
  • Selected MARC 5XXs displayed via the UI do not capture all notes entered for records
  • S Lynch: “Since all of the interesting metadata is in the notes fields, I’d love to see a default view of ‘Details’ instead of ‘Summary’ for visual resources.”
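
S Lynch's note about byte 07 in the leader can be illustrated with a short sketch. In MARC 21, leader position 07 is the bibliographic level and position 06 is the type of record. The two sample leader strings and the fallback labels below are made up for illustration; they are not taken from BHL records or from BHL's display code.

```python
# Illustrative only: reading MARC 21 leader positions 06 and 07.
# The sample leaders and the "Other"/"Unknown" fallbacks are assumptions.

BIB_LEVEL = {  # leader/07, bibliographic level
    "a": "Monographic component part",
    "b": "Serial component part",
    "c": "Collection",
    "d": "Subunit",
    "i": "Integrating resource",
    "m": "Monograph/Item",
    "s": "Serial",
}

RECORD_TYPE = {  # leader/06, type of record (subset of the defined codes)
    "a": "Language material",
    "e": "Cartographic material",
    "k": "Two-dimensional nonprojectable graphic",
    "t": "Manuscript language material",
}


def describe_leader(leader: str) -> tuple[str, str]:
    """Return (type of record, bibliographic level) for a 24-character leader."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters")
    record_type = RECORD_TYPE.get(leader[6], "Other")
    bib_level = BIB_LEVEL.get(leader[7], "Unknown")
    return record_type, bib_level


# Hypothetical leaders: a printed book and a single print or drawing.
book_leader = "00000nam a2200000 a 4500"
print_leader = "00000nkm a2200000 a 4500"

print(describe_leader(book_leader))
print(describe_leader(print_leader))
```

Both sample records share the same value in position 07, which is consistent with the observation that byte 07 offers no better option for a visual resource. Position 06 is where the two records differ; whether BHL could drive its genre display from that position is an open question for the Technical Team.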

  3. ...incorporate visual resources into another existing repository (e.g. Flickr, EOL, etc.) and create linkages within BHL out to this other repository

Sample

Pros

  • Low impact on Technical Team as link out model already exists for “third-party” link outs
  • BHL data model does not need to be altered.
  • T Rose-Sandler: “[By putting visual resources] into outside repositories - we can take advantage of the much larger crowds who can help us describe the images and our images get exposed much more widely than just siloed in BHL. Another option is we could have them in external repositories and have that metadata brought back into BHL for searching.”

Cons

  • May lessen “power” and strength of the BHL corpus by banishing legitimate content to an outside address.
  • Since BHL would not be managing the content directly, we could be limited in our ability to re-use/re-purpose the visual resources to our benefit.

  4. ...build a new repository specifically to host BHL visual resources

Sample

Pros

  • S Lynch: “I like the idea of providing a gallery of images where clicking on an image would open the BHL bib or item record.”

Cons

  • Separate repository creates an additional silo of content; the BHL website would not be a one-stop shop
  • Resources required to build and maintain a whole other repository; experience with CiteBank proved that maintaining a new repository alongside BHL is not sustainable without dedicated resources
  • Direct linking between the new repository and the existing BHL collection would be required; without it, the new repository would be inadequate

  5. ...decide against the inclusion of non-book-like visual resources and continue to focus on literature and manuscript materials already being included in the collection

Pros

  • Maintains focus on BHL’s core business model.
  • Maximizes technical and collection development resources on core content

Cons

  • Disappointing to BHL partners with relevant visual resources in their collections that may lack an alternative repository in which to host them.
  • Diminishes opportunity to provide public access to visual resource materials held in BHL partner collections
  • Limits audience and reach of BHL. Visual resources have huge appeal to citizen scientists and to people in non-science fields, and a strict focus on literature would limit their ability to discover our treasure trove of visual resources


Appendix I: Questions to Address

Collections

  1. How much content would we be working with?
      1. AMNH has a large collection of visual resources with scientific names which are cataloged similarly to books and could possibly fit into the existing BHL structure
      2. NYBG has a botanical art database with thousands (maybe tens of thousands) of records; digitization is under consideration pending grant funding (?)
      3. BHL Singapore has botanical visual resources that they would like to contribute
      4. MCZ has “at least 1200 digitized images and these are the basis of some Amazonian ichthyological taxonomy”; DPLA has ingested these images.
      5. Museum Victoria
  2. What is the level of interest from BHL partners and users?
  3. Is BHL the best place to put visual resources?
  4. What is the priority of including visual resources in the BHL collection?
      1. What is the priority of including visual resources compared to other content formats such as gray literature, maps, etc.?
  5. How would we navigate copyright issues around visual resources in order to display via open access?
  6. What do we mean by visual resources? Is it 2D or sculpture or realia, material culture artifacts (like handmade baskets, clay pots), photographs, or works created by hand, etc.?
      1. Maybe some photos are part of field notes…what’s the difference?

Metadata

  1. What are the basic metadata elements we would want to capture about visual resources? Provenance, creator, medium, etc.
      1. Reach out to existing organizations with huge collections of images about minimum requirements
      2. How do different portals present image data? Alongside book content or separately?
  2. What are the existing visual resource metadata schemas we could use outright or build off of?
      1. VRA Core (surrogates of visual resources; the work vs. the image of the work); CDWA (Categories for the Description of Works of Art; helps track history of conservation, for example); LIDO - RDF direction, more traction globally; Art of Life (based on VRA Core + Darwin Core)
  3. Should we be concerned about the International Image Interoperability Framework (IIIF)...standard way of referencing images from a single source for a variety of services; maybe protocol, encoding…?[6]
  4. Is there any place where we are planning to deposit BHL metadata in which having records for visual resources could cause a potential issue? (e.g. DPLA, OCLC, other discovery layer products)
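
To make the IIIF question more concrete, here is a minimal sketch of how an image request is addressed under the IIIF Image API, which uses the URL pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The server address and the identifier below are placeholders, not real BHL or Internet Archive endpoints, and the defaults assume version 3 of the API (where "max" requests the largest available size).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    `base` is the image service root (hypothetical here). The identifier is
    percent-encoded so that characters such as "/" do not break the path.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Placeholder server and identifier, for illustration only.
BASE = "https://iiif.example.org/image"

full_image = iiif_image_url(BASE, "plates/0001")
thumbnail = iiif_image_url(BASE, "plates/0001", size="200,")  # 200 pixels wide

print(full_image)
print(thumbnail)
```

The point of the sketch is that one stored image could serve several services (viewer, thumbnails, downloads) through different URLs, which is the "single source for a variety of services" idea raised in the question.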

Digitization Workflow

  1. How do we go about digitizing visual resources?
  2. What should the quality of the visual resource digital surrogates be? Highest quality digitization may be required to ensure the greatest ROI for the value of these resources and the work needed to make these resources part of the BHL collection.
  3. Would we need to worry about managing color standards during digitization? 24-bit color and color strips for accurate imaging
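
As a rough aid for the quality and storage questions above, the arithmetic behind 24-bit color can be sketched as follows. 24-bit color means three channels of 8 bits each. The sheet size and resolution in the example are hypothetical, chosen only to show the calculation, and real files would usually be smaller after compression.

```python
BITS_PER_CHANNEL = 8
CHANNELS = 3  # red, green, blue: 3 x 8 = 24 bits per pixel


def distinct_colors() -> int:
    """Number of colors that 24 bits per pixel can represent."""
    return 2 ** (BITS_PER_CHANNEL * CHANNELS)


def uncompressed_bytes(width_in: float, height_in: float, ppi: int) -> int:
    """Uncompressed size of a 24-bit scan of a sheet measured in inches."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * CHANNELS * BITS_PER_CHANNEL // 8


print(distinct_colors())               # 16777216
# Hypothetical example: an 8 x 10 inch plate scanned at 400 ppi.
print(uncompressed_bytes(8, 10, 400))  # 38400000 bytes, about 38.4 MB
```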

Technical Development

  1. How would we manage storage of visual resource digital files?
      1. Would we store them in IA?
      2. If so, how would we store them?
  2. How would images be uploaded to IA? Likely via Macaw or IA’s Tabletop Scribe.
  3. How do we get the visual resources into BHL (most likely via IA; see existing image collections at https://archive.org/details/image)? What do we need to do to import the files into BHL?
  4. How would we need to configure the backend database to accommodate visual resources?
  5. Would we want to provide APIs for visual resources? Exports?

Usability

  1. How do we want the visual resources to display in BHL?
      1. For visual resources already published in existing digitized books, would we want to extract them as part of a “visual resources” display/format?
  2. How do we want the visual resources to be discoverable in BHL?
      1. Would images be cross-referenced alongside literature search?
      2. Would you want to be able to search across book plates and also pull visual resources?
  3. What download features would we want to provide?
  4. Would we want to connect the visual resources to existing publications if linkable?
  5. How would we implement scientific name finding features, etc.?
  6. How would we manage questions about reuse of visual resources by users?


Appendix II: Mockup for Option #1

Option #1: reconfigure BHL’s “book-like” item model to include visual resources as a new content type. These mock-up images are selected examples of possible changes to the BHL UI. Many more changes will need to be made than are pictured.

Homepage changes

VRhmpg.png

Initial search results changes

VRsearchrslts.png

Visual Resources tab search results

VRsearchrslts2.png

Item viewer changes

VRviewer.png


Appendix III: Feedback

Technical Advisors: Option #1 provides the most viable method for moving forward. At this time, the Technical Team cannot commit to handling standalone visual resources in BHL in an effective manner that could scale.

Regarding option #2, shoehorning new content types into BHL’s book-like model is not ideal and could be confusing for users. The long-term implication of this option is that, at some point in the future, we would need to rethink the data model and user interface to provide access to visual resources in a way that treats them separately from book-like items. We would need a way to transition the visual resources to a new model automatically. Without the ability to automate, we would have added a lot of content that will require manual reworking to make it fit into the new data model once it’s ready.
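
One way to picture the automated transition described above is sketched below. It assumes that visual resources ingested as book-like items carry some marker that sets them apart; the `is_visual_resource` flag, the class names, and the field names are all hypothetical and do not reflect BHL's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class BookLikeItem:
    """Simplified stand-in for a record in the current book-like model."""
    title: str
    authors: list[str]
    notes: list[str] = field(default_factory=list)
    is_visual_resource: bool = False  # hypothetical marker set at ingest


@dataclass
class VisualResource:
    """Simplified stand-in for a record in a future visual-resource model."""
    title: str
    creators: list[str]  # artists were stored as "authors" under option #2
    notes: list[str]


def migrate(items):
    """Split flagged items out of the book-like set and remap their fields."""
    books, visuals = [], []
    for item in items:
        if item.is_visual_resource:
            visuals.append(
                VisualResource(item.title, list(item.authors), list(item.notes))
            )
        else:
            books.append(item)
    return books, visuals


# Made-up sample records.
catalog = [
    BookLikeItem("A flora of somewhere", ["A. Botanist"]),
    BookLikeItem("Orchid study", ["B. Artist"], ["Watercolor"], True),
]
books, visuals = migrate(catalog)
print(len(books), len(visuals))
```

Without a marker of this kind recorded at ingest, nothing in the record would distinguish a visual resource from a book, and the reworking would have to be done by hand.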

Option #3 requires the development of at least two different and major components from the ground up – storage of metadata and storage of content. BHL’s current data model addresses the storage of bibliographic metadata and Internet Archive addresses BHL’s content storage needs.

For option #4, we could set up a way to implement one search across multiple data models, in the way that current cultural/natural heritage aggregators work, e.g. SI Collections Search, Harvard’s Hollis+, DPLA, and Europeana. These aggregators bring together various types of digitized materials from various collections under a single platform. Current models show, however, that in order to harmonize various content types within a single platform, there may be a need to “water down” the information presented or services provided. BHL version 2 could be something more along the lines of a Europeana data model, integrating various types of biodiversity-relevant content under a single user interface.
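
The idea of one search across multiple data models, and the “water down” effect that comes with it, can be sketched as follows. Each source record is mapped onto a small common shape before searching. The record layouts, field names, and sample data are invented for illustration and are not drawn from any of the aggregators named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    """Common shape shared by every content type in the combined search."""
    title: str
    creator: str
    source: str


def from_book(rec: dict) -> SearchHit:
    # Book-specific fields (e.g. volume, publisher) are not carried over.
    return SearchHit(rec["title"], "; ".join(rec.get("authors", [])), "book")


def from_image(rec: dict) -> SearchHit:
    # Image-specific fields (e.g. medium, dimensions) are not carried over.
    return SearchHit(rec["caption"], rec.get("artist", ""), "image")


def search(term: str, books: list, images: list) -> list:
    """Case-insensitive title search across both normalized sources."""
    needle = term.lower()
    hits = [from_book(b) for b in books] + [from_image(i) for i in images]
    return [h for h in hits if needle in h.title.lower()]


# Made-up sample records.
books = [{"title": "Orchids of the tropics", "authors": ["A. Botanist"], "volume": "2"}]
images = [{"caption": "Orchid, watercolor study", "artist": "B. Artist", "medium": "watercolor"}]

for hit in search("orchid", books, images):
    print(hit)
```

The fields dropped in `from_book` and `from_image` are the sketch's version of watering down: the shared result type can only hold what every content type has in common.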

Option #5 is a policy decision and not within the Technical Team’s purview to comment.

In conclusion, we prefer to include the incorporation of visual resources in the requirements for BHL version 2.

Keri Thompson: “I’m not going to tell you anything that you haven’t already thought of, but the issue of art works (or, let's say, anything that isn't literature like type specimens, science vlogs etc.) raises the following questions for me:

  1. Is BHL primarily a mechanism for disseminating member collections, or is it a mechanism for disseminating information related to studies in biodiversity and Natural History?
  2. What information do our users (writ broadly) want, and where can we provide this information that will maximize its reach?
  3. What is BHL's vision for its content in 10 years, and what are the current technical development priorities that will enable realizing this vision?

I feel like if those questions can be answered to both EC and the collections committee's satisfaction, a technical solution can be found for the specifics.

My personal opinion … is that art shouldn't be a short term priority unless our users are clamoring for it. Until a new, all inclusive flexible BHL exists, art can live on a separate platform in a place where people who want art will find it as easily as people who want species descriptions from books.”

Don Wheeler: “The discussion about a difference between art and photography, or between a hand drawn illustration and a photograph, blurs the main issue.

A drawn illustration and a photograph accomplish a very similar, if not the same, goal: to give a visual representation of information.

A picture can display information rapidly and comprehensively in a way that the written word cannot.

It is the CONTENT of the picture (the illustration or photograph) that is the focus of a question of relevance, in the very same way that the CONTENT of a book or journal becomes the focus of our mission.

We do not include science fiction, for example, or examples of surrealistic poetry, yet these are found in books and journals.

It doesn't matter what form the information takes (book, journal, drawing, photograph), it is the information itself that is the reason for our program: to inspire discovery through free access to biodiversity knowledge.

For the question of the BHL website: the same logic holds. The website is the method of providing the information, it is not the information itself, as a book is a method (a series of printed pages bound in sequence) of providing information, and not the information itself. The website is designed to provide the information taken from a written form (book or journal), based on a metadata structure (MARC) that is limited and out-dated. If the information is available in a form that cannot be defined well in that metadata structure (MARC), for example a photograph or drawing, then BHL is limited in its ability to provide that information. Limited because the form of access to the content (via metadata) is designed to work with 'book-like objects' (defined by metadata).

This question of adding visual resources as content in BHL is rooted in a deeper issue: the additional time, personnel, and money that would be required. It is not the content that is at issue, it is not the form the content takes, (because the form is not less valuable or important: think of a microphotograph that illustrates the structure of pollen grains found in archeological remains and which shed light on the age and distribution of a species). The issue is that delivering the information in any other form than a 'book-like object' requires modification or re-build of the method of presenting the content. It is the existing BHL data model AND the existing staffing, funding and technical model that is at issue.

We have a limited capacity to present information available as images. The offer to present images separately from books and journals highlights that limit. We will grow in bulk by adding more books and journals; do we grow in capacity by re-tooling for service? Can we do both?”


[1] http://www.merriam-webster.com/dictionary/literature “written works (such as poems, plays, and novels) that are considered to be very good and to have lasting importance : books, articles, etc., about a particular subject…”

[2] http://freakonomics.com/2011/07/14/a-pictures-worth-a-thousand-words/ “The drawing shows me at a glance what would be spread over ten pages in a book.” Ivan Turgenev, Fathers and Sons (1862) (translation by Constance Garnett); This proverb has long been credited to Frederick Barnard, who used a “look” version in Printers’ Ink, Dec. 8, 1921, and a “picture” version in the same periodical, Mar. 10, 1927.

[3] http://collections.museumvictoria.com.au/items/1087785 example of Museum Victoria visual resource with direct link to BHL. Could BHL provide a form of reciprocal link via the scientific name or bibliographic metadata?

[4] “It is the CONTENT of the picture (the illustration or photograph) that is the focus of a question of relevance, in the very same way that the CONTENT of a book or journal becomes the focus of our mission. We do not include science fiction, for example, or examples of surrealistic poetry, yet these are found in books and journals. It doesn't matter what form the information takes (book, journal, drawing, photograph), it is the information itself that is the reason for our program: to inspire discovery through free access to biodiversity knowledge.” - Don Wheeler

[5] BHL Outreach and Communication Plan: Audiences “BHL audience types: Scientists/Researchers, Citizen Scientists/People Interested in Biodiversity, Artists, Bibliophiles, Librarians, Taxonomists, Techies, Educators, Historians”

[6] from an IIIF conference website: "IIIF provides an open framework for organizations to publish their image-based resources, to be viewed, cited, annotated, and more by any compatible image-viewing application."