Table of Contents
From ILS to BHL: The Journey of BHL Metadata
Review and Prioritize Items on Tech Wishlist
Operational Planning around BHL services
In-Kind & Cash Contributions Evaluation
DAY 1 | |
Strategic Plan Review | |
| Bianca |
| Bianca & Martin |
| Bianca & Coll. Committee |
| Bianca & Coll. Committee |
| Martin |
| Martin |
Collection Management | |
| Bianca & Grace |
Tech Infrastructure | |
| Martin |
| Bianca, Martin & Tech Team |
| Bianca & Tech Team |
| Martin |
TRS grant-funded projects list | |
| Martin |
| Carolyn |
Tech Wishlist | |
| Bianca |
| Martin |
| ALL BHL Staff |
| Tech Team + TAG |
| Bianca |
| Carolyn |
| Mike L. |
| Martin & Tech Team + TAG |
| Discover Tools WG |
| ALL BHL Staff |
| TAG + Bianca |
| Bianca |
DAY 2 | |
In-Kind & Cash Contributions | |
| Carolyn |
| Carolyn |
BHL Statistics Review | |
| Carolyn |
| Carolyn |
| Joel |
| Mike L. |
| Carolyn |
| Carolyn |
User Feedback Management | |
| Martin, Jackie & Bianca |
| Bianca, Matt, Diana D., Alison, Mai, JJ, Jackie, David?, …? |
| User Feedback Mgmt working group |
| Bianca |
Training Sessions | |
| Tech Team + Joel & Susan |
| Joel & Susan L. |
| TAs + Bianca |
| Bianca + Alison |
| Bianca + Alison |
| Bianca |
| Mike & TAs |
| Bianca |
| Jackie/Bianca |
| Gemini committee |
| Martin |
| Martin, Mike L., & Joel |
Outreach | |
| ALL BHL Staff |
Grace | |
| ALL BHL Staff |
| Grace |
| ALL BHL Staff to tell Grace |
| ALL BHL Staff to tell Grace |
Documentation | |
| Bianca |
Martin Kalfatovic
Presentation slides:
http://www.slideshare.net/Kalfatovic/looking-back-looking-forward-the-biodiversity-heritage-library
Darwin stressed importance of having access to biodiversity literature.
Vision: “Inspiring discovery through free access to biodiversity knowledge” - key goal in BHL.
In 2004, at the IMLS WebWise meeting in Chicago, a group of people met and sketched out BHL on a cocktail napkin.
In 2005 in London, a kick-off meeting brought a number of different partners (original members) together with 80 scientists from around the world, who told us what they needed in terms of biodiversity information and literature. They told us they needed web access to literature.
In 2006 we had our first kick-off meeting at Suzanne’s house as an ice-breaker with a number of BHL people, followed by a business meeting in a conference room at SIL.
2007: official launch of the BHL portal in conjunction with the EOL launch - the first public launch of the work, primarily done by MBG, to start building BHL from Botanicus.
2012 first large staff only meeting. Smaller group then.
Present day - we won a Hero Award from the Internet Archive in 2015.
BHL is a starfish in the regenerative sense. What regenerates over the years? Our staff: some go back to 2006 when we first started the project, but new people have come on over the years.
Finances of BHL - funding has changed dramatically over the years. We are no longer dependent on the MacArthur grant that originally funded us. Through contributions of Members and other participants we have a fairly sustainable model. Meetings are very important, whether virtual or face-to-face. Leadership has also changed over the years. Our initial executive committee was Graham Higley (NHM), Cathy Norton (MBL), and Connie Rinaldo (MCZ). Then on to Nancy Gwinn (SIL), Connie (MCZ), Susan Fraser (NYBG). Jane Smith (NHM) is Secretary.
Content has also grown over the years. We launched with about 200,000 pages, but we now have about 48 million pages.
Strategic Plan: Top level goals:
BHL is now getting to the size of lots of our libraries in terms of actual volume content.
People and Money:
People: Secretariat
3.5 FTEs - Martin is only part-time director of BHL; his other part-time role is at SIL. Carolyn manages day-to-day finances. Bianca manages collections and communication with staff. Grace deals with outreach. All Secretariat positions are funded by SIL money. The Smithsonian is partly funded by government funds, so we have private and government money. Martin is permanent Federal staff. Carolyn is also federally funded, by an allocation from the Senate to support biodiversity at the Smithsonian, which covers Carolyn as a permanent employee. Grace and Bianca are funded through Smithsonian private money, with about 3 more years of committed money.
Technical Team at MBG: William Ulate (technical director), Mike Lichtenberg (programmer), Trish Rose-Sandler (data analyst). All staff have been funded by soft money from the beginning. William’s salary is paid by BHL dues and MBG grants. Mike’s salary is paid by BHL dues. Trish is on grant money for five years.
Not enough funding in core budget to pay for William’s salary after end of calendar year. So we will not have technical director after December.
2015 Budget Expenses: $688.5K for operations of central BHL activities. About 80% salaries. Then digitization, meeting costs, travel. There is additional digitization money from Smithsonian not included in these figures.
About 50% of budget comes from Smithsonian. MBG grants account for about 27%, then donations and Dues.
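As a rough worked example, the quoted percentages can be turned into dollar amounts. This is only a sketch: the percentages are approximate ("about"), and it assumes that the funding shares apply to the same $688.5K total as the expenses and that everything not covered by the Smithsonian or MBG grants is donations plus dues. The variable names are illustrative.

```python
# Figures quoted in the notes; the percentages are approximate.
TOTAL_BUDGET = 688_500  # 2015 operating expenses, in USD


def share(total: int, percent: int) -> int:
    """Return the whole-dollar amount for a percentage share of the total."""
    return total * percent // 100


salaries = share(TOTAL_BUDGET, 80)
smithsonian = share(TOTAL_BUDGET, 50)
mbg_grants = share(TOTAL_BUDGET, 27)
# Assumption: the remainder is donations plus dues.
donations_and_dues = TOTAL_BUDGET - smithsonian - mbg_grants

print(f"Salaries:           ${salaries:,}")
print(f"Smithsonian:        ${smithsonian:,}")
print(f"MBG grants:         ${mbg_grants:,}")
print(f"Donations and dues: ${donations_and_dues:,}")
```

Under those assumptions, salaries come to roughly $551K and the donations-plus-dues remainder to roughly $158K, or about 23% of the total.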
BHL now has 16 members. FMNH (original member, was affiliate, and now member again) and CONABIO most recent member.
2 levels of membership in BHL: the first level is full dues-paying members, with an annual fee of $10,000. Fees go into the central core fund, which supports the technical group and digitization. Then there are affiliate members, where the structure just changed. Affiliates pay annual dues of $1,000 but don’t get direct access to the services of technical and secretariat staff. There are also additional work fees (for example, if you need training) that the affiliates have to pay. Global partners are largely on their own in working with BHL. Some global partners also choose to be members or affiliates (Mexico and Singapore). Some partners are affiliates (Africa).
William: Are affiliates forever? No. When you become a BHL Affiliate it’s a 3-year term that is mutually renewable. Those institutions that were previously affiliates are grandfathered in without the $1,000 dues for their 3-year term. After that term ends, the dues must be paid by current affiliates.
Suzanne: What is IA? They are currently a partner but they have applied for affiliate status. We’re still waiting on the paperwork but they will be an affiliate by end of calendar year.
Daria: Do you take institutions outside of libraries/museums? Yes, in the by-laws we outline types of institutions that can be members/partners with BHL. Institutions are specifically outlined. GBIF is an international organization that is a likely candidate to be associated with BHL. EOL is actually in the by-laws as a partner. We would like to look at other institutions, including commercial institutions. By-laws don’t specify that you have to be a non-profit.
Connie Rinaldo is our global coordinator. We have been looking for over a year to find an institution in Canada to be a seed group - working with Canada Museum of Nature in Ottawa. Also interested in McGill for Canada.
We’re also interested in partners in India but are only starting to gather possible interested parties. Tomoko can spearhead Japan growth. Other parts of South America are not represented. We need leads for contacts there.
William: What should be our growth strategy? Keep targeting countries or look for current groups? The Mexican model works well: we have a strong pan-national organization that can bring other organizations in. Is content the most important thing we can get from partners? Not necessarily. Lots of core literature is already represented. We need to target gray literature and take advantage of their promotional activities.
Trish: We’re continuing to grow but the staffing is shrinking. What about the burden on staff? Affiliate fees are an initial experiment to see whether they will offset the extra burden on staff.
Marty Schlabach and Martin Kalfatovic
View our Strategic Plan http://biodivlib.wikispaces.com/About#Strategic Plan
Not a line-by-line review of the plan but a chance to review the plan that has been approved by the members. There is a regular Members meeting where a single representative from the members attends. This was approved in July of this year. This is the document to guide us through 2017.
The document is divided into parts: Vision and Mission, Value statements, and Goals with more specific things for us to strive for. Each goal has initiatives and tasks - the tasks are the actual things we hope to accomplish as part of the plan.
Take a look at the plan and identify areas that particularly relate to your level or role. What are your thoughts and are there gaps or areas of concern? Goals of meeting - think about these tasks and activities and see where you fit in, where you can suggest new activities, and things not achievable. We can bring that back to the members.
Matt: What is BHL Version 2?
Martin: Right now BHL is version 1.5. We had refresh of user interface with help of BHL Australia a few years ago, but we’ve been talking about how we would build BHL if we started today. We want to bring staff together to talk about needs and tasks for a BHL version 2.0. One of the things I want from the staff is an idea of what we want for BHL 2 with goal of seeing if we have resources to build that sometime in 2017. Right now we’re in the information-gathering phase.
Martin: One of our questions is “what is the scope of biodiversity?” What is actually out there and how much is there still to do? We need to do a collections analysis, but what does that look like and how do we do it? Is there assistance and places for the Collection’s Committee to help? We’ve also recently had a debate about whether artworks should be included in BHL.
Chris: Do we know how much public domain content we’ve digitized to date?
Martin: We’re trying to do that at Smithsonian, but we need to do it to a larger extent. Can we do this in a coordinated way, sooner rather than later?
Chris: Our institution has been looking for ways to contribute without scanning. Maybe this is a way.
Tomoko: what about the border between biodiversity and medicine? How far there do we go?
Martin: The question of agriculture has also been a question, particularly from some of our users.
Marty: Much new agriculture and horticulture information comes from seed and nursery catalogs from Cornell, NYBG, and Harvard.
Carolyn: Do we need to review the collection development policy to determine a way to meet this goal?
Suzanne: Do you think it would be appropriate for people to do an analysis of their own collection to determine what is in public domain? To do that, we need to determine what is in scope?
Chris: Our librarians want BHL to tell us what BHL wants scanned and they will scan it.
Marty: First we need to define our collection parameters, then do a collection analysis to decide what we do and don’t have. At Cornell we determine public domain differently for international publications than most institutions do (they follow the copyright rules of the country that published the material). So the topic of public domain also needs to be determined. Is it the collections committee or the larger staff that needs to address this?
Jackie: First point sounds like central BHL is supposed to guide us in what we’re supposed to be digitizing. Is the Secretariat supposed to provide this guidance?
Martin: This first point is really speaking more to optimizing use of pan-BHL scanning funds. We need BHL to have a clear focus for what we’re going to scan within our scope - a clear statement of what BHL wants to be in the collection.
Carolyn: Let’s talk about the steps. Do we want a broader view of the world of biodiversity literature, even beyond what we already hold?
Marty: Isn’t that part of the new IMLS NLG?
Martin: Yes - new grant is to increase content from not current members or affiliates in BHL. So we need to determine what we want before we go to target those institutions?
Michael: Is there any effort to do a coordinated effort on what’s uniquely held by members?
Martin: In 2006-07, we received a small grant from Lounsbery to do a small collection analysis - we dumped as many records as we could to try to see who has unique content. We had a lot of problems with the different ways metadata was recorded. This makes it hard to do a coordinated approach. SIL receives an OCLC report of our collections vs. HathiTrust digitization. But many BHL institutions are not members and don’t already get that report.
Tomoko: LoC will not digitize if it’s in HathiTrust.
Marty: Cornell does that as well. If it’s already in HathiTrust they will not redigitize unless there’s some qualitative reason.
Martin: User Engagement goal. How do people feel about how we’re working with our users? Do these goals and tasks address what we should be doing with our user community?
Keri: At no point do we talk about user services. Where do we put that? Is it a thing? Helping users, like reference, ILL, etc. There’s a difference between giving a service to a user and engaging with a user.
Action Item: DISCUSS FURTHER WITH MEMBERS IF WE WANT USER SERVICES IN STRATEGIC PLAN (ABOVE)
Martin: Goal four - how can an organization be the right size and still address all of our needs. Trish asked how big can BHL grow and still have the resources to support that? Are we going to be reliant on small Secretariat and Technical team to do this? Are other institutions able to come in to support these activities?
Trish: Have there been discussions about, when we take on new members, requiring certain minimum staff time dedicated to BHL (not just funds)?
Suzanne: I think we need to have requirements about the time committed to BHL from members. If we can codify in some way to get hands and feet on the ground, that will help.
Action Item: DISCUSS FURTHER WITH MEMBERS ABOUT INCLUDING TIME REQUIREMENTS FOR NEW MEMBERS (ABOVE) - INCLUDING IN MOU
Carolyn: How do we ensure that this happens?
Suzanne: That’s what the members need to decide. I think the members are usually our supervisors so they need to commit to it so we are given time and hands to do it. I don’t think we can say how to ensure it - this needs to come from the management.
Marty: Annual in kind report does at least report that.
Connie: We have been trying to make sure new members and affiliates have to continue to curate material once in BHL.
Jackie: As we’re talking about BHL growing, we also need to talk about what we want the secretariat to do vs. what we can do as staff with committees. We can use the secretariat’s time most effectively, so we should have a discussion about their time.
Bob Corrigan - Director of the Encyclopedia of Life
Presentation was verbal, no slides
BHL is the literature component of EOL. Over the past year we migrated the entire EOL data to a server at SIL. We have decided to pull out lots of technology built back in 2008 for harvesting. We had turnover, and we wanted to establish a team that was more nimble and position ourselves to collaborate with new partners at Bibliotheca Alexandrina. The new library is designing the next-generation big data core of EOL.
Increasingly we have data and services that allow us to deliver that data and then we have applications. That model is what we’re designing. Next version of EOL (to be announced in fall 2017) big data core then services tuned to needs of users. Early on we tried to design one website to build them all. That’s not scalable. In future you’ll have big data core and then variety of services that allow you to access that. But this work will take time and energy.
One of the things we’re keenly aware of is that as soon as articles are written, they start to get old. We have never yet embraced the idea that content we got in the past could be updated. We’ll be working on allowing data to come into an editing or republishing environment where users can update content to go back into EOL.
TraitBank is the largest, most successful thing we’ve ever done. We have data about 1.7+ million species. Availability of data outstrips availability of accessible articles. We will try to start to define what we know about species; then we can decide what services we can provide. That does not mean we’re backing away from text. It’s an important source of data. You can expect to see us push BHL to the front as a source of definitive information in narrative form.
Four years of work allows Google to search EOL, find relevant data and display it for search. We don’t care if people come to EOL - we want people to use the EOL API.
Just got back from China. Enthusiasm of colleagues in Asia is high. Clear that they value integrating data in way that’s really powerful. Not just enough to have EOL, or GBIF, or BHL. We need to bring this information together in powerful way, so we want to deepen our practical collaboration with BHL, GBIF, etc. Collectively we can achieve this, but we don’t have resources alone. EOL will be focusing on that in the next few years.
Marty: Suggestions for enhancing BHL that will enhance EOL.
Bob: Yes, and I can share that. One of the things we’re doing now is hitting the BHL API to see what we can get from BHL and what we can use. We need to decouple data from services, so we need to think architecturally.
Bianca: Should we bring artworks into BHL? Do you have artworks?
Bob: Yes, in our next version we’ll do a better job at tagging our images so it will be easier for users to find visual resources that users need. Our team needs requirements and freedom to do them. So think big.
Suzanne: Do you want geology?
Bob: There has been debate in EOL that we cover things that used to be alive, so we are getting fossils into EOL. We are working on providing new home for orphaned content that has lots of environmental data. Do we want rocks? Not sure.
Bianca Crowley
As Collections Coordinator, I’m involved in both collections and coordinating the activities of our staff to digitize.
Martin: It’s the beginning of our new performance year, so that’s also an opportunity for you to tell us what Bianca should do.
Tasks that Bianca does: collection management and acquisition; liaise with technical team; birds eye view of digitization workflow and metadata; Gemini consultant; copyright and permissions (handles in-copyright permissions that we digitize for BHL); user account management for our tools for BHL; documentation (started doing more and more of this).
BHL is about books and biodiversity. Especially the scientific names contained within those books. Many other applications link into BHL because of those names. Books in your collection with high taxonomic value are perfect for BHL. Use your discretion on what to scan.
Collection management is continuous cycle where digitization is one of five steps. We also need to focus on activities after digitization as well. Mass digitization approach means we rely on users to give us feedback about where we have curation needs. Without user feedback we couldn’t address all of these collections or cataloging needs we have.
Our consortial organization is about trusting our partners to decide what’s relevant to biodiversity and digitize that. We encourage unique materials from your collection and institutional publications that can be made available via open access. We also have a list in Gemini of potential books to digitize. Maybe Bianca can figure out a way to create a workspace that can generate lists for people to find things to digitize from Gemini.
Much of our selection process is focused on gap-filling journal runs.
Deduplication is about comparing materials with the existing and in-progress collection. There’s a lag - we don’t know everything that’s in progress until we see it in BHL. Another reason why Gemini’s important. If we duplicate it’s okay. Sometimes duplication is deliberate. Best suggestion: use your best judgment and ask: is it already available in BHL? If not, is the digitization already in progress? To answer, search BHL. If it’s not there, search for the item in Gemini.
Deduplication in Gemini: who thinks process of deduplicating via Gemini is working well? 4 people said yes.
Working okay: 6 people
Not working: no one raised hands.
How many of you use the monographic deduper on a regular basis? 4 people
Use as needed or occasionally: 1
Haven’t used in a long time: 4 people
(Bianca uploads spreadsheets from all titles that we ingest from non-BHL partners)
We are no longer supporting the monographic de-duper. We can’t add new institutions to the de-duper, so it’s not a viable tool for us. We really need a place to host and develop the tool. We don’t have that. The deduper tool will be gone no later than Nov. 30.
So the only tool we do have is Gemini. We could try for monographs in there, but we need to think about that further.
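The check suggested above (search BHL first, then Gemini) can be sketched as a small decision function. This is a minimal illustration, not an existing BHL tool: the two title lists stand in for manual searches of BHL and Gemini, and the names `Decision`, `normalize` and `check_duplicate` are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ALREADY_IN_BHL = "already available in BHL"
    IN_PROGRESS = "digitization already in progress (found in Gemini)"
    CANDIDATE = "not found; candidate for digitization"


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(title.lower().split())


def check_duplicate(title: str, bhl_titles: list, gemini_titles: list) -> Decision:
    """Apply the two questions in order: is it in BHL? If not, is it in progress?"""
    key = normalize(title)
    if key in {normalize(t) for t in bhl_titles}:
        return Decision.ALREADY_IN_BHL
    if key in {normalize(t) for t in gemini_titles}:
        return Decision.IN_PROGRESS
    return Decision.CANDIDATE


# Illustrative data only; real lookups would be searches in BHL and Gemini.
bhl = ["Flora of Exampleland"]
gemini = ["Birds of Exampleland"]
print(check_duplicate("flora  of exampleland", bhl, gemini).value)
print(check_duplicate("Birds of Exampleland", bhl, gemini).value)
print(check_duplicate("Fishes of Exampleland", bhl, gemini).value)
```

A match only flags a possible duplicate. As noted above, duplication is sometimes deliberate, so the final call still rests on the contributor's judgment.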
Digitization: IA is the repository of aggregation. From IA we harvest metadata into BHL but we serve images from IA.
Users: Users are important to furthering our process and understanding how we can improve our website.
Final step: Curation. Done via Admin Dashboard.
Participation in BHL means investing resources in all of these steps.
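The split described in the digitization step (metadata harvested into BHL, images still served from IA) can be pictured with a small data structure. This is a sketch under assumptions, not the actual BHL harvester: the input dictionary is shaped like the JSON returned by the Internet Archive metadata API (a top-level `metadata` object), and `BhlItem`, `harvest` and the sample identifier are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BhlItem:
    """Metadata copied into the BHL database; images are not copied."""
    ia_identifier: str
    title: str
    volume: str

    def image_source_url(self) -> str:
        # Images stay at IA, so the item only keeps a pointer back to it.
        return f"https://archive.org/details/{self.ia_identifier}"


def harvest(ia_record: dict) -> BhlItem:
    """Copy the descriptive fields from an IA-style record into a BHL item."""
    meta = ia_record["metadata"]
    return BhlItem(
        ia_identifier=meta["identifier"],
        title=meta["title"],
        volume=meta.get("volume", ""),  # not every IA record carries a volume
    )


# Illustrative record; the identifier and values are made up.
sample = {
    "metadata": {
        "identifier": "examplevolume01",
        "title": "An Example Flora",
        "volume": "v. 1 (1912)",
    }
}
item = harvest(sample)
print(item.title, "->", item.image_source_url())
```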
Collection Development Policy: Can help us understand BHL’s policies on specific topics. It also includes a sketch about our scope - it represents our core content and supporting content. At our core we’re really focused on work related to biological and taxonomic scientists.
Our most popular date of publication in BHL is 1912, with biggest spread from 1900-1922.
We accept digitized books and volumes, and materials that are book-like (field books, correspondence). Digitization for BHL means cover-to-cover: we image every page from cover to cover. We also accept articles in terms of metadata, but not digitized articles. We started accepting this article metadata in 2012. Articles are largely identified by Rod Page’s BioStor.
SIL has some articles each bound as its own book, which we do digitize.
MCZ occasionally scans bound articles.
Marty: Are you saying that if a bound journal is digitized and there’s article metadata associated with all of those parts, BHL can accept it and make it searchable?
Bianca: Theoretically yes, but we haven’t done that before. It would be a mini-project to figure out how to fit that in. Rod Page through BioStor also does accept lists of citations that he can incorporate into his dataset that we would then harvest later.
Mike: If you wanted to do this today, the best way would be to go through Rod Page. As far as putting in through Macaw, we’ve had discussions about this but it does not exist.
Marty: How much have we benefited from article metadata put in by users? Do we use that?
Bianca: Not yet. It’s on Mike’s list to do but it never made it to the top of the list. User contributed data is mostly good but not as reliable as BioStor. We’ve been chewing on it for a while.
Susan: Is all article-level metadata in BHL provided by Rod Page?
Bianca: Nearly all; only 6% of articles in BHL were manually entered.
We also accept records that link to content not in BHL only when loading directly into BHL is not possible.
Selection Process: We look for discipline specific and in-copyright items with permission. Ingest materials from IA as well based on criteria from collections committee. Majority of selection now is user driven because users submit requests for gap fills or items not in BHL. These get triaged to libraries to send for digitization. Permission titles to scan are high priority so these should be prioritized by you.
We need our content to be open access - your responsibility to make sure this happens. We also need MARCXML and high image quality. We have documentation for all of this.
Accomplishments for Collections Committee: refined enumeration and chronology standards; added article abstracts and notes to BHL; analyzed priority zoological serials, which are now in Gemini; revived the contributor browse option; added note fields from MARC 5XXs to the BHL user interface; automated adding content to the seed and nursery catalog collection; submitted technical change requests to improve collection browse; updated policies about external links; put together a document about things to think about for adding artworks; created a cataloging group.
We need to do collections analysis but we lack the resources to do it. CC came up with direction but they need someone to do the work.
We’ve also talked about de-emphasizing links to external content in search results. They might cloud the search results for things in BHL, which will have all of the services BHL provides enabled. What do people think?
Most in favor of de-emphasizing link.
Jackie: I get lots of questions about what the external links are. Would it be possible to add another tab for external links?
Bianca: Right now the question on the table is just about search results lists. But that’s a good suggestion that we can approach. Our committee will continue to discuss this.
About 66% of our collection comes from BHL partners. But we have partner content that’s not actually in the biodiversity collection, so Bianca needs to work to fix that.
We have worked with 171 licensors and have 407 in-copyright titles in BHL but this does not account for institutional publications in BHL.
Suzanne: Collections Committee has thoughts but you haven’t told us what those are. Is there a place where we can see that?
Bianca: Yes, under the collections topic there’s a Collections Committee link where we keep all of our notes. But there’s no one place for just accomplishments or decisions.
Action Item: Incorporate decisions and accomplishments from CC more into quarterly reports.
William Ulate, Mike Lichtenberg, Joel Richard
New BHL Architecture diagram link
Important to distinguish that images of pages are presented from IA and the metadata is in a database.
Whether using Scribe or Macaw, both sets of information can be displayed through the portal. Information including on the item, volume, articles and pages. IA stores this information but it doesn’t necessarily match one-to-one with the way that we deliver, so some finessing is sometimes required (Many thanks to Mike!)
Also important to consider that not all content in IA is ours to modify -- we harvest from institutions that are not necessarily Members or Affiliates.
We always want to think about how we can give all of the information we have and make it available to the users
Sometimes we need to correct that information
In thinking of the future, could our users help with this? Through setup of administrative accounts?
Architecture
Images, metadata, code - we make all the code available
Application server has the BHL services and there is a separate Development server for developing new functionality without affecting the application server. Allows us to test without impacting users.
BHL Database
OCR text files are taken from IA and we have a copy of those. This gives us the option to modify them in the future, say with transcriptions.
Network and Admin Staff -- need people to check systems, run backups, etc
Life & Literature - Requests for future development, we maintain a list for prioritizing based on resources. We had opportunities to apply for funding for certain activities. Improving OCR for example.
Also, for example, providing transcriptions of field notes to override the OCR. What are our considerations? Do we need versioning?
We are now moving servers to BHL.
Joel
Public server and 3 servers behind the scenes doing all the work.
Previously, there was one web server; now we’re moving to two behind a load balancer. Each one is independently able to handle all traffic, which gives us redundancy and speed.
Two other servers
One primary server on 24/7, other is a backup.
Still have the application server and the development server that Mike can work on
OCR files are now served from the SI cloud, which has vast amounts of storage and is redundant, fast, and secure.
Overall, we’re getting more redundancy and performance. Servers have a lot of memory and built to be fast.
Joel and Mike will be testing.
Database servers are in house and being reconfigured for extra performance.
Still aiming for the end of the year for BHL to be switched over but may be in January. We prefer to do this safer rather than faster.
It should look and behave exactly the same.
Application server is serving backend processes including bringing data in from IA into BHL database, PDF creation, processes that push data back out to IA, and a few other odds and ends.
Smithsonian firewalls are very robust and state of the art. We may not even notice if we got a DNS attack.
Network Operation Center - 24/7 365 Double backup power generators, would last two weeks.
Does this help address sustainability? Does this impress fundraisers?
We’ve had very good operations at Missouri. This helps address the scale that we’re now achieving.
Action Item: MRK would like to get together with Bib Alex to discuss opportunities for collaboration perhaps at Tech Discussion station.
Georeferencing? Was based on LCSH geocoding. We have done lookup for it.
Joe deVeer, Project Manager, overseeing contribution of collections to BHL
Overview of how metadata gets from ILS to IA to BHL. There is some new documentation as well.
Three levels of metadata
On the portal, the title level metadata is on the left. On the right hand pane is the item level metadata.
Item level adds information to include enumeration and chronology and sponsor metadata specific to the volume/item.
Page metadata - Table of contents and page types
How do we get this into IA and ultimately into BHL portal?
3 ways
Partner Meta App - for passing item level metadata to IA
Rules_Scanning - special instructions for IA for a given volume
Also possible-copyright-status
licenseurl
rights
title
creator
Required fields:
search_id, title, volume, creator: author
At Harvard, they use HOLLIS number, IA uses this number to fetch the data from their catalog using Z39.50
though also part of Title level metadata, IA also requires title and author for item level to verify
NOTE
possible-copyright-status MUST be entered just like that, all lower case and with dashes. Otherwise, won’t be recognized the way we need it to be.
Also, note that volume displays publicly but the year does not which is why we include enum and chron in Volume field in Partner Meta App.
We adhere to the ANSI/NISO Z39.71-2006
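The field rules above lend themselves to a quick pre-submission check. The sketch below assumes each Partner Meta App entry can be read as a mapping of column name to value (for example one spreadsheet row); that representation, the function names and the sample values are assumptions for illustration, not part of the Partner Meta App itself.

```python
# Required fields and the exact copyright key, as listed in the notes.
REQUIRED_FIELDS = ("search_id", "title", "volume", "creator")
COPYRIGHT_KEY = "possible-copyright-status"


def canonical(name: str) -> str:
    """Fold case, spaces and underscores so near-miss spellings can be detected."""
    return name.strip().lower().replace("_", "-").replace(" ", "-")


def validate_row(row: dict) -> list:
    """Return a list of problems found in one row; an empty list means it passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(row.get(field, "")).strip():
            problems.append(f"missing required field: {field}")
    for key in row:
        # The key must be all lower case with dashes, exactly as written.
        if key != COPYRIGHT_KEY and canonical(key) == COPYRIGHT_KEY:
            problems.append(f"column {key!r} must be spelled exactly {COPYRIGHT_KEY!r}")
    return problems


# Illustrative row; the values are made up.
row = {
    "search_id": "000123456",
    "title": "An Example Flora",
    "volume": "v. 1 (1912)",
    "creator": "Example, Author",
    "Possible_Copyright_Status": "NOT_IN_COPYRIGHT",
}
for problem in validate_row(row):
    print(problem)
```

Because each row is treated as a mapping, the check does not depend on column order.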
On BHL, is there a place to view who donated the information? A lot of that is local information that wouldn’t necessarily be included. The problem with taking local information from the MARC record is that others may add volumes to the same title, so at the title level that kind of information is not provided. Subfield E was stripped out because it was being displayed as author rather than donor. If we could figure out a good place to put copy-specific information, we could look into that.
Bookplate image would be visible, but not searchable.
Action Item: Discuss Further: Is donor another thing we can add to display in BHL alongside other “ownership” fields that came up at BHL Africa workshop.
Some folks at the BHL Africa workshop noted that sometimes the library that scans is not the same as the library that holds the item in its collection.
Does not matter what order they’re in on the partner meta app
File naming convention - collector contributor date sponsor - see wiki for Details!
At Harvard, from database can run spreadsheet, run a report on spreadsheet for everything ready to ship and get everything needed for partner meta app
MARC XML - options for creating
MarcEdit
Export from OCLC with Connexion Client
Pagination tool via Macaw - very nice interface! And updated documentation
Item-level metadata and minimal page-level
Can’t submit MARC XML
Submit Partner Meta App file to IA
Then they send it back to Harvard
and then can use those identifiers to plug in IA identifier and fills in preloaded metadata
William Ulate, Martin Kalfatovic
What are we working on right now?
TAG: Technical Advisory Group - consult on different topics related to BHL development, with different success rates.
The Life and Literature Conference helped us determine priorities for technical advancement. Some of these things are still on the table (like bhl in a box). Some are complete (like games with a purpose).
From these we developed project, groups, and initiatives that added to a list of activities that were prioritized for the technical team. Such as:
Art of Life: This project is now closed (in April 2015). Interesting spin-off of the project was citizen science, which led to our Science Gossip project. Involves describing images from 19th century journals. This is in partnership with Zooniverse. Much of this work is automated on our end - their staff manage most of it.
Purposeful Gaming
Goal: to explore ways to improve illegible OCR. Designed two games with Tiltfactor: Smorball (competition!) and Beanstalk (more relaxing!)
How do the results of the games feed back into BHL? They don’t yet
Smorball won a prize at the Boston Festival of Indie Games
Purpose was also to get results and incorporate them back into the texts. We need to review the differences and play the differences. So far the number of pages completed is not as high as we would like but the scripts are ready to get the old OCR and put the new words and produce a new OCR file. How would we like to handle? Do we update in BHL? Do we leave the old one as is in IA? Are there any changes we can reproduce across the whole corpus?
Mining Biodiversity
Asking for a no cost extension. With Canadian and British institutions.
Mixed degrees of success. Ngram has started introducing errors.
Development of a social media platform - Grace’s work with partners in Canada. It would show who has blogged about a page and, if all goes well, allow comments directly on a page.
We can get groups or committees to decide how we’d like to proceed. This required time and understanding of what is required to implement. If you review the documentation, you can identify the details of what needs to be done.
Mike’s time is split between core technical operations and grant-funded projects. One of the advantages of transferring Mike’s time to central funding is that we can add to core priorities in terms of development.
Action Item: Discussion of priorities for afternoon, keep in mind that as you go after grants, we want to carefully balance how we dedicate Developer time that we have available.
Trish Rose-Sandler from Missouri Botanical Garden
List of Sites, Tools from Grant-funded that will need ongoing support
Started at Missouri five years ago as a Data Analyst working on the Cite Bank project which has now been integrated into BHL portal. Over last couple of years, working on writing a lot of grants and project management on Purposeful Gaming and Art of Life.
Will be involved in IMLS grant BHL Expanding Access over next couple of years.
As wrapping up recent projects, we would like to look at the tools that were developed outside of main project architecture and what kinds of support would be required to
Art of Life - improve access to natural history illustrations in BHL. Algorithms to identify illustrations, which were then classified and described. IMA developed four algorithms; two were dropped pretty early on due to issues with accuracy. Contrast and ABBYY algorithms continued - Contrast wasn’t as accurate as we initially thought. Recommend ABBYY based on 76% accuracy rate. This does require resources, specifically a dedicated server to run over new content as well as a monitoring process to be set up. Sometimes the jobs can get stuck or error out, and we also need an automated process set up to move the data back into BHL.
Macaw Image Classifier
Once images are identified, need some tools for basic classification (e.g., map or illustration, color or black and white)
Joel took Macaw tool and added functionality for image classifier and was set up in January. Enlisted volunteers from BHL community. Classified over 6,000 pages. Service currently runs on server at MOBOT and Trish manages volunteers. Requires a server and a volunteer manager which involves soliciting new volunteers and registration for them along with initial QA for new volunteers, and need to regularly export data back to BHL.
Science Gossip
sciencegossip.org is part of Zooniverse platform as a partnership between MOBOT, BHL and COMSCICOM
It was a spin off of Art of Life with a lot of success since June of this year. Most of these audiences had never heard of BHL beforehand so this has been great for visibility.
Zooniverse platform has a “Talk” function where they can ask questions and discuss things with other volunteers.
ComSciCom dedicated support for at least another year and feel the partnership has been beneficial
Nominated for Ayrton prize from British Society for History of Science.
Requires resources from BHL including biweekly monitoring of Talk, identifying new content for classification (Zooniverse is now grabbing pages themselves, which helps with Mike’s time), keeping volunteers informed of progress to keep them motivated, and providing challenges to keep them engaged. Finally, need to export back to BHL.
Extracting tags from BHL Flickr stream
Over 100,000 images there. We needed a workflow to capture metadata, store it locally and eventually make it searchable via the portal. Mike set up a workflow to search the Flickr API and we’d like to do that monthly and store locally. We would need to set it up as a more automated process.
Purposeful Gaming (IMLS) ends at the end of this month. Tests the applicability of games for OCR correction. Involved transcription tools DigiVol and FromThePage. Put up William Brewster diaries and horticulture catalogs. Support needed includes uploading content, performing QA, exporting data and bringing it back to BHL. Decided to set up both transcription tools to get practical experience and to have two tools with outputs that could be pushed to the games. FTP requires a server to be set up locally. Otherwise same as DigiVol: uploading content, performing QA, exporting data and bringing it back to BHL.
Output of project was the two games - Beanstalk and Smorball. Hired game designer Tiltfactor, based at Dartmouth. The two games are being hosted on servers at Dartmouth and we’re discussing whether they’d be willing to host them beyond the life of the project. Support needed includes a place to host them, getting new players, and eventually more content.
Mining Biodiversity
No cost extension requested. This is a Digging into Data challenge funded by IMLS. Tools include Google NGram that automatically corrects OCR being worked on by Canadian partners; only works on English words. Not a whole lot of success yet. Also, the crowdsourcing annotator (Argo) tool. Taking the crowdsourced data from Argo and using another tool to identify and extract entities in text. AltMetrics and MyTweeps brought into development server and Disqus which is still in process. Also some visualization tools to help filter search results.
Susan - Trish, how do you prioritize these items?
Martin - most interested in how these fit into strategic goals.
Trish - looking at these and how they relate to the Strategic Plan. They all touch it at at least one point and they do come out of requests that have come in over the years. A lot of times these kinds of things do have to happen outside of the standard architecture, so at the end of a project you always need to evaluate what worked and how/if it could scale.
Martin - Things also change. New algorithms being developed all the time so what is available at the end of a project might be better than what had been available to work with during the life of the project. So keeping in mind the need to be nimble in integrating or scaling these in a sustainable way.
Marty - Trish’s involvement could really provide input that could be beneficial to those making the decisions.
Grace - do we have a timeline of when we want to make decisions?
Martin - we want to work backwards from outcomes to determine the best way to achieve. If method used in project is the best way, we would use that.
Action Item: Timelines will be discussed and determined at the Members Meeting.
Connie - pushing on transcription is a good idea. Both transcription and OCR correction.
Martin - important to always remember scalability. Little over 100 field notes completely transcribed at Smithsonian (out of over 7,000 that have been cataloged) over the last two years.
Ongoing community management is key. SI Transcription has a full time person.
Action Item: Staff to provide input on priorities and timelines by the end of January to be considered at the BHL Annual Meeting.
Link to Tech Wishlist in Gemini (must have Gemini login)
In the Issue tracking system - Components >> Tech Feedback
Everything that is Status >> Closed and Resolution >> Wishlist
44 issues
See agenda for questions to guide conversation
Any requests that come into Gemini, they get assigned to Mike and/or William
How can we improve the process of reviewing these kinds of requests and prioritizing and gathering inputs on options for how to implement?
Think about size / scope of requirements to implement
Wishlist Filtered - has added comments
Category, Keywords, Difficulty
For example, all the requests related to the PDF generator
Carolyn: Have we done any analysis on how often any particular problems or requests have been submitted by a user?
Mike: I have tried to do some grouping on similar issues. About half of the requests are one-off, the other half are repeats.
Carolyn: Can we do some sort of matrix to visualize repeats, determine biggest requests from users, and use that to prioritize?
Jackie - not all the things actually get tracked because Jackie is aware of some of these and filters these out. Won’t be able to catch everything.
We could change the level, we get this request often. Who decides that though?
What is our threshold for what qualifies for “often”
What are our options? This has to do with UI. Not a way to do this piecemeal.
Program Director, with advice from TAG, to prioritize and work with Mike on how they want to do this. Probably on a two month list. Also revisit the charge of the TAG, which has been reactive; we would like to structure it as more proactive. Trish, Susan, Carolyn, Mike, Joel on TAG to start in December. JJ and Randy are interested, too; to discuss with supervisors.
Action Item: If you are interested in participating on TAG and are not yet a member, talk to your supervisor / Member representative about volunteering you.
Action Item: If anyone wants to lead the TAG, talk to your boss! By January, when input is due.
What are other inputs that are needed? Staff across the board have important input that could help inform.
Part of the Staff Call is to bring up these issues. Should probably also add this to the Members call.
Action Item: If there is something that we know of that we would like, enter them into Gemini!
Susan - Jackie, it would be helpful if you could open an already existing one and implement a counter.
Action Item: A counter is an appropriate request that we can see if the Gemini system can handle. Let’s add this as a ticket!
Jackie - I’d like an automated way so that if something hits five duplicates, it is automatically moved up to a higher status.
Bianca - we can also submit as a request to Gemini
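The counter and the five-duplicate escalation discussed above could be prototyped outside Gemini before asking whether the system itself can do it. A minimal sketch, assuming duplicates are recorded as (duplicate issue id, original issue id) pairs; the function name, the data shape and the issue ids are hypothetical and are not taken from Gemini:

```python
from collections import Counter

ESCALATION_THRESHOLD = 5  # "five duplicates", per the discussion above


def find_escalations(duplicate_links, threshold=ESCALATION_THRESHOLD):
    """Return {original_issue_id: duplicate_count} for issues at or above threshold.

    duplicate_links is a list of (duplicate_id, original_id) pairs.
    This shape is an assumption for illustration, not Gemini's data model.
    """
    counts = Counter(original for _dup, original in duplicate_links)
    return {issue: n for issue, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Made-up issue ids for illustration only.
    links = [(101, 7), (102, 7), (103, 7), (104, 7), (105, 7), (106, 9)]
    for issue, n in find_escalations(links).items():
        print(f"Issue {issue}: {n} duplicates, raise priority")
```

The threshold is a parameter because the group had not yet decided what counts as “often”.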
Library of Congress - we can’t wait around. We check libraries and contact them directly and add to scanning. If it can be handled more quickly in Gemini, then that would be more useful.
Jackie - if contact directly, make sure also in Gemini then follow-up directly to help us with tracking and deduplication.
Wishlist items are those that require more thought or more work that we can’t address right now. Other tech things are addressed day-to-day.
Issue 30059
Used to have a way to go from Title bib to the Admin page when it was a subfolder rather than separate, is this a big concern or have we moved on?
Two likes
Jackie - It would make it easier but not a high priority for me. Security issues make it more complicated.
Nice to have, not need to have.
ISSUE 49857
Search - add date range
Would be fairly straightforward
How much time ?
Seems like it could be a very helpful priority. Moderate difficulty.
Could be rolled into additional search requests? We could bundle those together
See also 48702
ISSUE 56419
Search field for illustrators
We have the data from Projects
It’s also in MARC (but not always there and not consistent)
Data does not support search.
Is there a way to concatenate?
Cost vs benefit may not balance out either
Not realistic. Should indicate if this is something that we are not going to do.
But we’re gathering illustrator / illustration metadata through the projects
In terms of illustrations, yes. Illustrators maybe not as much.
We’ll never have massive amounts of illustrator information
Some have machine tagged illustrators but not many
Full text search could retrieve illustrator know
Bianca will let Jennifer Hall know we won’t fix this
In Science Gossip - tagged every image with symbol indicating who the illustrator was
Is there a list or index? Grace is tweeting a link to that database
How to handle contradictory requests?
Make sure to see which align and which might make sense to do together and maybe some might make sense to not do if it contradicts those other related requests.
31726
Marty - if we could structure around segment level to facilitate this kind of selection
Shift click to select a range would be reasonably easy to implement
Seems like one to pursue!
Mike will investigate the related issue
Full text search
High priority from beginning
William has made some progress towards this. Martin sees this revolving around UI and how it will be displayed will be the hardest part. Make Full Text Search our priority 1. Need cadre of folks to test and define what is wanted from full text search.
Develop a focused working group for full text search.
Agreed!
Ask Members for full support to complete in calendar 2016.
Ngram and traitbank - could build in analysis of full text
Joel - move forward with full text search that we’re all used to then get feedback.
Results and then snippet of where word appears
Won’t search on stop words (multilingual stop!?)
Index not just text but also metadata
Two real use cases - that will help define what other fields would need to be added
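To make the points above concrete for testers (results with a snippet, no searching on stop words, indexing metadata as well as text), here is a toy sketch. It is not how BHL or Solr implements search; the record shape, the function names and the tiny English-only stop word list are all assumptions for illustration.

```python
import re

# Illustrative English-only list; a multilingual list is an open question above.
STOP_WORDS = {"the", "a", "an", "of", "and", "in"}


def tokenize(text):
    """Lowercase, split into words, and drop stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


def build_index(records):
    """records: {page_id: {"title": str, "text": str}} -> {term: set of page_ids}.

    Both the metadata (title) and the page text are indexed.
    """
    index = {}
    for page_id, rec in records.items():
        for term in tokenize(rec["title"] + " " + rec["text"]):
            index.setdefault(term, set()).add(page_id)
    return index


def snippet(text, term, width=30):
    """Return the text around the first occurrence of term, or "" if absent."""
    m = re.search(re.escape(term), text, re.IGNORECASE)
    if m is None:
        return ""
    return text[max(m.start() - width, 0):min(m.end() + width, len(text))]


def search(index, records, query):
    """Return [(page_id, snippet)] for pages matching every non-stop-word term."""
    terms = tokenize(query)
    if not terms:
        return []  # query contained only stop words
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    # The snippet is empty when the term matched only in the title.
    return [(pid, snippet(records[pid]["text"], terms[0])) for pid in sorted(hits)]
```

A query made up only of stop words returns nothing, which is one of the behaviors the working group would need to decide whether users expect.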
Joel
Talking about faceted search there. Solr capable of handling this.
Metadata problem we’re encountering with discovery tools. Are there mixups or unclear associations from where inherited from the structure?
Action Item: Full Text Search to be completed in 2016. Will be presented to Members in April.
Is metadata cleanup technical? manual? something else? Would benefit a lot of things.
BHL metadata is dirty. How can it be solved?
For KBART, we have 25 data elements that we can map. Start year, end year, start volume, end volume and issues are tricky because it’s a mismatch with the holdings format that people use. Link resolvers use it too because you need the whole range.
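The start/end mismatch can be seen in a small sketch that collapses a list of scanned items into the six KBART coverage columns. The column names are taken from the KBART recommended practice and should be checked against the current version; the input shape (a list of dicts with year, volume and optional issue) and the function name are hypothetical, not BHL’s actual export.

```python
def kbart_coverage(items):
    """Collapse scanned items into KBART-style start/end coverage columns.

    items: list of dicts with "year" (int), "volume" (str) and optional
    "issue" (str). This input shape is an assumption for illustration.
    Only the earliest and latest items survive, so any gaps in the run
    are lost; that is the mismatch with holdings statements noted above.
    """
    if not items:
        return {}
    ordered = sorted(items, key=lambda i: i["year"])
    first, last = ordered[0], ordered[-1]
    return {
        "date_first_issue_online": str(first["year"]),
        "num_first_vol_online": first["volume"],
        "num_first_issue_online": first.get("issue", ""),
        "date_last_issue_online": str(last["year"]),
        "num_last_vol_online": last["volume"],
        "num_last_issue_online": last.get("issue", ""),
    }
```

Because the output depends on every item under a title, it has to be recomputed whenever a volume is added, which fits the comment that it needs an aggregate look at a title to maintain.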
Adam - we’re looking at retrospective estimate and proposed solution. Needs aggregate look at a title to maintain.
Action Item: We’ll revisit with the discovery tools group in future discussions.
Name Authority
We need a bucket that we’re using as a name authority or module. Or are we vetting it against an external authority file on ingest?
It is a technical problem. Create a technical infrastructure to handle authorities for author names.
Grace and Bianca
Small group breakouts and discussion
GOAL + Purpose:
Goal: Begin creating a comprehensive list of activities that we are doing and services we are currently supporting as part of our work for BHL
Purpose: To collect and document these activities and services. Discussion and analysis will come later.
Question 1 - What kinds of BHL activities or services do you support?
Question 2 - What BHL services do you or your patrons use?
Martin and Keri
See https://bhl.wikispaces.com/BHL+Financial+Dashboard
Spreadsheet - a way of tracking BHL participation by institutions, tracking, return on investment.
Keri will take all the elements/tasks identified from previous exercise, and place tasks onto the in-kind sections.
Staff members came up to front of the room and placed elements/tasks under:
Discussion:
-Conservation
Tomoko - need to conserve items pre scanning
Metadata: cataloging, portal editing, content enhancement, verifying biblio records, data analysis
Development/fundraising: grant writing, admin/finance (administration and development) commercial requests,
Administration: committee work, financials, monthly calls -
Discussion of monthly calls, should there be a separate communications category...or should it be a part of administration?
Marty - the smaller we can keep admin elements the better.
Carolyn: decision: committee calls: to be categorized via function area, Staff calls = admin
Gemini put with reference: for administration
Social media /outreach Both pushing BHL out and also outreach
ACTION ITEM Martin created new category ...Training and Reference - Carolyn
Communications (external): Quarterly report, PR, presentations, promotion through public speaking
Metadata
Databases and systems software dev
tag, troubleshooting
Interns and volunteers
Collections support
Other
Summary:
This was a varied discussion taking the tasks which came out of the exercise, and plugging these elements into the inkind/contributions categories. A new category emerged: training and reference. (note: discussions were rapid fire/wide ranging - note taking difficult)
Carolyn
Presentation Materials: https://bhl.wikispaces.com/BHL+Reports+and+Statistics
Statistics summary: Goal - would like to look into new places for reporting accomplishments/tasks; where to find statistics: on the wiki, under BHL reports and statistics: google spreadsheet: digitization stats: per institution. You can also see these stats via admin dashboard.
Other stats: website stats
Social media website stats
Permissions
Items in production
Monthly stats - can find admin dash, wiki
Are our current stats capturing what we need?
BHL reporting spreadsheet on internal wiki
Social media stats – how social media is driving traffic back to website (ROI)
Gemini – how many open and closed, types of request
Permissions – tracking copyright licensors and titles we’ve acquired
Marty suggested – Cornell adds BHL records into their local catalog. Would be good to know which visits to BHL website came from the Cornell library catalog. Google analytics does track source. Hard to track by IP address but maybe domain referrer.
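If referrer URLs can be exported (from Google Analytics or from web server logs), grouping them by domain is straightforward. A minimal sketch, assuming the referrers are available as a plain list of URL strings; the function names and the example.edu / example.org hostnames are placeholders, not real catalog addresses:

```python
from collections import Counter
from urllib.parse import urlparse


def referrer_domains(referrer_urls):
    """Count visits per referring hostname, skipping empty or malformed values."""
    counts = Counter()
    for url in referrer_urls:
        host = urlparse(url).hostname  # None when there is no scheme/host
        if host:
            counts[host.lower()] += 1
    return counts


def visits_from(counts, domain_suffix):
    """Total visits whose referrer is the given domain or one of its subdomains."""
    suffix = domain_suffix.lower()
    return sum(
        n for host, n in counts.items()
        if host == suffix or host.endswith("." + suffix)
    )


if __name__ == "__main__":
    sample = [
        "https://catalog.example.edu/record/1",  # placeholder library catalog
        "https://www.example.edu/",
        "https://search.example.org/?q=bhl",
        "",  # direct visit, no referrer
    ]
    print(visits_from(referrer_domains(sample), "example.edu"))
```

Matching on a domain suffix rather than an IP address follows the suggestion above that the domain referrer is the easier thing to track.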
Would be helpful to track scanning in-house vs sent out. Can we get reports from Macaw for monthly uploads? Yes. Also, the record that goes to IA lists the scanning operator. But the operators could change, so better to track contributor.
Any analysis of user search terms to look at frequency of searches and whether those subjects are represented in the collection? Mike says he could pull this out of the web search logs.
Could we look at visitors to website and those who submit feedback?
Gemini – would be good to track more. Should we look at the age of an issue to see how quickly we respond to requests? Can create custom reports, download .csv, does visualizations
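Since Gemini can download a .csv, issue age could be computed from an export. A minimal sketch, assuming the export has columns named id, created and closed with dates as YYYY-MM-DD; those column names and the date format are assumptions and would need to be matched to the real export.

```python
import csv
import io
from datetime import date, datetime
from statistics import median


def issue_ages(csv_text, today):
    """Return [(issue_id, age_in_days)] from exported CSV text.

    Closed issues are aged from created to closed; open issues
    (empty "closed" column) are aged from created to `today`.
    """
    ages = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        created = datetime.strptime(row["created"], "%Y-%m-%d").date()
        closed = (row.get("closed") or "").strip()
        end = datetime.strptime(closed, "%Y-%m-%d").date() if closed else today
        ages.append((row["id"], (end - created).days))
    return ages


def median_age(ages):
    """Median age in days; raises statistics.StatisticsError on an empty list."""
    return median(a for _issue, a in ages)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = "id,created,closed\n1,2015-01-01,2015-01-11\n2,2015-03-01,\n"
    ages = issue_ages(sample, date(2015, 3, 31))
    print(ages, median_age(ages))
```

The median is used rather than the mean so that a few very old wishlist items do not dominate the picture of how quickly requests are answered.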
Do we track user environment, like the types of devices they use to access our content? Yes. Our site isn’t really ideal for mobile devices
Would be good to have stats on training and reference but not sure of the best way to track. We’d need to get these from all institutions. Reference internally or externally with the public? Want to track How much staff time and funding did we devote to training and reference for BHL
Does everyone track reference stats? Most do. Would this be useful stats for other institutions? Would require diligence on everyone’s part to do it accurately. *Maybe bring to the members to get input.
Any other things we want to learn from stats?
Who reports on pagination stats? Is this using Macaw? Part of scanning stats. Do we want it to reflect only those stats from Macaw that are done manually? Keri says it’s part of what we do as part of scanning.
What question are we trying to answer with pagination stats? Not clear. Most folks agreed not really all that useful to track pagination stats.
Carolyn: open discussion, ask for questions:
Marty: can we track how people got to BHL from for example, Cornell catalog or website? Keri: this can be done
Mai - Knowing internal Macaw scanning via IA scanning? Bianca, can be done via Macaw.
Michael (Cornell) - Exploration of search terms used by users? Mike L. - this can be done.
Bianca - How many people use our website vs. how many people submit feedback? Grace - may be able to use Gemini to help determine this.
Jackie - There’s a whole pool of data available in Gemini which we could harvest; it may take time to figure out how to harvest this information...spreadsheets and visualizations.
Michael - can we track mobile device use? Do we have a mobile site...Martin, not exactly, OK on tablets and large phones
Carolyn - Could we track reference interviews? How much staff time was devoted to reference per year? How many BHL reference questions were asked per year? Bianca, as there is a precedent.
Carolyn - Do you want reference stats per annum? Is it feasible?
5 min mark - any questions?
Grace: How many look at Quarterly reports? - many responded yes.
Carolyn- How many report pagination stats? How many report macaw pagination stats? Marty - what are we looking for in this stat? Keri - Marty - Bianca: what do we want from these pagination stats?
Action Item: Stop tracking pagination...YES.
Last question - Carolyn: please let me know at any time if you have issues.
Bianca & Jackie
Purpose – discuss the future of feedback management, its value to the program, and moving forward.
We will cover:
· Personnel changes
· Feedback Management
· Future Approaches
· Discussion/Actions
Personnel changes
Background
2009 significant resources were put towards Gemini. About 3 staff working on.
From 2009-2013 work distributed among 2 BHL FTEs. Proved unsustainable as central staff needed to focus elsewhere
When Jackie was hired she began helping out halftime to take on the role of feedback coordinator. SIL can no longer afford this person to work on BHL
BHL consortium must decide what feedback management looks like
Tasks – what kind and how much work is involved?
· Gemini Admin
· Triaging
· Responding to users
· Moderating open issues
· Distributing to staff
Gemini Admin
1. System maintenance/updates
2. Staff communication & training
Triaging
1. Categorize issues
2. Assign to correct resources
Responding to Users
1. Canned responses when possible – copyright, image reuse
2. Send custom responses as needed – take a lot longer, sometimes require expertise
3. Perform minor work/assign major work – regenerate PDF, author names merging
4. Build relationship/support BHL – testimonials,
Moderating all open issues
1. Check in on open issues
2. Communication
3. Statistics
Respond to Staff
Provide feedback
Monitoring challenging issues
Gemini stats: then & now
Created has always been greater than closed. Getting more ref questions, more complex tech problems and more complicated scanning requests
Can’t predict what months you’ll get more requests in order to plan staffing needs
Parting reminders
Gemini is not just feedback but also internal communication tool
Don’t forget about social media. Should communication via social media be different from feedback form?
How should we manage going forward without Jackie?
Take off the feedback link!
Action Item: Should we do an FAQ? Yes keeping wiki up to date is important for keeping questions down
We could share Gemini requests more among staff but still need 1 person doing the initial triage.
Action item: Should we have a Gemini committee – regular calls,
Possible Futures
Identify some folks to manage and do the best we can and still open issues will continue to go up
We shut it down – link off website for requests, scanning, etc
Gut reactions –
The classification of questions - is it useful or specific enough? Could labels help triage better?
Users don’t know how much time it takes
Action Item: Auto email that says thanks for your feedback. The automated response could have more info to tell users where to go for certain common things.
How central to our mission is feedback management? From 1 to 5, 5 being very important
(17 people gave it a 4, 8 people gave it a 5)
What level of service should we provide?
0 said 1, 2 people said 2, 3 people said 3, 19 said 4,
Bottom Line: BHL Staff Mtg attendees majority voted for level 4 of service to users
How to move forward? Rotate management of feedback – train people to triage.
Many people are in favor of a group to figure this out (10 people).
Action Item: We need the members to also weigh in on the importance of this. Jackie, Bianca and Martin will work on a presentation
See also https://bhl.wikispaces.com/help
Macaw (Grace Costantino, JJ Ford)
Macaw training presentation: https://drive.google.com/file/d/0B00hDkSQMhfDTDlxSVJYODZ6M3c/view?usp=sharing
Macaw 1 pager: https://docs.google.com/document/d/1OpmiAc5dBgfEAzYp_P0DRu89bY7LzN1QIe3z83kRrrY/edit?usp=sharing
Macaw link: https://macaw.library.si.edu/
Admin Dash
Admin Dash link: http://admin.biodiversitylibrary.org/account/login
Lots of training materials available under BHL’s “Help” page https://bhl.wikispaces.com/help under the heading “Administrative Dashboard: Portal Editing”
Segment editing documentation highlighted, see https://docs.google.com/document/d/1KTk_enmpGDRdnmS7ALmJr42-R7HNeVlZp7w7y85P-sU/edit
Action Item: review segment creation questions from doc linked above w/ Tech Team
Action Item: review Author editing functionality w/ Tech Team and identify areas for improvement
Gemini
Gemini link: http://biodiversitylibrary.countersoft.net/dashboard
Gemini one pager: https://docs.google.com/document/d/1EjKVG0SvoP3g2q8dT6guDpRKfwI33_Du6y4H5BqUcJ8/edit?usp=sharing
Lots of training materials available under BHL’s “Help” page https://bhl.wikispaces.com/help under the heading “Gemini”
Grace
What do we mean by outreach?
all of the above -vital to our roles and the Library.
See the outreach and communication plan strategy:
http://bit.do/bhloutreachplan/summary
Collaborate on outreach:
---
---
BRAINSTORMING:
Intention to target core audiences: Scientists, taxonomists, librarians
TDWG
Action Item: Listservs - We will target listservs for particular BHL events
Geonet
Open Access Week
SPNHC
CODE4LIB
Museum exhibits
Action Item: Staff, please think of the above
Input requested on other activities we are not involved in:
Bianca
Questions that you might have about BHL – you can email Bianca.
How can we do a better job about getting our documentation available and searchable?
Documentation should be referenced, not read.
Documentation should be searchable.
Documentation should be oriented around tasks, not screens.
Should be pull-apartable. Answers should be easily digestible and you should be able to link to the specific sections that you need.
Achieved through rapid writing and incremental updates. Create as needed and then you can revise.
Most people agree that documentation should be about this (above). But Suzanne makes the point that knowledge capture can be difficult. If someone that knows something leaves, and then the question comes up later, the knowledge is gone.
William – sometimes improvements can result in things not being found if they were available via old links before and through the improvements those links disappear.
How can you find our documentation? you can search for “help” on the wiki to get here. Left-hand column of the internal wiki has a link. Help link of the admin dashboard also links to that.
Bhl.documize.com: provides cool ways to search the documentation. We could structure our documents in ways to fit into this well. But the interface is rather sterile.
Bianca found this because it’s a countersoft product. They mentioned that this tool is just released but still in progress and they want to incorporate into Gemini system.
Communigator: another way to create a dashboard for documentation. You can get search results and faceted lists based on your documentation. We can plug this into our wiki and make it more searchable. But it will cost us – about $600 a year minimum for only two users, so if we want everyone to have it, it will cost considerably more.
Martin: One fear is that if we start a new system for documentation that it will turn into another problem in terms of managing it in the future.