Strategies for film and video digitization - DLF 2011 working session

http://goo.gl/Wck0Z

  1. lightning talks (45 minutes)
  2. voting on topics (15 minutes)
  3. discussion of topics (1.5 hours)

TOPICS:

1 File formats & digitization specifications, implications for storage needs (VOTES: WINNER )

  • Uncompressed video in a file wrapper: Stanford uses the .mov wrapper
  • JPEG2000 frames in an MXF wrapper (not MotionJPEG2000, which is an oft-abused term): few organizations are using this. They include the Library of Congress, the Smithsonian, and a few others, many of whom use the SAMMA conversion system.
  • Apple ProRes compression format: there may be some concerns about long-term preservation.
  • Stanford’s standards for preservation: http://lib.stanford.edu/stanford-media-preservation-lab/moving-image-digitization
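
A rough sketch of the storage arithmetic behind the first bullet. The frame size, bit depth, and frame rate below are assumptions chosen for illustration (NTSC standard definition, 10-bit 4:2:2); the notes do not state which parameters any institution uses. The figure covers video essence only and ignores audio, wrapper overhead, and any padding a particular codec adds.

```python
from fractions import Fraction

def uncompressed_gb_per_hour(width, height, bits_per_pixel, fps):
    """Return decimal gigabytes (10**9 bytes) for one hour of uncompressed video."""
    bytes_per_frame = Fraction(width * height * bits_per_pixel, 8)
    bytes_per_second = bytes_per_frame * fps
    return float(bytes_per_second * 3600 / 10**9)

# Assumed example, not taken from the session: NTSC standard definition,
# 10-bit 4:2:2 sampling (20 bits per pixel on average), 30000/1001 fps.
ntsc_fps = Fraction(30000, 1001)
sd_10bit = uncompressed_gb_per_hour(720, 486, 20, ntsc_fps)
print(f"{sd_10bit:.1f} GB per hour, video only")
```

Multiplying the result by the number of hours in a collection, and by the number of copies kept, gives a first estimate to bring to a storage discussion.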

2 Digitization as a preservation strategy (VOTES: 19 ) (reformatting); development of tiered strategies to prioritize preservation and digitization needs

  • To digitize for preservation or not? What are the factors? The curator’s wishes are important here.
  • How important is availability of playback equipment? The Library of Congress is stockpiling equipment at its Culpeper facility. Is obsolescence a prioritizing factor?
  • Assessment tools: IU has developed one called FASIS (?) to assess the audio. There is another tool developed by Columbia/Illinois
  • Is there (or should there be) a registry for people who are reformatting video collections so that they can indicate that they have reformatted something? Like the registry of digital masters but for video. Library of Congress (?) working on a national plan for audio. There are risks here because of legal and copyright issues. Perhaps a process for identifying orphans?
  • Question: how long can you keep film? Under the right conditions, with fully developed film, hundreds of years. But it needs to be stored properly (cold).
  • There are no reliable figures for the longevity of magnetic media; the numbers tend to fluctuate widely.
  • I am about to accept a collection that is in BetaSP: how much loss will there be in transferring from BetaSP to a digital format? There shouldn’t be any as long as you choose a high enough data rate and a high-quality file format.
  • Does digitization necessarily mean some loss? If done properly, it needn’t.

3 Handling of born-digital video materials (VOTES: 16 )

  • Are there best practices around preservation of Skype video?
  • Capturing live events (Occupy Wall Street) livestreams
  • Videoconferencing sessions: Polycom and Tandberg record as H.264. Transcoding means taking a compressed format and recompressing it.
  • Store the native format, or as close as you can get to the original, and add a small set of formats to transcode to (DV, MPEG2).
  • Most digital video has some kind of compression applied to it, which may be platform- or software-dependent. Transcoding to a new platform is likely to recompress it, and there is loss at each step.
  • Hannah recommends the Primer for CODECs (prepared by AVPS, a consultant that both Stanford and IU have used).
  • Relying on ffmpeg to be able to transcode in the future. As long as ffmpeg can read it, don’t worry about trying to guess what access files you’d need to create ahead of time.
  • Scary paper: “Billions of bits, just thousands of ways to encode ‘em”, out of the Preserving Public Television project (NDIIPP), David McCairn author. http://www.current.org/tech/quality3-1010-recording.pdf
  • WGBH: get as much raw as you can, never know what you will want in future. Chews up disk storage at an incredible rate.
  • What about intermediary codecs, such as Apple Intermediate Codec? You capture a video on a phone and want to edit it, but can’t go directly into Final Cut, so you need to transcode to something else first. Not cross-platform: only OS X support.
  • Important to be in touch with other units on campus who are producing video that the library may need or want to preserve. Need to understand their production workflow and how they are creating video, to make sure the highest resolution copy can be preserved.
  • Best practice guidelines: who will take responsibility for creating these?
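
A minimal sketch of the "keep the native file, transcode for access later" approach discussed above. The file paths and the choice of H.264/AAC as the access target are illustrative assumptions, not recommendations from the session, and the `libx264` encoder is only available if the local ffmpeg build includes it. The function only builds the command; the native file is read and never modified.

```python
import shutil
import subprocess
from pathlib import Path

def build_access_command(source, target, crf=23):
    """Build an ffmpeg argument list for an H.264/AAC access copy."""
    return [
        "ffmpeg",
        "-n",                 # never overwrite an existing output file
        "-i", str(source),    # the preserved native file, read only
        "-c:v", "libx264",    # H.264 video for the access copy (assumed target)
        "-crf", str(crf),     # quality target; lower means higher quality
        "-c:a", "aac",        # AAC audio
        str(target),
    ]

def make_access_copy(source, target):
    """Run the transcode if ffmpeg is installed; return True on success."""
    if shutil.which("ffmpeg") is None:
        return False
    result = subprocess.run(build_access_command(source, target))
    return result.returncode == 0

# Hypothetical paths, for illustration only.
cmd = build_access_command(Path("masters/interview_001.mov"),
                           Path("access/interview_001.mp4"))
print(" ".join(cmd))
```

Because each transcode recompresses, access copies made this way should always be generated from the stored native file rather than from an earlier derivative.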

3a Born-digital but on a physical carrier (DV, miniDV)

  • Not a file on tape: it is a stream of video content that can be converted to a file. Capture is not always a continuous process, because timecode may be disrupted on the tape.
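
One way to picture the timecode problem is as a list of frame numbers that should increase by one per frame. The sketch below is a hypothetical illustration, not a description of any capture tool: it assumes timecode has already been converted to integer frame counts, and it splits a capture wherever the count does not continue from the previous frame.

```python
def find_timecode_breaks(frame_numbers):
    """Return indices where timecode does not continue from the previous frame."""
    breaks = []
    for i in range(1, len(frame_numbers)):
        if frame_numbers[i] != frame_numbers[i - 1] + 1:
            breaks.append(i)
    return breaks

def split_into_segments(frame_numbers):
    """Split a capture into runs of continuous timecode."""
    if not frame_numbers:
        return []
    starts = [0] + find_timecode_breaks(frame_numbers)
    ends = starts[1:] + [len(frame_numbers)]
    return [frame_numbers[s:e] for s, e in zip(starts, ends)]

# Made-up example: timecode resets to zero, then jumps ahead.
frames = [0, 1, 2, 3, 0, 1, 2, 500, 501]
print(find_timecode_breaks(frames))
print(split_into_segments(frames))
```

Each resulting segment could then be saved as its own file, or the break positions recorded alongside a single continuous capture.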

4 Storage (VOTES: 15 )

  • Digital storage
  • Motivating factors for choosing a storage strategy: a spinning disk system like Isilon is more expensive than a strategy that includes a tape storage system, but for some campuses it may be more accessible because of the institutional environment (Temple).
  • Tiered storage: a variety of techniques. Indiana’s approach: 1) working storage to support production and workflow, using high-density NAS solutions; 2) long-term storage, primarily a tape-based hierarchical storage solution with fiber between two locations 50 miles apart. It was initially designed to give researchers (virtually) unlimited storage. Disk is cost-prohibitive for very large amounts of data/video. TCO for a TB deposited today is $600 for a single copy.
  • Cloud storage?
  • USC has very large storage through the Shoah Foundation archive, which holds JPEG2000-encoded video. They did some experiments with moving large files to the cloud and back, and did experience some loss. $1000 for 20 years per TB.
  • $3000 for 5 years per TB.
  • Northwestern is in the process of evaluating the storage landscape campuswide, including ‘curatorial’ storage; a proposal is expected by December. Looking at both Isilon and various tiered solutions. The Dell DX solution is also included in the evaluation, but there is a lot of interest in cloud storage solutions. Internet2 has recently been investigating this (box.net agreement).
  • Stanford has a hierarchical storage/tape storage solution. Stage preservation master files on disk while they wait for ingestion into the preservation repository storage, which is a combination of tape and disk. Most likely will eventually be tape only. Contract with central IT unit for LTO services.
  • Ultimately the solution may need to be ‘above the library’: too big for the library to handle.
  • Physical storage
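
The two per-terabyte quotes above cover different time spans, so they are easier to compare once annualized. This is a small sketch of that arithmetic; the 100 TB collection size is a made-up figure for illustration, and Indiana’s $600 TCO is left out because the notes give no time period for it.

```python
def cost_per_tb_year(total_cost, years):
    """Annualize a per-terabyte storage quote."""
    return total_cost / years

# Figures as quoted in the session; the notes do not attribute the second one.
quotes = {
    "$1000 per TB for 20 years": (1000, 20),
    "$3000 per TB for 5 years": (3000, 5),
}

collection_tb = 100  # hypothetical collection size

for label, (cost, years) in quotes.items():
    annual = cost_per_tb_year(cost, years)
    print(f"{label}: ${annual:,.0f} per TB per year, "
          f"${annual * collection_tb:,.0f} per year for {collection_tb} TB")
```

The comparison ignores the number of copies kept, retrieval charges, and any change in price over time, all of which would need to be part of a real evaluation.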

5 Workflows (VOTES: 13 )

  • How do we organize the work of digitizing things?
  • Small, ad hoc requests and large projects that will require extra funding are the two ends of the range. The work in the middle is more challenging: it means being nimble enough to meet unexpected requests.
  • Stanford has been working on processing things as they come in: assess them, then try to deal with metadata and possible labeling issues.
  • Patron requests: building the in house capacity has really helped in this regard, don’t have to scramble to find money to pay for outsourcing.
  • Challenge of assessing outsourced video when we lack a quality reference: what is 2-inch quad video supposed to look like? We need an AV artifact atlas, so that there is a vocabulary for describing the things we’re seeing in converted video that may or may not indicate a problem with the transfer.
  • Outsourcing QA process: how does it compare with what one would do for image reformatting? Stanford: with these kinds of vendors we would typically do a test batch. We initially thought we would start with a lot of review and then taper off, but that hasn’t really happened; we look at the beginning and end of every file, plus something in the middle. Indiana: does 100% QC on film, primarily because of the cost of outsourcing.
  • Harvard: the process has been that vendors do the digitization and curators do the QC. Is it standard that production techs do this work at other institutions?
  • Question about what derivatives are created and whether watermarking is employed. Generally no to watermarking.
  • At what point is the derivative created, and what metadata schemas are used? Stanford: the schema will vary a lot; sometimes engineers add things to their production logs that are more like descriptive metadata. Otherwise they use the usual suspects: MARC, MODS, sometimes EAD. Any extraction of technical metadata directly from files? Yes, they are working on this as well.
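
The spot-check pattern mentioned in the QA discussion (beginning, end, and something in the middle of every file) can be sketched as a function that returns the time ranges a reviewer would watch. The 30-second window and the rule for short files are assumptions made for this example, not values given in the session.

```python
def spot_check_windows(duration_s, window_s=30.0):
    """Return (start, end) pairs in seconds for the beginning, middle, and end."""
    if duration_s <= 0:
        return []
    # Assumed rule: a file too short for three separate windows is watched in full.
    if duration_s <= 3 * window_s:
        return [(0.0, float(duration_s))]
    middle_start = (duration_s - window_s) / 2
    return [
        (0.0, window_s),
        (middle_start, middle_start + window_s),
        (duration_s - window_s, float(duration_s)),
    ]

# A one-hour file checked with the default 30-second window.
for start, end in spot_check_windows(3600):
    print(f"review {start:.0f}s to {end:.0f}s")
```

Under these assumptions a one-hour file needs 90 seconds of review, which shows how far this sits from the 100% QC approach also described above.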

Wrap up, concluding thoughts

  • What are natural next steps? What should DLF recommend? Is a standards or best practices document already out there? If not, should there be one, and what role does DLF play? AMIA is active in this area. DLF may no longer be in the business of taking on large projects like this. Does the membership want this? Should this be an ongoing IG or discussion within DLF? We HAVE to continue this discussion; it is an issue that’s coming up for all of our organizations.
  • What do we think of this format for a session? Thumbs up … ?

THE TOPICS BELOW WERE NOT DISCUSSED IN THE SESSION

Cost (VOTES: 11 ) Strategies to communicate about cost and technical issues with library administration and collection development specialists (e.g., what are the exact cost/technical implications of accepting donations of large collections of videotapes)

Staffing & expertise (VOTES: 12 ) (who is doing what, what skills do we already have that translate to working with these collections, and where do we find we have skill gaps); good places to develop staff expertise (professional organizations, conferences, etc.)

Access, copyright and other policy implications (VOTES: 9 )

Synergies with other campus priorities and projects (VOTES: 7 ), relation with other campus groups (e.g., campus-wide IT to secure very large storage capacity; online video services for teaching and learning, campus events, etc.)

In house vs. outsourced digitization (VOTES: 6 )

Metadata (VOTES: 9 )