Lauren Pressley

12 April 2007

LIS 615: Collection Management

The Google Book Project:
The Googlization of Academic Library Collections

 

            Google plays such an important role in people’s online lives that its name has become synonymous with search. The company began as BackRub in 1996 and, renamed Google, began offering Internet search in 1998. Google then looked for other kinds of information to make searchable, and in 2004 it launched Google Print, which was later renamed Google Book Search (“Google Milestones,” 2007, n.p.). This move was not surprising, coming from a company whose mission is “to organize the world's information and make it universally accessible and useful” (“Company Overview,” 2007, n.p.). The URL for the Google Book Project is http://books.google.com/.

            Since the beginning of the Google Book Project, there have been fans, moderates, and opponents. This paper will explore a brief history of digitization projects, potential futures in digitization, the Google Book Project, the impact of Google on the digitization field, and the relationships between libraries and Google.

The Google Book Project is new enough that specific examples of its impact on collection management have yet to be seen. However, understanding this project can provide context for librarians when developing collections. The Google Book Project provides access to public domain collections, potentially affecting collection management strategies. The project will also likely change how some library users do research, as discussed later in this paper, and collection managers who understand these research methods will be better able to develop collections that support them. Finally, an understanding of the state of digitization can help libraries determine which of their existing collections could benefit from a digital format, help librarians choose between ebooks and print books when necessary, and help librarians understand the total information environment available to their users during the research process.

A brief history of major digitization projects

            Digitization projects have taken place in a number of different settings, from not-for-profits to libraries to commercial businesses. It is useful to consider some of the major digitization projects that preceded the Google Book Project to give some context to this most recent endeavor.

Project Gutenberg

            Project Gutenberg, started in 1971, is probably the oldest digitization project (“Gutenberg: About,” 2006, n.p.). Its mission statement is “to encourage the creation and distribution of eBooks,” and the organization pursues it by providing the full texts of works in the public domain (“Gutenberg: About,” 2006, n.p.). The project is run by volunteers who type out-of-copyright works into searchable, downloadable text files. As technologies have improved, the project has expanded to include audio books and digitized sheet music (“Main Page,” 2007, n.p.).

Library initiatives

            As more material becomes available online, people are beginning to realize that information not on the Internet may be overlooked by some researchers (Hafner, 2007, n.p.). Libraries and museums house vast quantities of information that is not yet available online, which leads many organizations to digitize part of their collections. Most libraries that digitize their collections scan the pages of books, journals, and photographs. Unlike Project Gutenberg's text files, these images look like the original work, which can greatly aid researchers who want to see the original form and context of the work.

The Library of Congress offers a number of digital collections. However, it is estimated that only ten percent of all Library of Congress materials will be digitized due to the high costs associated with such projects (Hafner, 2007, n.p.). Some libraries choose to make electronic indexes for their physical collections, reasoning that interested patrons will come into the library once an Internet search tells them that what they need is there (Hafner, 2007, n.p.). Other projects, however, have shown that putting digital versions of material on the web significantly increases use of those materials (Hafner, 2007, n.p.), so electronic indexes might be only a temporary fix for a costly problem. Libraries that would prefer online digital collections will often partner with corporations like Google to mitigate the high cost of digitization (Hafner, 2007, n.p.).

Amazon

            Amazon offers users the chance to Search Inside!™ publisher-approved books. This gives customers the opportunity to see the table of contents, cover notes, indexes, and a few pages of text to help determine if they want to buy the book. Though customers using Search Inside!™ can search the full text of a book for specific terms, they can only see select portions of the scanned work (“Search Inside the Book: How It Works,” n.d., n.p.).

            The search feature of Search Inside!™ appears to be based on a text file, like Project Gutenberg's, since text files are searchable by computers. However, the images that users see are scanned pages of the original work. One drawback of the Search Inside!™ feature is that it was not designed to let scholars use a work regardless of their geographical distance from it. The service is offered only as a way to help customers decide whether they would like to buy the book they are viewing. As such, the scanned version is sometimes not the exact version the customer is deciding whether to buy: the images might be from a hardback while the customer is considering a paperback, or from a previous edition (“Search Inside the Book: How It Works,” n.d., n.p.). In these cases, the digital images are not exact replicas of the work a user might be interested in using.

Publishers’ efforts

            As ebooks become more prevalent, publishers have begun their own digitization efforts. Random House and HarperCollins let users look at portions of selected books online, and in some cases users can even listen to portions of audio books (“Insight Browse and Search,” n.d., n.p.; “Browse Inside,” n.d., n.p.). This approach, however, requires users to know who publishes which books, and the content is not brought together in a central source for easy searching.

Defining the Google Book Project

            The Google Book Project is a much broader endeavor than the major digitization efforts that preceded it. The Google Book Project is an effort to scan in books from major research libraries, make them searchable, and make the text viewable on a sliding scale depending on copyright restrictions. Public domain books are fully viewable, while books that are still protected under copyright have limited viewing options. Google describes the project as a way for users to “search the full text of books to find ones that interest you and learn where to buy or borrow them” (“About Google Book Search,” 2007, n.p.).

            Google Book Project users are presented with two search options. One is the famously simple search bar that looks like the regular Google page. The other option is an advanced search that looks more like an Online Public Access Catalog.  This search interface can be found here: http://books.google.com/advanced_book_search. Once a user has found a record of interest, they can view an “about this book” page that contains typical bibliographic information. However, many books have more content on their “about this book” page, including key phrases, references from other scholarly publications, the table of contents, related titles, and bookstores and libraries where the user can find the book (“About Google Book Search,” 2007, n.p.). For an example of these features, you can access the page for Web-Based Instruction: A Guide for Libraries by UNC-G Library and Information Studies alumna Susan Sharpless Smith at http://tinyurl.com/2oguzq.

            As stated earlier, the view available for a text depends on the copyright status of the work. The most useful view is “full view,” offered for books that are out of copyright or whose copyright holders have granted permission to display the text. In these cases, users can page through the entire book, download it, and print it with no restrictions (“About Google Book Search,” 2007, n.p.; Sands, 2006a, n.p.). “Limited preview” books limit the amount of text a user can view online: the copyright holder has given permission for the book to be searchable and viewable but wants to restrict how users may access the work (“About Google Book Search,” 2007, n.p.; Sands, 2006a, n.p.). “Snippet view” lets users search for terms and see up to three selections from the text, so they can see a term as it is used in the book but cannot read the work online (“About Google Book Search,” 2007, n.p.; Sands, 2006a, n.p.). Finally, “no preview available” books are works under copyright whose copyright holders have not granted any display privileges for the text itself (“About Google Book Search,” 2007, n.p.; Sands, 2006a, n.p.).

The Google Library Project

            To provide these texts, Google needed a source of books, and the sources it chose were some of the largest and most influential research libraries in the world. In 2004, Google signed contracts with Harvard, Stanford, the University of Michigan, the University of Oxford, and The New York Public Library to scan books from their collections, make them available online, and make them searchable through the Google interface (Tyler, 2004, n.p.). From the very beginning, the project's two common threads were making cultural treasures available worldwide over the Internet and providing a venue for book publishers to advertise their books (Tyler, 2004, n.p.).

            The library community had varying opinions on the project. John Berry of Library Journal said, “Like so much of what happened in 2005, the emergence and growth of a monster called Google has forced us to reposition our profession and our libraries. This can provide us with the next mission of the library, beyond but not without books, to the community builder and binder it must become” (2006, n.p.). Walt Crawford discussed the impact of Google Book Search on the marketplace, noting that books that do not draw enough interest for mainstream marketing will now be accessible through the Internet, where interested readers can find them. Some of these books are in the public domain but are not easily read at length on the screen; others are under copyright, so only segments of the text can be viewed. Either way, users who find books of interest through the Google Book Project will have an incentive to purchase or check out the books they find most interesting (Crawford, 2005, n.p.).

Library partners

            Since the program began, more libraries have joined the project. Collections are being scanned at the University of Michigan, Harvard, Stanford, The New York Public Library, Oxford University, the Universidad Complutense de Madrid, the University of Virginia, the University of Wisconsin-Madison, and the University of California (“What Libraries Are You Working With?” 2007, n.p.). The Bavarian State Library, Princeton, the National Library of Catalonia, and the University of Texas at Austin are also listed as partners (“Library Partners,” 2007, n.p.).

            David Ferriero, the Andrew W. Mellon Director and Chief Executive of the Research Libraries of The New York Public Library, explains the library's decision to join the program by linking the project's goals to library values:

The New York Public Library Research Libraries were struck by the convergence of Google's mission with their own. We see the digitization project as a transformational moment in the access to information and wanted not only to learn from it but also to influence it. Our response at present is a conservative one, with a limited number of volumes in excellent condition, in selected languages and in the public domain. With appropriate evaluation of this limited participation, we look forward to a more expansive collaboration in the future (“Library Partners,” 2007, n.p.).

           

            Sidney Verba, Director of the Harvard University Library, explains Harvard’s decision to participate by pointing to the present historical moment and newly available technologies:

The new century presents important new opportunities for libraries, including Harvard's, and for those individuals who use them. The collaboration between major research libraries and Google will create an important public good of benefit to students, teachers, scholars, and readers everywhere. The project harnesses the power of the Internet to allow users to identify books of interest with a precision and at a speed previously unimaginable. The user will then be guided to find books in local libraries or to purchase them from publishers and book vendors. And, for books in the public domain, there will be even broader access (“Library Partners,” 2007, n.p.).

 

Daniel Greenstein, Associate Vice Provost for Scholarly Information and University Librarian for the University of California, discusses the need for preservation and presents the Google Book Project as a way of meeting that need:

With digital copies of our library holdings, we will also provide a safeguard for the countless thousands of authors, publishers, and readers who would be devastated by catastrophic loss occasioned, for example, by natural disaster. Anyone who doubts the impact that such disaster can have on our cultural memory need look no further than the devastation wrought by Hurricane Katrina on our sister libraries in the Gulf States (“Library Partners,” 2007, n.p.).

 

            Each participating library had to weigh the value of the project against potential copyright, privacy, commercial, and legal liabilities. However, in addition to the principled benefits explained by advocates above, there are practical benefits as well.

            Participating libraries will not have to devote staff training and time to scanning materials, a benefit that is hard to overstate. Training, time, and scanning equipment are costly, and for large-scale projects prohibitively so. These libraries can keep their staff in their current positions while Google digitizes their collections. Google has developed a system that does not damage the books and provides a quick turnaround from when a book leaves the shelves until it is returned (“Does the Scanning Process Damage the Library Books?” 2007, n.p.).

            Each participating library will also get a copy of the files created from its books (“Do the Libraries Get a Copy Back?” 2007, n.p.). These files can be a real benefit by providing a recovery system in case of disaster. The digital files can also easily be used for electronic reserves or other electronic services, and some libraries plan to link the files to their catalog records.

            All libraries, whether participants or not, benefit from the search interface. When users search Google Book Search, they are searching both traditional metadata and the full text of the book (“What’s the ‘Find This Book in a Library’ Link?” 2007, n.p.). Once users have found a book of interest, they can use the “find this book in a library” link on the “about this book” page to go to OCLC’s WorldCat and see which libraries hold it (“What’s the ‘Find This Book in a Library’ Link?” 2007, n.p.). Google Book Search also determines a user’s location so that, once in WorldCat, the user sees local libraries first (“How Do You Know Which Library Catalogs to Point Me to?” 2007, n.p.).

The future of digitization

            Google organized a conference called Unbound: Advancing Book Publishing in a Digital World, where one of the participants summarized, “The future of the book is secure. It's what we do with it, how we promote it, how we develop it, and how we put new layers of meaning around it in a digital context which becomes extremely important” (Reinstein, 2007, n.p.). It is not clear, however, what the future of digitization is. There are a few trends worth watching.

            In the much-discussed New York Times Magazine article “Scan This Book!” Kevin Kelly drew an analogy to the Library of Alexandria, suggesting that through digitization we might finally have a way to collect all knowledge (2006, n.p.). He drew parallels to other industries: camera companies are migrating to digital-only products, much existing music has been digitized by companies or individuals, and movies are moving toward DVD format; books, he pointed out, have been slow to make this transition (2006, n.p.). Then, in a controversial twist, he discussed the possibility of remixing texts the way people remix music and other digital media (2006, n.p.). Words within texts could link to one another, people could create lists of favorite snippets, writing could shift toward smaller sections intended to be remixed, and texts could become interactive (Kelly, 2006, n.p.).

            Digital texts make downloading possible, the same way that pictures, music, and movies are readily downloaded today. Google, as well as earlier digitization projects, provides the ability to download public domain books to any computer (Sands, 2006b, n.p.). This allows people to read the text when not connected to the Internet, when they have time at a later date, and to store a copy if the work is something they might want to come back to later.

            Digital media can also be easily enriched. Locations in a book can be tied to locations on a map (Petrou, 2007, n.p.), helping readers better understand the geography or plan places to visit. Digital texts can be annotated for friends or colleagues. In these ways, Kelly’s ideas of linking texts and remixing books begin to become a reality.

The impact of Google

Google’s project, specifically, is playing a major role in shaping the future of digitization. As users build their research practices around Google’s interface and features, they come to expect the same interface and features from other digital collections.

An author described her experience using the Google Book Project to do biographical research, finding references to her subject in other works. Being able to view selections of a book informed her decision to purchase or check it out (Poole, 2006, n.p.). A PhD student described her delight at finding resources through the Google Book Project that she had not been able to find in the physical libraries of Yale, Harvard, the California libraries, the British Library, or Britain’s National Archives, or through interlibrary loan (Guldi, 2007, n.p.).

Google has also begun to put together pathfinders on topics of interest. For example, there is a page on Shakespeare with links to primary texts and sources. To see the Shakespeare pathfinder, you can look here: http://books.google.com/googlebooks/shakespeare/.

Google highlights teacher and student opinions about its product, describing students in small college libraries viewing rare out-of-print books held only by larger libraries, younger students in underprivileged environments gaining access to materials typically held only by wealthy institutions, foreign students getting access to materials available only in the United States, and students using the service to decide which books are most useful to check out from the library (“Thoughts From Students and Teachers,” 2007, n.p.).

Questions to consider

Though there are many benefits for libraries from the Google Book Project, there are also some areas of concern. Many authors and publishers have objected to Google’s policy of scanning everything and making parts of the text visible, charging Google with copyright infringement (Kelly, 2006, n.p.; Brandt, 2005, n.p.; Claburn, 2007, n.p.). Some suggest that Google will make huge profits from the work that authors and publishers have done (Claburn, 2007, n.p.). Google says that it will remove books from its search if an author or publisher opts out (“What If I Find One of My Books in Google Book Search and Would Like it Removed?” 2007, n.p.), but some authors and publishers say this places an undue burden on them (Kelly, 2006, n.p.). Google did temporarily pause scanning at the height of the controversy (Brandt, 2005, n.p.), but it argues that books are no different from websites, which it has indexed this way since the company began (Kelly, 2006, n.p.). Google also considers its scanning practice to be protected under fair use (Bangeman, 2006, n.p.).

Collections

            This paper originated as a project to determine how the Google Book Project might affect academic library collections, particularly those of smaller academic libraries with tighter budgets. Throughout the research process, I found that the three-year-old project is still too young for anyone to have written articles discussing its possible impact.

            Instead, I found that Google Books provides another way to access library materials, that it has changed how some people do research, and that Google is developing the project quickly. None of this directly affects collections, but all of it does indirectly.

            Users now have a way to access a much larger catalog of books through the Google Book Search interface. This interface shows users books available at their own library as well as books that might not be held by any nearby library. Because users can now find books available through interlibrary loan, use of these services might increase. If users prefer the Google Book Search interface, libraries might need to improve their own catalog interfaces to keep up with this competition.

            The Google Book Project has also changed how people research. People may ask fewer questions at the reference desk if they find this interface more useful because it returns more results. People will grow accustomed to searching the full text of books, just as many have grown accustomed to searching the full text of articles, and collection managers might have to look into purchasing products that allow their entire collection to be searched by full text.

            There were also anecdotal stories suggesting that smaller libraries might not have to buy as many rare books, since their students could access them through Google Books. Small public libraries and school libraries can now have access to university-level collections, so they can focus their funding on community-appropriate resources. Users find books that they never would have found otherwise and start using interlibrary loan more.

            It is conceivable that a small library with a shrinking budget might choose to forgo purchasing some older books in light of the availability of public domain works through Google. It is also possible that some small libraries that are strapped for space might rely more on interlibrary loan rather than purchase more physical items for their shelves. It is possible that one day people might actually prefer to read electronic books rather than the paper books we are all accustomed to, radically changing collection management in libraries.

            However, it is too early to know any of this for sure. What is clear, though, is that the Google Book Project is making a serious impact in the world of digital materials, and the patrons who use it like it. It will be important for all librarians, but especially collection managers, to monitor the progress of this project and analyze its potential impact on their work and collections.


Works Cited

About Google Book Search. Google. (2007). Retrieved March 1, 2007, from http://books.google.com/intl/en/googlebooks/about.html

Bangeman, E. (2006). French publishers join fight against Google Book Search. Retrieved April 11, 2007, from Ars Technica: http://arstechnica.com/news.ars/post/20061031-8114.html

Berry, J. N. (2006). Good riddance to 2005! Retrieved March 15, 2007, from Library Journal: http://libraryjournal.com/article/CA6298436.html

Brandt, D. (2005). Google spins to avoid copyright challenges. Retrieved April 10, 2007, from Google Watch: http://www.google-watch.org/modify.html

Browse Inside. HarperCollins. (n.d.). Retrieved March 28, 2007, from http://www.harpercollins.com/features/browseinsidefaq/

Claburn, T. (2007). Microsoft attorney accuses Google of copyright violations. Retrieved April 10, 2007, from Information Week: http://www.informationweek.com/internet/showArticle.jhtml?articleID=197800578

Company overview. Google. (2007). Retrieved April 1, 2007, from http://www.google.com/corporate/

Crawford, W. (2005). OCA and GLP 1: Ebooks, etext, libraries and the commons. Retrieved April 10, 2007, from Cites and Insights: Crawford at large: http://cites.boisestate.edu/v5i14a.htm

Does the scanning process damage the library books? Google Book Search help center. (2007). Retrieved April 2, 2007, from http://books.google.com/support/bin/answer.py?answer=43744&topic=9082

Do the libraries get a copy back? Google Book Search help center. (2007). Retrieved April 2, 2007, from http://books.google.com/support/bin/answer.py?answer=43751&topic=9082

Google Books Library Project—An enhanced card catalog of the world’s books. Google. (2007). Retrieved April 2, 2007, from http://books.google.com/googlebooks/library.html

Google milestones. Google. (2007). Retrieved April 11, 2007, from http://www.google.com/corporate/history.html

Guldi, J. (2007). How Google Books is changing academic history. Retrieved April 10, 2007, from Unimaginable inscape: http://landscape.blogspot.com/2007/03/how-google-books-is-changing-academic.html

Gutenberg: About. Project Gutenberg. (2006). Retrieved April 10, 2007, from http://www.gutenberg.org/wiki/Gutenberg:About

Hafner, K. (2007). History, digitized (and abridged). Retrieved April 2, 2007, from The New York Times: http://www.nytimes.com/2007/03/10/business/yourmoney/11archive.html?ex=1331179200&en=1b38c43bcbe04b6b&ei=5088&partner=rssnyt&emc=rss

How do you know which library catalogs to point me to? Google Book Search help center. (2007). Retrieved April 2, 2007, from http://books.google.com/support/bin/answer.py?answer=46722&topic=9082

How is Library Catalog search different from the “find this book in a library” links? Google Book Search help center. (2007). Retrieved April 2, 2007, from http://books.google.com/support/bin/answer.py?answer=46723&topic=9082

Insight Browse and Search. Random House, Inc. (n.d.). Retrieved March 28, 2007, from http://www.randomhouse.com/catalog/insight/index.html

Kelly, K. (2006). Scan this book! Retrieved February 1, 2007, from The New York Times Magazine: http://www.nytimes.com/2006/05/14/magazine/14publishing.html?pagewanted=all&ei=5090&en=c07443d368771bb8&ex=1305259200

Library partners. Google. (2007). Retrieved April 10, 2007, from http://books.google.com/googlebooks/partners.html

Main Page. Project Gutenberg. (2007). Retrieved April 2, 2007, from http://www.gutenberg.org/wiki/Main_Page

Oates, J. (2006). French publisher sues Google. Retrieved April 11, 2007, from The Register: http://www.theregister.co.uk/2006/06/07/france_sues_google/

Petrou, D. (2007). Books: Mapped. Retrieved April 2, 2007, from Inside Google Book Search: http://booksearch.blogspot.com/2007/01/books-mapped.html

Poole, B. (2006). A writer discovers Google Book Search. Retrieved April 2, 2007, from Inside Google Book Search: http://booksearch.blogspot.com/2006/11/writer-discovers-google-book-search.html

Reinstein, A. (2007). Some thoughts on books in our digital world. Retrieved March 10, 2007, from Inside Google Book Search: http://booksearch.blogspot.com/2007/02/some-thoughts-on-books-in-our-digital.html

Sands, R. (2006a). From the mail bag: Four book views. Retrieved April 2, 2007, from Inside Google Book Search: http://booksearch.blogspot.com/2006/07/from-mail-bag-four-book-views.html

Sands, R. (2006b). From the mail bag: Public domain books and downloads. Retrieved April 2, 2007, from Inside Google Book Search: http://booksearch.blogspot.com/2006/11/from-mail-bag-public-domain-books-and.html

Search Inside the Book: How it works. Amazon. (n.d.). Retrieved April 2, 2007, from http://www.amazon.com/Search-Inside-Book-Books/b?ie=UTF8&node=10197021

The complete plays of Shakespeare. Now at your fingertips. Google. (2006). Retrieved April 7, 2007, from http://books.google.com/googlebooks/shakespeare/?utm_source=gbsblog&utm_campaign=shakespeare&utm_medium=et

Thoughts from students and teachers. Google Book Search: News and views. (2007). Retrieved April 7, 2007, from http://books.google.com/googlebooks/newsviews/students_teachers.html

Tyler, N. (2004). Google checks out library books. Retrieved April 10, 2007, from Google: http://www.google.com/press/pressrel/print_library.html

What if I find one of my books in Google Book Search and would like it removed? Google Book Search help center. (2007). Retrieved April 2, 2007, from http://books.google.com/support/bin/answer.py?answer=43756&topic=9011

What is Library Catalog search? Google Book Search help center. (2007). Retrieved April 2, 2007, from http://books.google.com/support/bin/answer.py?answer=46719&topic=9082

What is Search Inside!? Amazon. (n.d.). Retrieved April 2, 2007, from http://www.amazon.com/gp/help/customer/display.html?nodeId=10197041

What libraries are you working with? Google Book Search help center. (2007). Retrieved April 2, 2007, from http://books.google.com/support/bin/answer.py?answer=43740&topic=9082

What’s the “find this book in a library” link? Google Book Search help center. (2007). Retrieved April 2, 2007, from http://books.google.com/support/bin/answer.py?answer=43727

Which libraries are you working with to provide library catalog search? Google Book Search help center. (2007). Retrieved April 2, 2007, from http://books.google.com/support/bin/answer.py?answer=46721&topic=9082

 

I HAVE ABIDED BY THE UNCG Academic Integrity Policy ON THIS ASSIGNMENT.