How much is much?
Developing and interpreting national library visitor statistics
Paper for IFLA 2008 (pre-print)
Tord Høivik
Associate professor in library and information studies
Oslo University College
Valid and relevant statistics are required for library planning and advocacy. As libraries and their users turn to the web, library statistics must follow. In this paper we explore the use of traffic indicators to measure the impact of web resources in national libraries. We present and discuss data on page views, virtual visits and unique users, with examples from eight national libraries - in the Nordic countries, the Netherlands, the United Kingdom, Germany and France. These indicators are at an early stage of development and need some conceptual and much empirical work to become good tools for strategic planning. But we note four findings: the ratios between the three indicators are very unstable - so we must measure and interpret all three; we find substantial differences between countries - with Denmark in the leading position; the number of virtual visits is likely to overshadow the number of physical visits in the near future; and analysis of web traffic must be based on an understanding of J-shaped distributions ("power laws") rather than concepts drawn from the world of well-behaved bell curves ("normal distributions").
PAPER
In January, the British Library presented an important new report on user behaviour in the virtual environment (British Library, 2008). The Director, Dame Lynne Brindley, described her institution as follows: - We are a trusted and independent source, both in cyberspace and through our vast printed collections, with more than 67 million hits on our website in the past 12 months and 500,000 readers passing through our doors every year.
The emphasis on visits - both physical and virtual - rather than on collections or loans, is highly interesting. The BL numbers sound large, since they run into the millions, but it is impossible to interpret them without a proper scale of measurement. My ordinary working day consists of 29 million milliseconds. A snap of my fingers lasts a hundred million nanoseconds. In other words: what do half a million physical visits and 67 million page views tell us about the library? What do the numbers mean? And how do these indicators compare with other measures of virtual and physical traffic?
Traffic on the web
In this paper we present, discuss and compare a number of different indicators that are - or could be - used to measure library traffic on the web. We hope national libraries, with their large economic and intellectual resources, will take a leading role in implementing this type of usage statistics. This will allow us to understand, predict and plan the interaction between users and libraries on a more solid basis. I take for granted that all work in this area must begin with IFLA Publication 127 on performance measurement: the second edition (2007) of Roswitha Poll and Peter te Boekhorst's book Measuring quality. But the web is such a complex environment that even this excellent work will need revision. Rather than discussing concepts, I believe it is time to start measuring, interpreting and - not least - arguing about the meaning of empirical data from the field.
Web statistics in libraries are still at an early stage of development. Commercial actors are far ahead in their use of web analytics (Wikipedia) to monitor their customers and their results. But it is time to make a start. In this paper, national library (NL) visitor statistics are related to other aspects of national library activity. The NL statistics are compared with visitor data from large academic and public libraries and from the most frequently visited web sites in the country. Some comparable data from Nordic libraries are also included. National libraries operate in the same environment, and compete for the same customers, as other institutions devoted to education, research, media and culture.
Norway is just one case among many, of course. The Norwegian National Library cooperates closely with national libraries throughout Europe. We participate in the Europeana Project, which will open its virtual doors - to one million digital objects - in November 2008. Concepts and methods that work in the North will work equally well in the East, the West and the lovely Mediterranean South.
In the future, libraries must deliver their services through three different channels: through a revitalized physical library, through branded sites on the web and - most radically - through popular web services and personalized web sites beyond the libraries proper. The driving forces of change - we may name them digitalization, globalization and professionalization - define the new rules of the game for all knowledge institutions.
Most people, I am afraid, regard statistics as a trivial and boring subject - best left to dismal scientists like economists, demographers and statisticians. This may have been true in the past, but it will certainly not be true in the future. Digitalization provides us with vast amounts of informative data - and the tools to handle them rapidly and efficiently. Globalization increases competition - and forces us to compare our own countries with the rest of the world. Professionals must investigate cases and contexts. They cannot just think their way to the truth. The demand for good statistical data and professional statistical interpretation is increasing in all countries.
National libraries on the web
National libraries used to be the playground of scholars. The general public was served by public libraries, while students and most researchers depended on university, college and special libraries for the services they needed. With the coming of the web, national libraries will be in a very different position. The National Library of Norway plans to digitalize the totality of its holdings within the next ten to twenty years. The numbers are impressive:
· 450 000 books
· 2 million periodicals
· 4.7 million newspapers (more than 60 million pages)
· 1.3 million pictures (photos and postcards)
· 60 thousand posters
· 200 thousand maps
· 4 million manuscripts
· 200 thousand music scores
· 1.9 million printed
· one million hours of radio
· 80 thousand hours of music
· 250 thousand hours of movies and TV
SOURCE: Nasjonalbiblioteket (2007).
With regard to the general public, national libraries are likely to play a much more active role on the web than in the physical world. Digital collections can be accessed by everybody. On the web, in fact, nobody needs to know that you are a national library. Norway's national librarian, Vigdis Moe Skarstein, points out that
- we can offer knowledge, but we cannot force the user to come to us. Knowledge is increasingly something people seek and find, rather than something given to them. ... We must respond to the challenge by offering access to our quality services within the user's environment. [Nasjonalbiblioteket, 2006] [Original text: Vi kan tilby kunnskap, men vi kan ikke pålegge brukeren å komme til oss. Kunnskap er mer og mer noe som søkes, ikke bare noe som gis. ... Vi må svare på utfordringen med å sikre tilgang til våre kvalitative tjenester der brukerne er.]
In Norway, large-scale digitization has begun. But copyright issues crop up - and the law will block access to all recent documents unless these issues are resolved. Norway has a tradition of collective solutions in the copyright field, and the National Library is currently working hard to find ways of compensating authors for republishing their work on the web. Once the legal hurdle has been cleared, the real work begins. To what extent will the general public - as opposed to historians, genealogists and humanistic researchers - be interested in using these resources? Will access to this vast cultural heritage influence cultural consumption patterns? Only statistics can answer these questions.
The British Library
Statisticians do more than collect numbers. We compare them. The ratio between two numbers is a basic tool in applied statistics. But the numbers must be well chosen. The number of dogs per capita is highly meaningful. The number of dogs per cat is suspicious. The number of dogs per tree is irrelevant. Dogs avoid the deep forest. The only trees they seek are the trees nearby.
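We apply this thinking to the British Library. BL is located in London, near St. Pancras station (Wikipedia). The library serves the whole of the United Kingdom, with a population of more than sixty million. The metropolitan area of London has between 12 and 14 million people, while the city itself has about 7.5 million inhabitants. Half a million visits a year corresponds to less than one visit per hundred inhabitants of the United Kingdom, roughly four visits per hundred inhabitants of the metropolitan area and about seven per hundred inhabitants of the city itself.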
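The per-hundred figures are simple ratios of annual visits to population. A minimal Python sketch, assuming the rounded figures in the text and taking 13 million (the midpoint of the 12 to 14 million range) as the metropolitan population:

```python
# Annual physical visits to the British Library (rounded, from the text).
VISITS = 500_000

# Population bases from the text; 13 million is an assumed midpoint
# of the 12-14 million range given for the metropolitan area.
POPULATIONS = {
    "United Kingdom": 60_000_000,
    "London metropolitan area": 13_000_000,
    "London (city proper)": 7_500_000,
}


def visits_per_hundred(visits: int, population: int) -> float:
    """Visits per hundred inhabitants per year."""
    return 100 * visits / population


for area, population in POPULATIONS.items():
    rate = visits_per_hundred(VISITS, population)
    print(f"{area}: {rate:.1f} visits per hundred inhabitants")
```

Public libraries, by comparison, are measured in hundreds of visits per hundred inhabitants, so the scale difference is two orders of magnitude.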
These numbers are not high compared with public libraries, which typically attract several hundred visits per hundred inhabitants per year. But such comparisons are nearly meaningless. The number of physical visitors to national libraries must be related to the purpose and intended audience of the library.
- The Library is open to everyone who has a genuine need to use its collections. However, it is most suited to those wishing to use specialised material that is not always available in public or academic libraries. ...
The Library has come under criticism for admitting undergraduate students (who have access to their own university libraries) to the reading rooms, but the Library says that they have always admitted undergraduates as long as they have a legitimate personal, work-related or academic research purpose (Wikipedia).
The primary BL audience must be specialized researchers and scholars in the humanities and social sciences who live in London, or are visiting the area for the purpose of research and study. But we should note that the sheer number of library visits is not of great interest as such. It is the content and duration of the visit - we could almost say the meaning of the visit - that is important. Visits are easy to measure by the use of turnstiles or electronic counters. But the value and impact of the library depends on what people actually do once they have passed the gates.
The number of visits published by the British Library (470 thousand) refers to the use of the reading rooms only (BL, 2007, p. 27). The library does not, in other words, follow the IFLA guidelines, where a physical visit is defined as "the act of a person's entering the library premises" (Poll, 2007, p. 112). Had they behaved properly and counted everybody, visitor numbers would have been substantially higher.
National libraries often arrange exhibitions, lectures, guided tours and other open events. Most national libraries would probably cast their net more widely and include all persons that enter the building, whether as scholars or as tourists. In London,
A number of important works are on display ... in a gallery ... which is open to the public seven days a week at no charge. Some of the treasures visitors can see ... include the Magna Carta, Captain Cook's journal, Charlotte Brontë's Jane Eyre, Geoffrey Chaucer's 'Canterbury Tales', 'Beowulf', Virginia Woolf's 'Mrs Dalloway', Lewis Carroll's 'Alice's Adventures Under Ground', ...
The actual design and lay-out of the building can also influence the counting. The coffee shop and the local bookstore may, for instance, be located inside or outside the premises.
For planning purposes it is important to separate the primary users (scholars, researchers and students) from the secondary users (tourists, occasional visitors). I believe we ought to do more practical and conceptual work in this area. By a variety of simple observation methods we can, in fact, distinguish between types of use - and gather detailed data on duration and content (Høivik, 2008). But such methods fall beyond the scope of this paper, which focuses on web traffic.
Web analytics
Three indicators are widely used to measure traffic on the web: page views, virtual visits and unique users. These are the three variables that TNS Metrix, the main provider of web analytics in Norway, publishes on its open web site.
Two units of measure were introduced in the mid 1990s to gauge more accurately the amount of human activity on web servers. These were page views and visits (or sessions). A page view was defined as a request made to the web server for a page, as opposed to a graphic, while a visit was defined as a sequence of requests from a uniquely identified client that expired after a certain amount of inactivity, usually 30 minutes. [Web analytics. Wikipedia]
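The quoted definitions translate directly into a counting rule. Below is a minimal Python sketch, assuming a log that already contains only page requests (graphics filtered out) and a hypothetical client identifier per request; it counts page views and splits each client's requests into visits using the 30-minute inactivity timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Inactivity gap that ends a visit; 30 minutes is the usual convention.
TIMEOUT = timedelta(minutes=30)


@dataclass
class Request:
    client: str      # hypothetical client identifier (cookie, host, ...)
    time: datetime
    path: str


def count_page_views_and_visits(requests: list[Request]) -> tuple[int, int]:
    """Return (page views, visits) for a list of page requests.

    A visit starts with a client's first request, and a new one starts
    whenever more than TIMEOUT has passed since that client's last request.
    """
    last_seen: dict[str, datetime] = {}
    visits = 0
    for req in sorted(requests, key=lambda r: r.time):
        previous = last_seen.get(req.client)
        if previous is None or req.time - previous > TIMEOUT:
            visits += 1
        last_seen[req.client] = req.time
    return len(requests), visits


t0 = datetime(2008, 5, 12, 9, 0)
log = [
    Request("a", t0, "/"),
    Request("a", t0 + timedelta(minutes=5), "/catalogue"),
    Request("a", t0 + timedelta(minutes=50), "/"),  # 45-minute gap: new visit
    Request("b", t0 + timedelta(minutes=1), "/"),
]
print(count_page_views_and_visits(log))  # (4, 3)
```

Client "a" produces two visits because of the long pause, so four page views become three visits.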
The number of unique visitors (UV) is only meaningful in relation to a specified time period. The company TNS Metrix measures UV on a weekly basis. The British Library uses a full year.
The first two indicators are additive. The third behaves differently. The number of page views, or the number of virtual visits, in weeks A and B can be added together. But the number of unique visitors in week A cannot be added to the number of unique visitors in week B to find the number of unique visitors in the period A+B. Some people probably visited in both weeks - and since we want to know the number of different visitors, they should not be counted twice.
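The difference between additive and non-additive indicators is easy to see on a toy log. In this Python sketch the two weekly logs are invented for illustration; each entry is a (client, page) pair.

```python
# Hypothetical page-view logs for two weeks: (client id, page) pairs.
week_a = [("anna", "/"), ("anna", "/maps"), ("bjorn", "/")]
week_b = [("anna", "/"), ("carl", "/news"), ("carl", "/")]


def page_views(log):
    return len(log)


def unique_visitors(log):
    return len({client for client, _ in log})


# Page views add up across periods ...
print(page_views(week_a) + page_views(week_b), page_views(week_a + week_b))  # 6 6

# ... but unique visitors do not: "anna" appears in both weeks.
uv_sum = unique_visitors(week_a) + unique_visitors(week_b)
uv_period = unique_visitors(week_a + week_b)
print(uv_sum, uv_period)  # 4 3
```

Summing weekly UV figures therefore overstates the number of different people reached over a longer period.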
Commercial actors have moved beyond these three - and apply much more sophisticated methods to study how customers navigate their web sites. Such data are generally secret - but we can get an idea of the detail involved from Ibsen.net, which publishes detailed statistics on their web traffic. The same is true in the physical world. Commercial visitor studies are an important field within marketing research. The measures that were used in the past - such as the number of physical visits - are being replaced by measures that reveal the activities undertaken by the visitors as they move around in physical (or virtual) space. The study of user behaviour is more advanced in the commercial than in the public sphere. But the relevant methods are basically the same.
Page views
Libraries tend to see themselves as public service providers. They live in protected environments, beyond the cut-throat competition of commercial markets. The web, however, is inherently competitive. Free or fee does not matter. All web sites compete for the scarcest commodity in the world, which is not gold, but genuine human attention. Survey data from Nielsen Ratings illustrate the situation. In March 2008 the typical internet user spent 33 hours on the web and visited 70 different domains. He or she downloaded 1,550 web pages - and looked at each for an average of 47 seconds (Nielsen, 2008). Each surfing session lasted about an hour on average - 57 minutes and 50 seconds to be exact.
We do not know whether BL visitors on the web spent more or less time than this per page. Here I assume, for the sake of argument, that each page hit corresponds to one minute of viewing time. If that is roughly the case, 67 million pages translate into 67 million minutes - or about 1.1 million hours, the equivalent of nearly 130 years of round-the-clock viewing.
We may compare this with the time spent by physical visitors - and by employees. Every day
The British Library web site contains about ten thousand individual pages. The average page is therefore downloaded 6,700 times a year, or about twenty times a day.
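The page-view arithmetic in this section can be checked in a few lines. This Python sketch uses the rounded BL figures from the text and the one-minute-per-page assumption made above; the viewing time is an assumption, not a measurement.

```python
PAGE_VIEWS = 67_000_000   # BL page views in 12 months (from the text)
MINUTES_PER_PAGE = 1.0    # assumed viewing time per page, for argument's sake
PAGES_ON_SITE = 10_000    # approximate number of pages on the BL site

minutes = PAGE_VIEWS * MINUTES_PER_PAGE
hours = minutes / 60
years_nonstop = hours / (24 * 365)   # years of round-the-clock viewing

views_per_page_year = PAGE_VIEWS / PAGES_ON_SITE
views_per_page_day = views_per_page_year / 365

print(f"{hours:,.0f} hours, about {years_nonstop:.0f} years of continuous viewing")
print(f"{views_per_page_year:,.0f} views per page per year, "
      f"{views_per_page_day:.1f} per day")
```

The daily figure comes out at about eighteen views per page, which the text rounds to twenty.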
On the web, the location of the library is no longer important. The virtual arm of the British Library serves not only the United Kingdom, but all speakers of English - as long as they have access to the web. The total number of speakers of English - rough estimates will do - lies around 2,000 million. Of these, approximately twenty percent, or 400 million, currently have access to the web. These are all potential users of the BL web site. But people living outside the UK will, in general, have national libraries of their own. The primary audience of the British Library web services must be the sixty million inhabitants of the UK.
Virtual visits
So far we have considered page views. But the number of virtual visits, relative to the population served, is a more intuitive indicator. We start by defining the concept. According to ISO 2789
Poll adds that a virtual visit must come from
The British Library does not report the number of visits. Since every site visit tends to generate a number of downloads, the number of visits must be substantially less than the number of page views. The Royal Danish Library reports both visits and downloads, however:
Ideally, the number of visits should exclude visits from robot or spider crawls and from page reloads. Nor should requests coming from inside the library premises be included. If we calculate visits per capita, visits from outside the population served should also be excluded. These points are valid, but they represent future goals rather than current practice, I am afraid. "Cleaning the data" would require a fair amount of technical work. At the moment most library organizations are content with getting some web traffic data published.
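Part of the cleaning can be sketched quite simply. The Python example below is illustrative only: the robot markers, the in-library address prefix and the ten-second reload window are all hypothetical choices, and production robot lists are far longer.

```python
from dataclasses import dataclass

# Hypothetical substrings identifying crawlers in the user-agent string.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")
# Hypothetical address prefix of computers inside the library building.
IN_LIBRARY_PREFIXES = ("10.20.",)
# Assumed: a repeat request for the same page within 10 s is a reload.
RELOAD_WINDOW = 10


@dataclass
class Hit:
    ip: str
    user_agent: str
    seconds: float   # request time in seconds from an arbitrary start
    path: str


def clean(hits):
    """Drop robot requests, in-library requests and rapid page reloads."""
    kept = []
    last_request = {}  # (ip, path) -> time of that client's previous request
    for hit in sorted(hits, key=lambda h: h.seconds):
        agent = hit.user_agent.lower()
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue
        if hit.ip.startswith(IN_LIBRARY_PREFIXES):
            continue
        key = (hit.ip, hit.path)
        previous = last_request.get(key)
        last_request[key] = hit.seconds
        if previous is not None and hit.seconds - previous <= RELOAD_WINDOW:
            continue
        kept.append(hit)
    return kept


hits = [
    Hit("192.0.2.1", "Mozilla/5.0", 0, "/"),
    Hit("192.0.2.1", "Mozilla/5.0", 4, "/"),        # reload
    Hit("198.51.100.7", "Googlebot/2.1", 5, "/"),    # robot
    Hit("10.20.0.15", "Mozilla/5.0", 6, "/"),        # inside the library
    Hit("192.0.2.1", "Mozilla/5.0", 60, "/maps"),
]
print([h.path for h in clean(hits)])  # ['/', '/maps']
```

Filtering visitors from outside the population served would additionally require geolocation data, which this sketch does not attempt.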
Visits and page views
Every visit results in one or more page views. The number of page views per visit is, unfortunately, highly variable. We cannot apply the Danish NL average - 7.5 page views per visit - as a conversion factor. To illustrate the degree of uncertainty, we may look at the central public libraries in Denmark - ordered by population size:
Observed values range from fewer than two page views per visit to a surprising fifteen pages - in Hamlet's home town. The median value was 4.7 and the aggregate ratio 5.6.
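The median and the aggregate ratio answer different questions. The median describes the typical library; the aggregate ratio (total page views divided by total visits) is weighted towards the busiest sites. A Python sketch with invented figures for five libraries, not the Danish data:

```python
from statistics import median

# Hypothetical (page views, visits) per library; NOT the Danish figures.
libraries = {
    "Library A": (600_000, 80_000),   # one large, intensively used site
    "Library B": (120_000, 30_000),
    "Library C": (90_000, 20_000),
    "Library D": (40_000, 20_000),
    "Library E": (60_000, 10_000),
}

# Median of the individual ratios: every library counts once.
ratios = [pages / visits for pages, visits in libraries.values()]
median_ratio = median(ratios)

# Aggregate ratio: large libraries dominate the totals.
total_pages = sum(pages for pages, _ in libraries.values())
total_visits = sum(visits for _, visits in libraries.values())
aggregate_ratio = total_pages / total_visits

print(f"median {median_ratio:.2f}, aggregate {aggregate_ratio:.2f}")
```

When the larger libraries deliver more pages per visit, as in Denmark, the aggregate ratio lies above the median.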
Norway
In Denmark, the larger public libraries generally deliver more page views per visit than the smaller ones. For the larger Danish libraries as a group, a ratio of five pages per visit is normal. The biggest public library in Norway, Deichmanske bibliotek, is one of the few public libraries for which Norwegian web statistics are available. Deichmanske reports rather lower ratios:
A reasonable - and testable - hypothesis would be: web sites with rich resources and user-friendly design (information architecture) will be used more intensively - delivering more page views per visit - than sites with less content and less attractive structure.
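One way to start testing the hypothesis is a rank correlation between a measure of site richness and pages per visit. The Python sketch below computes Spearman's rank correlation by hand. Using the number of pages on a site as a proxy for "rich resources" is an assumption, and the five data points are invented.

```python
def ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical sites: number of pages (richness proxy) and pages per visit.
site_pages = [500, 2_000, 8_000, 15_000, 40_000]
pages_per_visit = [1.8, 3.9, 3.5, 5.2, 7.1]
print(f"Spearman rho = {spearman(site_pages, pages_per_visit):.2f}")
```

A rank measure is chosen because it is robust to the skewed, J-shaped distributions typical of web traffic. A real test would need many more sites and a better richness measure.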
Physical and virtual visits
From 2007 the German library index BIX adds the number of physical and the number of virtual visits per capita to generate the indicator "visits per capita" (Poll, 2007, p. 117). I am rather sceptical as to the value of this measure. Studies of user behavior in libraries indicate that students typically spend at least an hour - and often more - each time they visit the physical facilities. From studies of user behavior on the net we know that users typically spend less than a minute on each downloaded page. Unless library web visits are very different, the typical virtual visit would last a few minutes only. In terms of time spent, activities undertaken and knowledge gained, a single physical visit might easily be the equivalent of ten virtual visits. Instead of adding physical and virtual visits we might, more reasonably, add physical and virtual time spent. These are more comparable quantities.
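The effect of adding time instead of visits can be illustrated with the Danish National Library's 2006 figures: 828 thousand physical and 4152 thousand virtual visits. The durations in this Python sketch (60 minutes per physical visit, 3 minutes per virtual visit) are assumptions in line with the argument above, not measured values.

```python
# Danish National Library, 2006, in thousands of visits (from the text).
physical_visits = 828
virtual_visits = 4152

# Assumed average durations in minutes; placeholders, not measurements.
MINUTES_PER_PHYSICAL_VISIT = 60
MINUTES_PER_VIRTUAL_VISIT = 3

visit_share_virtual = virtual_visits / (physical_visits + virtual_visits)

physical_minutes = physical_visits * MINUTES_PER_PHYSICAL_VISIT
virtual_minutes = virtual_visits * MINUTES_PER_VIRTUAL_VISIT
time_share_virtual = virtual_minutes / (physical_minutes + virtual_minutes)

print(f"Virtual share of visits: {visit_share_virtual:.0%}")
print(f"Virtual share of time:   {time_share_virtual:.0%}")
```

Under these assumptions virtual visits make up about five sixths of all visits but only about one fifth of the time spent, so the choice of unit reverses the picture.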
Denmark
Looking at the data, we find that the relationship between the number of physical and the number of virtual visits is highly variable. The Danish National Library reported these numbers (in thousands of visits) from 2002 to 2006:
· 2002: 963 thousand physical and 2849 thousand virtual visits
· 2003: 961 and 3213
· 2004: 902 and 3375
· 2005: 877 and 3890
· 2006: 828 and 4152
As national libraries increase their presence on the web, their physical visits are likely to be swamped by virtual visits. From 2002 to 2006 the sum of physical and virtual visits increased by thirty-one percent. But the Danish national librarian is still worried by the decline in physical visits. During these four years virtual visits increased by forty-six percent, while physical visits decreased by fourteen percent.
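The percentages follow directly from the Danish series listed above. A short Python check:

```python
# Danish National Library, thousands of visits (from the list above).
physical = {2002: 963, 2003: 961, 2004: 902, 2005: 877, 2006: 828}
virtual = {2002: 2849, 2003: 3213, 2004: 3375, 2005: 3890, 2006: 4152}


def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return 100 * (end - start) / start


total_2002 = physical[2002] + virtual[2002]
total_2006 = physical[2006] + virtual[2006]

print(f"All visits:      {pct_change(total_2002, total_2006):+.0f} %")
print(f"Virtual visits:  {pct_change(virtual[2002], virtual[2006]):+.0f} %")
print(f"Physical visits: {pct_change(physical[2002], physical[2006]):+.0f} %")
```

The combined total grows by about 31 percent, virtual visits by about 46 percent, and physical visits fall by about 14 percent.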
Sweden
In Sweden, the NL differs substantially from the big university libraries. Kungliga Biblioteket lies in the same range as the four largest universities with respect to the number of virtual visits and the number of reference questions answered. All five received between five and eight thousand virtual visits a day in 2006 - and answered about one hundred reference questions a day. But the patterns of activity are very different when we consider physical visits and loans. The four university libraries receive roughly one virtual visit for every physical visit (the ratios ranged from 0.6 to 1.5). The National Library received ten virtual visits for every physical visit.
If we consider physical visits, the University Library of Umeå had five times as many visitors as the National Library. If we look at "all visits", the National Library had more visitors than the University Library. It is hard to see how the sum of physical and virtual visits can be used for strategic planning and advocacy. Physical and virtual visits are too different in nature to be counted together.
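The reversal can be made explicit. The text does not give Umeå's own ratio of virtual to physical visits, so this Python sketch treats it as a free parameter within the 0.6 to 1.5 range reported for the university libraries, and measures visits in an arbitrary unit where the National Library has one physical visit.

```python
# Physical visits to the National Library set to 1 (arbitrary unit).
nl_physical = 1.0
nl_total = nl_physical * (1 + 10)     # ten virtual visits per physical visit

umea_physical = 5 * nl_physical       # five times as many physical visitors


def umea_total(virtual_per_physical: float) -> float:
    """Umeå's physical plus virtual visits for a given virtual/physical ratio."""
    return umea_physical * (1 + virtual_per_physical)


# Ratio at which the two "all visits" totals are equal.
break_even = nl_total / umea_physical - 1

for ratio in (0.6, 1.0, 1.5):
    leader = "National Library" if nl_total > umea_total(ratio) else "Umeå"
    print(f"ratio {ratio}: NL {nl_total:.1f}, Umeå {umea_total(ratio):.1f}"
          f" -> {leader} ahead")
print(f"break-even ratio: {break_even:.1f}")
```

Which library "wins" on the combined measure depends entirely on a ratio that says little about either library's purpose.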
Unique visitors
In addition to page views, BL also published data on unique visitors. The number of unique hosts served by the BL lies close to five million:
The number of unique hosts is the best approximation available to the number of individual users of the website. Even if every host were a separate person living in the UK, the number of individual users during a year would be only about eight percent of the UK population (five out of sixty million).
We do not know the distribution of traffic between the United Kingdom, on the one hand, and the English-speaking world beyond the Isles, on the other. In Germany, approximately forty percent of the NL web traffic came from outside the country in 2006 (judged by whether the visitor's top-level domain was .de). If we assume an equal division between national and international traffic, we get a UK participation rate of 2.45 / 60, or about four percent of the population. But I am sure the BL has access to detailed traffic data that can give us the true numbers - and much more information about web users than published statistics provide.
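The participation rate depends on the assumed share of hosts located in the UK. In this Python sketch the total of 4.9 million hosts is inferred from the text (2 × 2.45 million), and one host is assumed to equal one person, which is only a rough approximation.

```python
UNIQUE_HOSTS = 4_900_000    # inferred from the text: 2 x 2.45 million
UK_POPULATION = 60_000_000


def uk_participation(share_from_uk: float) -> float:
    """Share of the UK population reached, assuming one host = one person."""
    return UNIQUE_HOSTS * share_from_uk / UK_POPULATION


# 100 % = upper bound; 60 % = the German pattern; 50 % = equal division.
for share in (1.0, 0.6, 0.5):
    print(f"{share:.0%} of hosts in the UK -> "
          f"{uk_participation(share):.1%} of the population")
```

Even the upper bound stays below ten percent, so the conclusion does not hinge on the unknown geographical split.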
Since we know the number of visitors, we can also calculate the average number of pages consulted:
With a viewing time of about one minute per page, we can - roughly - imagine five million users spending 12 minutes each on the website during the year. This mental image should not be trusted too far, however. When we look at web traffic, we are dealing with data that do not follow well-behaved bell curves ("normal distributions"). Individual usage of library sites is characterized by J-shaped (or long-tail) rather than by bell-shaped distributions.
In the BL case, fifty-five million UK inhabitants do not consult the web site at all. Most of the active users are likely to be brief and occasional visitors. Economists and sociologists often find distributions that follow the Pareto principle: that twenty percent of the individuals represent eighty percent of the activity. We encounter many such distributions in studies of other library activities. I am willing to guess that we will find the same pattern on the web.
If I am right, eighty percent of the sixty-seven million pages - or about fifty million pages, will have been downloaded by about one million "heavy users". This corresponds to fifty pages (or minutes) per person and year. The remaining thirteen million pages are shared among four million “light users”. Each of them drops in for a brief visit or two, consulting, on the average, three pages a year. Once we get more detailed data on web traffic, we will be able to test whether this "Pareto hypothesis" is correct.
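The "Pareto hypothesis" splits the totals into two groups. This Python sketch reproduces the arithmetic with the figures from the text (67 million pages, five million users); the 80/20 split is the hypothesis, not an observed fact.

```python
TOTAL_PAGES = 67_000_000
USERS = 5_000_000
HEAVY_SHARE_OF_USERS = 0.20   # Pareto hypothesis: 20 % of the users ...
HEAVY_SHARE_OF_PAGES = 0.80   # ... download 80 % of the pages

heavy_users = USERS * HEAVY_SHARE_OF_USERS
light_users = USERS - heavy_users
heavy_pages = TOTAL_PAGES * HEAVY_SHARE_OF_PAGES
light_pages = TOTAL_PAGES - heavy_pages

mean_pages = TOTAL_PAGES / USERS
print(f"Mean for all users: {mean_pages:.1f} pages")
print(f"Heavy users: {heavy_pages / heavy_users:.1f} pages each")
print(f"Light users: {light_pages / light_users:.2f} pages each")
```

The overall mean of about thirteen pages describes neither group: heavy users consult about four times as many pages, light users about a quarter as many. This is why averages mislead under J-shaped distributions.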
Comparing national libraries
In Norway, the National Library registered xxx physical visits in 2007. The Library did not report its web traffic to the organization that collects library statistics, but has released monthly data on page hits from January 2006 to April 2008 (Solbakk, 2008). The annual number of page hits lies around 8 million. Relative to population, this implies that the Norwegian level of traffic - as measured by page hits - is about one and a half times the British level:
To place these values in perspective, we turn to other national libraries in the Nordic countries. The British NL has the same level of page views as Finland. Norway lies about fifty percent higher while Denmark lies about five times as high!
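Page views per inhabitant make the national comparison concrete. In this Python sketch the page-view totals come from the text, while the populations (Norway about 4.7 million, Denmark about 5.4 million) are approximate assumptions. The Danish page total is estimated as 4152 thousand virtual visits in 2006 times the average of 7.5 page views per visit, so it is a rough figure.

```python
# (annual page views, approximate population); populations are assumptions.
sites = {
    "British Library": (67_000_000, 60_000_000),
    "National Library of Norway": (8_000_000, 4_700_000),
    # Rough estimate: 4152 thousand visits (2006) x 7.5 page views per visit.
    "Royal Danish Library": (4_152_000 * 7.5, 5_400_000),
}

per_capita = {name: pages / pop for name, (pages, pop) in sites.items()}
baseline = per_capita["British Library"]
relative = {name: value / baseline for name, value in per_capita.items()}

for name in sites:
    print(f"{name}: {per_capita[name]:.2f} pages per inhabitant "
          f"({relative[name]:.1f} x UK)")
```

With these inputs Norway comes out about fifty percent above the British level and Denmark about five times as high.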
Sweden publishes data on visits rather than on page impressions. Since Denmark provides both - glory to the Danes! - we can still conclude that Denmark lies three to four times above the Swedish level. We may therefore guess that Sweden lies roughly at the Norwegian level.
Web traffic: the case of Norway
TNS Metrix dominates the Norwegian market for analysis of web traffic. It publishes a weekly summary from more than one hundred organizations, covering three indicators: page views, visits and unique visitors.
The information released by the National Library only gives the number of pages. I will therefore concentrate on the page indicator - with data from Week 20 (= May 12-18, 2008).
To place the traffic in context, we may look at the site's immediate neighbours. These range between one hundred thousand and two hundred and fifty thousand page impressions per week. Just above the National Library we find:
Just below we encounter
These are, we may say, small-scale neighbours. The most popular web sites in Norway attract, in other words, a thousand times as much traffic as the National Library. Other major web services receive several hundred times more attention. Many regional and specialized web sites also attract much more traffic. With the ambitious - and costly - goal of digitizing all its collections within two decades, the Norwegian National Library has set its sights on the new, digital, knowledge-based society. Involving the general public in this project is still a major task.
I believe this is true in the rest of Europe as well. Libraries that provide access on the web without attracting users on the web have not achieved their goals.
Conclusion
Libraries of the future will clearly deliver their services in three different environments: physical buildings, local web sites and distributed user environments. The physical library is definitely not disappearing, but it is clearly caught up in a process of profound change. The impetus comes from the outside. When their customers change, they must adapt or fade away. It is the new ways of research, teaching and learning that give the process its momentum.
Public libraries are becoming more physical and more digital at the same time. Academic libraries form an essential part of the academic infrastructure - and move as academia moves. Academic librarians are taking a new look at their place within the intellectual division of labor. They are finding new ways of supporting research, teaching and learning. Student-oriented libraries redefine themselves as learning centres - for the new types of active, explorative and group-oriented learning that the knowledge economy needs and the web makes possible. Research-oriented libraries take on much more active roles in the research process. The term cyberinfrastructure is clumsy - but covers what I mean.
National libraries face similar challenges. On the web they must shift from collection-based thinking to competitive customer services. This involves a deep change in organizational culture - from the closed world to the infinite universe, to quote Alexandre Koyré.
The speed and uptake of web based practices will of course differ from country to country. But I see the end result as given. The digital revolution will impose its conditions on the world like the industrial revolution did two hundred years ago. We are all moving in the same direction. When we discuss the future of national libraries, we can therefore draw on trends and experiences from many neighbouring fields.
Library statistics tend to focus on information that is easy to collect but hard to interpret. The number of loans does not measure reading or understanding, learning or wisdom. The number of physical visits does not indicate the time spent at, or the benefit gained from, using the library. If we look at web-based versus physical activities, we see that virtual visits are different in kind from physical visits.
During its first decade (1991-2001) the web was mainly used to strengthen existing structures and services. Web sites were typically designed as mirror images of the organization chart: new wine poured into crusty old bottles. During the next decade (2001-2011), web technologies are challenging the structures themselves. When steam replaced sail, many shipping companies were able to adapt. But when airplanes replaced ocean liners, the old entrepreneurs faced a different type of challenge. Their maritime skills belonged to the ocean. They could not take to the air like seagulls. But their knowledge of international trade, transport and logistics was still relevant.
Change is in the air ...
Note
A web version of this paper is available from the TTT web site (choose English). A spreadsheet with supporting and additional data has been published as National library web traffic on Google Docs.
Resources