by Brinley Franklin, Terry Plum and Martha Kyrillidou
University of Connecticut Libraries, Storrs, CT, USA.
The refinement of e-metrics by librarians to evaluate the use of library resources and services has matured into a set of standardized tools and shared understandings about the value of the metrics for making data-driven managerial decisions in libraries. E-metrics are applied to a number of library resource and service domains, some as census counts and others as samples. Census counts include usage statistics for networked electronic resources collected by external vendors conforming to standards such as COUNTER (Counting Online Usage of Networked Electronic Resources) and SUSHI (Standardized Usage Statistics Harvesting Initiative), likewise developed by organizations external to the library.
Locally developed census counts, generated from click-through scripts, rewriting proxy server logs, VPNs, or OpenURL server logs, capture networked electronic resource usage at the local level. Unlike external vendor-supplied data, these local data can be mapped against authenticated users or IP addresses to determine usage by local demographics, for example client group, school or discipline. Library web sites are routinely evaluated with web server logs and web traffic analysis software. Electronic resource management applications can tie cost to usage while enhancing access to e-journals. Interlibrary loan usage is counted by applications such as OCLC's ILLiad and Infotrieve's Ariel. Finally, the OPAC can collect usage data, including search statements and success rates.
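The mapping of locally captured usage against demographics described above can be sketched as follows. This is a minimal illustration: the log records, resource names and demographic table are invented, not any real library system's schema.

```python
# Hypothetical sketch: aggregating locally captured click-through usage
# by client group. The log format and demographic lookup are illustrative
# assumptions, not a real library system's schema.
from collections import Counter

# Each record: (authenticated user id, resource accessed), as a
# click-through script or rewriting proxy server might capture it.
log = [
    ("u001", "JSTOR"),
    ("u002", "JSTOR"),
    ("u001", "ScienceDirect"),
    ("u003", "JSTOR"),
]

# Local demographic data maps users to a client group (school/discipline).
demographics = {"u001": "Engineering", "u002": "Humanities", "u003": "Engineering"}

def usage_by_client_group(log, demographics):
    """Count resource uses per (client group, resource) pair."""
    counts = Counter()
    for user, resource in log:
        group = demographics.get(user, "Unknown")
        counts[(group, resource)] += 1
    return counts

counts = usage_by_client_group(log, demographics)
print(counts[("Engineering", "JSTOR")])  # 2
```

This is precisely the step that external vendor-supplied COUNTER data cannot perform, since vendors never see the local user-to-demographic mapping.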
Sample data on library resources and services, collected through web surveys of client groups, are common. Such locally implemented sampling plans include LibQUAL+, a suite of services to measure user opinions about service quality; MINES for Libraries, a web-based survey determining the purpose of use of networked electronic resources and services by user affiliation and status; and web usability studies to improve web presentation and architecture. LibQUAL+ benchmarks results to contextualize user opinions against those of similar libraries. Finally, libraries track budget data for various cost centers, generating cost/use or cost/benefit data for both resources and services.
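The cost/use figures mentioned above are arithmetically simple; a short sketch makes the calculation concrete. The subscription costs and download counts here are invented for illustration.

```python
# Illustrative sketch: deriving cost-per-use figures from budget and
# usage data, as libraries do when tying cost centers to e-resource
# metrics. All figures are invented for the example.
subscriptions = {
    "Journal A": {"annual_cost": 3000.00, "downloads": 1500},
    "Journal B": {"annual_cost": 2400.00, "downloads": 200},
}

def cost_per_use(subs):
    """Annual cost divided by recorded uses, per title."""
    return {title: d["annual_cost"] / d["downloads"] for title, d in subs.items()}

for title, cpu in sorted(cost_per_use(subscriptions).items()):
    print(f"{title}: ${cpu:.2f} per download")
```

Even this trivial ratio supports managerial decisions: the low-use Journal B costs six times as much per download as Journal A despite a smaller subscription price.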
These data inform managerial decisions in libraries. For example, collection expenditures can be analyzed against cost/usage data and mapped to client group. Usage metrics can also serve administrators as performance metrics for services such as virtual reference or new Web 2.0 social networking services.
These commonly accepted tools reflect an assessment culture well known to librarians in academic and research libraries. Although digital libraries are now a focus of interest for information scientists, computer scientists, and research funding agencies, they fit more comfortably into the librarian's world view of library service: digital libraries are libraries, and are informed by the service expectations of libraries. This chapter examines how assessment data are presently used in libraries. It develops a set of criteria, or expectations, for the evaluation of digital libraries, based on the existing assessment culture in libraries and the uses to which assessment data are put, and examines how useful the empirical evaluation methods presently in use in libraries are for the evaluation of digital libraries.
by Michael Khoo, Lee Zia and David McArthur
The iSchool at Drexel, College of Information Science and Technology, Drexel University
In addition to collecting and analyzing evaluation data, digital libraries also have to communicate these data to funding agencies and other external partners, where they can be used to support further project research, development, and sustainability. This communication is an important dimension of evaluation work, and evaluators are, in theory, well placed to support it: they have expertise in how digital libraries work technically and socially; they have a good understanding of evaluation questions and issues; and they have knowledge of funders' requirements for evaluation that helps with both planning and, if necessary, 'translating' evaluation data for external audiences. Acquiring knowledge of funders' requirements for evaluation, and then acting on it, can however be difficult, for a variety of technical and pragmatic reasons.
We will illustrate these issues through lessons learned from evaluation work with the U.S. National Science Digital Library (NSDL: nsdl.org). The NSDL is a multi-year National Science Foundation (NSF) program that has funded over 300 individual projects to develop and integrate a wide range of science, technology, engineering and mathematics (STEM) educational resources, tools and services. From NSF's perspective, NSDL evaluation takes place on two scales: the individual NSDL projects that conduct research and development and build the library, and the wider NSDL program that supports the projects and organizes them into a coherent portfolio. This approach is concerned less with the detailed evaluation questions and methods of individual projects than with whether projects adhere to standards that enable aggregation of results across projects, such as the introduction of standardized cross-program web metrics, which will allow NSF to track the growth of the NSDL at both the project and program level. In addition to these project- and program-level evaluations, NSF is also interested in broader issues, such as the value of the library as a whole, or the accumulation of knowledge over the course of the program, that cannot be answered by simply aggregating individual project results. While the NSF itself has clear evaluation goals, in practice, from the perspective of some NSDL projects, these goals were sometimes not obvious. Further, even where these goals were obvious, they were sometimes not appropriate for a particular project, whether for logistical reasons (for instance, newly funded projects might have more immediate development priorities than implementing web metrics, such as building a web site in the first place) or for practical reasons, such as limited evaluation resources (budgets, personnel, etc.).
In this chapter, we will discuss both similarities and differences in perceptions of evaluation in the NSF and individual NSDL projects, and the role that these similarities and differences played in shaping overall NSDL evaluation outcomes.
This chapter discusses the criteria of usability evaluation in digital libraries, including usableness, usefulness, effectiveness, efficiency, satisfaction, learnability, memorability, ease of error recovery, ease of use, ease of navigation, understandability, and appropriate level of interaction. Frameworks of usability are discussed, including Thomas' categorization of usability attributes into outcome, process and task, and Saracevic's division into content, process, format, and overall assessment. In addition, the chapter discusses cross-cultural usability and usability for special groups of users, such as the elderly. An outlook and suggestions for future research are provided at the end of the chapter.
by Manolis Garoufalou, Rania Siatri and Richard Hartley
Use, users and usefulness are intertwined concepts in information science. Use takes place within hierarchical contexts, shaped by the duration and intensity of engagement with information management systems. Microscopic and macroscopic views of information management, as well as active and passive approaches, regard information interaction from different perspectives. Furthermore, the exploration of users' behavioural aspects, together with advances in human-computer interaction, has shifted attention from system-centered to user-centered examination of interaction with information retrieval systems and digital libraries. Finally, usefulness encapsulates various abstractions of content relevance that satisfy primary information needs and support work tasks.
The proposed contribution aims to cover significant areas that touch on these concepts, such as information behaviour and user studies. It presents the impact of information behaviour on the design of critical features that improve system operation and enhance user performance. It also investigates the role of user studies in the evaluation of digital libraries as a means of comprehensively understanding user requirements and of conceptualizing relevant and adaptable information. Finally, the deconstruction of users' information processing, based on task analysis and behavioural patterns, is presented to inform readers about the development of robust user stereotypes that support the design of effective information systems.
by Maristella Agosti and Nicola Ferro
The chapter discusses methodologies for evaluating the performance of a DL, in order to fit them to the new way of approaching the development of this type of system.
As observed by [Ioannidis et al., 2005], DL development must move "from an art to a science" in order to give rise to "industrial-strength" DLs based on reliable and extensible services. This shift in DL development, and the requirement for improved reliability, points out, among other issues, the need for proper evaluation methodologies to assess DL performance along different dimensions. In the light of this scientific development, evaluation methodologies should not be perceived as something external to the design and development process of a DLMS; on the contrary, they should be tightly integrated into it.
Therefore, the evaluation of a DL itself turns out to be a scientific activity whose outcomes, such as performance analyses and measurements, constitute a kind of "scientific data" that needs to be properly considered and used for the design and development of DL components and services. When we deal with scientific data, "the lineage (provenance) of the data must be tracked, since a scientist needs to know where the data came from... and what cleaning, rescaling, or modeling was done to arrive at the data to be interpreted" [Abiteboul et al., 2005]. In addition, [Ioannidis et al., 2005] point out how "information enrichment" should be one of the activities supported by a DL and, among its different kinds, consider provenance as "important in judging the quality and applicability of information for a given use and for determining when changes at sources require revising derived information".
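The lineage tracking quoted above can be made concrete with a minimal sketch. The record structure and field names here are our own assumptions for illustration, not a standard provenance schema.

```python
# A minimal sketch of how provenance for evaluation data might be
# recorded, following the point that one must know where data came from
# and what cleaning, rescaling, or modeling produced it. The field names
# are invented, not a standard schema.
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    source: str  # where the raw evaluation data came from
    transformations: list = field(default_factory=list)  # processing history

    def apply(self, step: str):
        """Record a cleaning/rescaling/modeling step applied to the data."""
        self.transformations.append(step)

record = ProvenanceRecord(source="IR test collection run, topic set 2005")
record.apply("removed incomplete topic judgments")
record.apply("rescaled precision scores to [0, 1]")
print(record.transformations)
```

A DL managing evaluation outcomes as scientific data would attach such a record to each measurement set, so that later interpretation and re-use can judge the data's quality and applicability.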
Interestingly enough, this line of reasoning highlights a kind of intrinsic circularity when we move DL development toward a science. The achievement of the necessary levels of reliability and effectiveness calls for extensive use of evaluation methodologies which, as a result, produce a considerable amount of scientific data. These scientific data should, in turn, be managed by a DL that supports their enrichment and interpretation, in order to yield the expected positive feedback on DL design and development.
Therefore, we will discuss the current evaluation methodology for DL performance in the light of proper management of the scientific data it produces, in order to ensure their interpretation, enrichment, and future re-use. Moreover, we will discuss how a DL could be the correct system for managing these kinds of data.
If we consider a DL as constituted by different services, we can further envision a more general DL devoted to the management of evaluation outcomes, in which each service deals with a specific aspect of the evaluation of a DL. In this context, performance evaluation represents one of the possible services of such a DL.
by David Nicholas and Paul Huntington
The chapter points out the value of usage data in keeping digital libraries effective, responsive and on track. It describes what deep log analysis involves and its advantages over classic transaction log analysis, surveys and qualitative methods, and shows what data can be generated by this form of analysis, drawing on the results of five years of research on the virtual scholar and their usage of e-journal, e-book and e-learning databases. In particular, the chapter refers to work from recent (2007) CIBER investigations of Institute and BL Learning.
by Ann Blandford and David Bainbridge
In the development of digital libraries, there is a constant tension between a system-led approach of building that which is possible and a user-led approach of building that which is needed. There are many factors influencing the design and user experience of digital libraries, including traditions in librarianship and publishing, technology conventions (e.g. interoperability standards), user expectations and users' information needs.
In this chapter, we focus on how to establish user needs for digital libraries, recognising that those needs arise in the context of their broader information tasks (such as writing a paper, planning a course of medical treatment or preparing a legal case). We also consider the role of innovation in creating new possibilities that may address previously unrecognised user needs. These different forces result in a co-evolution of design and use that can result in more usable and useful digital libraries, and more competent information users.
In the first section of this chapter, we describe approaches to gathering user needs, based on the premise that it is important to understand what users currently do when looking for information in digital libraries. We illustrate this with an example drawn from a study of Masters students finding information resources while preparing their individual dissertations.
In the second section, we outline the design of innovative tools to support scholars accessing music, aided by content-based analysis. Music data is represented symbolically, and the interface, accessed through a digital library, allows users to sing, hum or play on a piano keyboard displayed on the screen, in addition to more traditional text-based methods.
In the third section, we discuss approaches to evaluating extant digital libraries, both alone and in their contexts of use. Evaluation can cover many angles, many of them addressed in other chapters of this book; here, we focus particularly on evaluating usability and the user experience of working with the library, illustrating the discussion with the music digital library introduced in the previous section.
Finally, we draw the threads together to review design and evaluation processes for digital libraries, and their interrelationships.
by Christos Papatheodorou and Giannis Tsakonas
Digital libraries support the workflows of several communities with intense demands, strong requirements and changing behaviours. Besides satisfying information needs and affecting users' skills and competencies, the operation of digital libraries has a serious effect on the operation of their hosting organizations and institutions. Outcomes extend to different levels, affecting the individual's quality of life, institutional standing and progress, and societal advancement. According to the literature, outcomes are not constrained to the above, but can cover several other important aspects of users' lives, such as economic, learning and cultural aspects, and outcomes assessment is strongly influenced by the context wherein it takes place. This subtle research area often uses terms like outcomes and impact interchangeably; however, the use of these terms is practically imposed by contextual conditions. While outcomes are related to educational processes, impact is linked with the scientific production of research communities.
The current contribution aims to explore the implications of digital libraries for the assessment of the outcomes of influential processes, such as education and research. Its aims include defining the escalation from interesting and target outcomes to real and applicable ones. It also summarizes methodological aspects of the outcomes assessment process through exemplar projects and case studies of innovative approaches from both sides of the Atlantic Ocean. Several methodologies, such as statistics, surveys and document examination, apply to specific aspects of outcomes assessment; however, this contribution suggests merging methods to address the more diverse needs that ensue from the digital environment. Even though the digital environment raises significant challenges for the generation of solid assessment reports, it also enhances already known methodologies based on statistical indicators and assists the creation of new assessment methodologies. Although the intended outcomes, interwoven as they are with the operation of well-known organizations and institutions, are in several cases known, digital libraries pose significant questions regarding the identification, validation and definition of the extent of outcomes in this new environment.
by Martha Kyrillidou, Colleen Cook and Yvonna Lincoln
Research in digital library evaluation has surfaced a variety of models for evaluating the performance of digital libraries. This chapter discusses research for the user community of the National Science, Math, Engineering and Technology Education Digital Library (NSDL), and the research conducted in developing a protocol known as DigiQUAL®. Our goals were to:
Evaluating digital library service quality using a standardized total market survey is based on users' perceptions and expectations. Research has demonstrated that library users perceive library service quality on different levels: they perceive it holistically and, simultaneously, on a more detailed level that embraces the separate dimensions of empathy, place, collections, reliability, and access. Translating and validating these dimensions in the digital library environment is a formidable challenge. As cyberinfrastructure needs are identified and new digital library environments are developed, including those formed by traditional research libraries in the form of institutional repositories, the need to understand success in the digital library environment is growing.
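Perception-and-expectation surveys of the LibQUAL+ family score each item on three levels: minimum acceptable, desired, and perceived service. The gap-score arithmetic below sketches that logic; the item ratings are invented for the example.

```python
# Sketch of the gap-score logic behind perception/expectation surveys
# such as LibQUAL+: each item records a minimum acceptable, a desired,
# and a perceived service level, and the gaps locate perceived quality
# between them. Ratings here are invented.
def gap_scores(minimum, desired, perceived):
    """Return (adequacy gap, superiority gap) for one survey item."""
    adequacy = perceived - minimum     # positive: service meets the minimum
    superiority = perceived - desired  # usually negative: room to improve
    return adequacy, superiority

adequacy, superiority = gap_scores(minimum=5, desired=8, perceived=7)
print(adequacy, superiority)  # 2 -1
```

Aggregated over respondents and items, such gaps are what allow one library's results to be benchmarked against those of similar libraries.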
by Michael Khoo and Sarah Giersch
Planning a digital library evaluation initiative is a complex process that integrates a wide range of activities, including:
To make the most efficient use of (often limited) evaluation resources, projects have to select from and coordinate amongst these variables before beginning any evaluation work. However, what choices should be made? The options can be overwhelming, especially for projects embarking on evaluation for the first time.
by Maria Monopoli
The general questions in evaluation are: Why evaluate? What to evaluate? How to evaluate? This chapter focuses on the how, and specifically on the research methods proposed for carrying out a qualitative evaluation of digital libraries. Qualitative analysis is associated with empirical information about the world in the form of words. Such data can be used to better understand a phenomenon, to gain new perspectives on issues about which much is already known, or to gain more in-depth data that may be difficult to convey quantitatively. Qualitative methods are also appropriate in situations where one first needs to identify the variables that might later be tested quantitatively, or where the researcher has determined that quantitative measures cannot adequately describe or interpret a situation.
In the context of digital libraries, the goal of qualitative research is to identify and define users' perceptions, opinions and feelings about digital libraries. Users are invited to express their opinions on a number of issues related to digital libraries, and researchers probe and explore their views. These issues might concern the usefulness, usability or performance of digital libraries.
The most prevalent forms of collecting data about users' Information Search Process (ISP) are interviews, observations and focus groups. The Information Search Process refers to users' cognitive and operational activities before, during and after searching a digital library. Specifically, analysing users' perceptions, opinions and feelings during every searching stage can provide valuable qualitative data, such as information about usability issues, about the quality of information provided to users, or about the performance of the system regarding recall, precision, relevance and response time. More recent studies have shown user-logging data to be an important method of qualitative evaluation. The great advantages of transaction logs are not simply their size and reach, but the fact that they are direct and immediately available records of what users have actually done, not what they say they might or would do. Most importantly, user-logging data can be used for much more than generating usage statistics. For example, information about users' navigation choices can inform decisions about page design and layout; error rates and the actions users take to recover from errors may indicate the skill level of typical users and influence future decisions about interface design; and determining the structure of relations among documents retrieved by users, and analysing these relationships, can yield valuable data about the usefulness of digital libraries or the structure of the DL user community.
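The error-rate and error-recovery signals mentioned above can be extracted from a transaction log with very little machinery. The event format below is an invented simplification, not any real DL's log schema.

```python
# Hypothetical sketch of mining transaction logs for the signals the
# chapter mentions: error rates, and the recovery actions users take
# after an error, as indicators of user skill. The (user, action) log
# format is an invented simplification.
events = [
    ("u1", "search"), ("u1", "error"), ("u1", "help"), ("u1", "search"),
    ("u2", "search"), ("u2", "view"),
]

def error_rate(events):
    """Fraction of logged actions that are errors."""
    errors = sum(1 for _, action in events if action == "error")
    return errors / len(events)

def recovery_actions(events):
    """Actions taken by the same user immediately after an error."""
    return [events[i + 1][1]
            for i in range(len(events) - 1)
            if events[i][1] == "error" and events[i][0] == events[i + 1][0]]

print(error_rate(events))        # ~0.167
print(recovery_actions(events))  # ['help']
```

A user population whose typical recovery action is consulting help, rather than reformulating a search, tells a different usability story, which is why such counts feed interface-design decisions.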
Another source of information that can be invaluable to qualitative researchers is the analysis of existing documents, such as personal diaries, emails or transcripts of conversations that describe users' impressions and reactions regarding digital libraries.
by Yin-Leng Theng
Even though empirical evaluation is essential to the design of good digital libraries, real users can be difficult and expensive to recruit to test all aspects of several versions of a system. Further, there is a lack of practical evaluation techniques that designers can use to design and build more usable digital libraries while meeting the time constraints of the design process (Theng, 2005). Many have argued that user testing can be too slow and costly for the financial and time constraints of the design process. Landauer (1995) points out that it is not good enough to design an interactive system without subjecting it to some form of evaluation, because it is impossible to design an optimal user interface on the first try. Dix et al. (1997) argue that even if one has used the best methodology and model in the design of a usable interactive system, one still needs to assess the design and test the system to ensure that it behaves as expected and meets end-users' requirements. Nielsen's (1993) advice with respect to interface evaluation is that designers should simply conduct some form of testing.
Triangulation of findings from different perspectives, using a combination of qualitative and quantitative evaluation techniques, provides a more holistic assessment of the usability and usefulness of interactive systems. Insights from qualitative evaluations help us understand why problems occur, but using established qualitative usability techniques requires a competent level of “craft skills”, even with Nielsen's widely used Discount Usability Engineering. Quantitative evaluations, on the other hand, can be designed to help designers compare and evaluate the effectiveness of systems, and for this designers need robust, quantifiable metrics.
This chapter focuses on quantitative usability evaluations of digital libraries and related supporting environments such as online communities and mobile devices. It describes four studies (Theng, Tan, Lim, Zhang, Goh, Chatterjea, Chang, Sun, Han, Dang, Li and Vo, 2007; Theng and Lew, 2006; Theng, Goh, Lim, Liu, Ming, Pang and Wong, 2005; Khoo, Chan, Theng and Buddharaju, 2004) to illustrate quantitative techniques, applied at appropriate stages of the software development cycle, for investigating users' perceptions of the usability and usefulness of these systems. Underlying theories and models drawn from computer science and social science are used in the design of the studies and the construction of the survey instruments. Hypothesis testing and advanced statistics, such as exploratory and confirmatory factor analysis, are employed in the analysis of findings to identify clusters of useful interaction design elements and the factors affecting users' perceptions and acceptance of digital libraries.
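As a toy illustration of the first step behind the exploratory factor analysis mentioned above, one can compute inter-item correlations on Likert survey responses; factor analysis then decomposes such a correlation matrix into clusters of items. The items and responses below are invented.

```python
# Illustrative sketch: inter-item Pearson correlations on Likert survey
# responses, the raw material that exploratory factor analysis would
# decompose into clusters of interaction design elements. Items and
# ratings are invented for the example.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Each item's ratings across five respondents (1-7 Likert scale).
ease_of_use = [6, 5, 7, 4, 6]
navigation  = [6, 5, 7, 4, 6]   # identical pattern: correlates perfectly
usefulness  = [3, 6, 2, 7, 3]

print(round(pearson(ease_of_use, navigation), 2))  # 1.0
print(round(pearson(ease_of_use, usefulness), 2))
```

Highly correlated items (here, ease of use and navigation) are candidates to load on the same factor, which is how clusters of design elements emerge from survey data.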
The chapter concludes with a discussion of the importance of carrying out empirical studies with real users, and proposes future work on executable user models to reduce extensive and time-consuming testing of systems by real users. The idea is to automatically generate executable cognitive user models that simulate a real user's behaviour, as a cost-effective means to rapidly iterate and test designs without the attendance of real users, enabling designers to detect usability problems in their systems more quickly than is currently possible with traditional user-modelling techniques. Executable cognitive user models are software agents that simulate real end-users' behaviour and predict end-users' performance (Miao, Goh, and Yang, 2002). They can embed multi-disciplinary knowledge that most designers and end-users would not be expected to know or be able to verbalise in their accounts of interaction. They can do more exhaustive checking of prototypes, long before the prototypes have reached a stage where actual human-user interaction would be practicable. Because user models are reusable, it is possible to rapidly simulate large groups of end-users and obtain useful statistical information. Using executable models thus achieves a two-fold purpose: rapidly iterating the design process and avoiding many design blunders. However, the reliability and efficiency of executable cognitive user models depend very much on the cognitive theories used to generate them (Barnard and May, 1993). It is therefore important that the results obtained from simulating executable user models be sufficiently tested for them to be reliable (Wilson and Clarke, 1993).
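The bulk-simulation idea can be conveyed with a toy sketch, entirely our own construction and far simpler than the cognitive models the chapter envisions: a simulated user with an invented per-step error probability attempts a task, and running many such users yields the kind of statistical information described above.

```python
# Toy sketch (not a cognitive model): running an executable user model
# in bulk to obtain statistics without recruiting real users. The task
# structure, step counts and error probability are invented parameters.
import random

def simulated_user(steps_needed=3, max_steps=10, error_prob=0.2, rng=random):
    """Simulate one user attempting a task requiring several successful
    steps; return the step at which the task completed, or None."""
    successes = 0
    for step in range(1, max_steps + 1):
        if rng.random() >= error_prob:  # this interaction step succeeded
            successes += 1
            if successes == steps_needed:
                return step
    return None  # user ran out of steps without completing the task

random.seed(42)  # reproducible simulated population
runs = [simulated_user() for _ in range(1000)]
completed = [r for r in runs if r is not None]
print(f"completion rate: {len(completed) / len(runs):.2f}")
```

A thousand simulated sessions run in milliseconds, which is the cost-effectiveness argument; the caveat in the chapter still applies, since the output is only as trustworthy as the behavioural assumptions coded into the model.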
Contact: Giannis Tsakonas / Christos Papatheodorou
Last Update: December, 2009