The iPRES 2010 tutorials will take place on Sunday, September 19 at the Vienna University of Technology.
JHOVE2 is a Java framework and application for next-generation format-aware characterization of digital objects [1]. Characterization is the process of deriving representation information about a formatted digital object that is indicative of its significant nature and useful for purposes of classification, analysis, and use in digital curation, preservation, and repository contexts. JHOVE2 supports four specific aspects of characterization: (1) identification, the determination of the presumptive format of a digital object on the basis of suggestive extrinsic hints and intrinsic signatures; (2) validation, the determination of the level of conformance to the normative syntactic and semantic rules of the object's format; (3) feature extraction, the process of reporting the intrinsic properties of an object significant for purposes of classification, analysis, and use; and (4) assessment, the determination of the level of acceptability of an object for a specific purpose on the basis of locally-defined policy rules.
The object of JHOVE2 characterization can be a file, a subset of a file, or an aggregation of an arbitrary number of files that collectively represent a single coherent digital object. JHOVE2 can automatically process objects that are arbitrarily nested in containers, such as file system directories or Zip files.
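The container recursion just described can be pictured with a short sketch. This is not JHOVE2's actual API (the function name and traversal details are invented for illustration), and it assumes only file-system directories and Zip files as container types:

```python
import os
import zipfile

def enumerate_objects(path):
    """Recursively yield all non-container objects, descending into
    directories and Zip files. Hypothetical sketch, not the real
    JHOVE2 traversal interface."""
    if os.path.isdir(path):
        for entry in sorted(os.listdir(path)):
            yield from enumerate_objects(os.path.join(path, entry))
    elif zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for name in zf.namelist():
                if not name.endswith("/"):        # skip directory entries
                    yield f"{path}!{name}"        # mark container membership
    else:
        yield path
```

A real characterizer would hand each yielded object to the identification, validation, feature-extraction, and assessment steps in turn.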
The JHOVE2 project is a collaborative undertaking of the California Digital Library, Portico, and Stanford University, with generous funding from the Library of Congress. Additional information about JHOVE2 can be found on the project wiki [2]. JHOVE2 is made freely available under the terms of the BSD open source license.
The JHOVE2 project seeks to build on the success of the original JHOVE characterization tool [3] by addressing known limitations and offering significant new functions, including: streamlined APIs with increased modularization, uniform design patterns, and comprehensive documentation; object-focused, rather than file-focused, characterization, with support for arbitrarily-nested container formats and formats instantiated across multiple files; signature-based file-level identification using DROID [4]; aggregate-level identification based on configurable file system naming conventions; rules-based assessment to support determinations of object acceptability in addition to validation conformity; extensive user configuration of plug-in modules, characterization strategies, and formatted results using the Spring dependency injection framework [5]; and performance improvements using Java buffered I/O (java.nio).
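Rules-based assessment, for instance, amounts to checking extracted features against locally defined policy rules. Below is a minimal sketch with an invented rule format and property names; JHOVE2's actual assessment module is configured through the Spring framework rather than hard-coded like this:

```python
def assess(features, rules):
    """Evaluate a feature dictionary against locally defined policy
    rules. Each rule is (property, predicate, message); the object is
    acceptable only if every predicate passes. Illustrative sketch
    only, not JHOVE2's real rule engine."""
    failures = [msg for prop, pred, msg in rules
                if not pred(features.get(prop))]
    return (len(failures) == 0, failures)

# Hypothetical local policy: accept only valid, uncompressed images
policy = [
    ("valid", lambda v: v is True, "object failed format validation"),
    ("compression", lambda v: v == "None", "compressed images not accepted"),
]
```

The point is the separation of concerns: feature extraction reports what an object is, while assessment judges it against a policy the institution controls.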
The main topics covered during the tutorial are: the role of characterization in digital curation and preservation workflows; an overview of the JHOVE2 project: requirements, methodology, and deliverables; demonstration of the JHOVE2 application; architectural review of the JHOVE2 framework and Java APIs; integration of JHOVE2 technology into existing or planned systems, services, and workflows; third-party development of conformant JHOVE2 modules; and building and sustaining the JHOVE2 user community.
This tutorial is an updated and expanded version of the workshop presented at iPRES 2009 in San Francisco [6], which attracted over 40 registrants. This tutorial will closely follow the production release of JHOVE2 and will incorporate significant new material arising from the second year of project work.
The targeted audience for the tutorial includes digital curation, preservation, and repository managers, analysts, tool users and developers, and other practitioners and technologists whose work is dependent on an understanding of the format and pertinent characteristics of digital assets.
References:
The PREMIS Data Dictionary for Preservation Metadata is a specification that provides a key piece of infrastructure for digital preservation activities, playing a vital role in enabling the effective management, discovery, and re-usability of digital information. Preservation metadata provides provenance information, documents preservation activity, identifies technical features, and aids in verifying the authenticity of digital objects. PREMIS is a core set of metadata elements (called "semantic units") recommended for use in all preservation repositories regardless of the type of materials archived, the type of institution, or the preservation strategies employed. This tutorial provides an introduction to PREMIS and its data model and an examination of the semantic units in the Data Dictionary, organized by the entities in the PREMIS data model: objects, events, agents, and rights. In addition, it presents examples of PREMIS metadata and a discussion of implementation considerations, particularly using PREMIS in XML and with the Metadata Encoding and Transmission Standard (METS). It will include examples of implementation experiences.
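As a small taste of PREMIS in XML, the sketch below assembles a minimal Event entity. The element names follow the PREMIS 2.x schema, but the identifier and date values are invented for illustration:

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"   # PREMIS 2.x XML namespace
ET.register_namespace("premis", PREMIS_NS)

def premis_event(id_type, id_value, event_type, date_time):
    """Build a minimal PREMIS <event>: an identifier, the event type,
    and when it happened. Illustrative values only; a full record
    would add outcome and linked-object information."""
    ev = ET.Element(f"{{{PREMIS_NS}}}event")
    ident = ET.SubElement(ev, f"{{{PREMIS_NS}}}eventIdentifier")
    ET.SubElement(ident, f"{{{PREMIS_NS}}}eventIdentifierType").text = id_type
    ET.SubElement(ident, f"{{{PREMIS_NS}}}eventIdentifierValue").text = id_value
    ET.SubElement(ev, f"{{{PREMIS_NS}}}eventType").text = event_type
    ET.SubElement(ev, f"{{{PREMIS_NS}}}eventDateTime").text = date_time
    return ev
```

In practice such an event record would typically travel inside a METS wrapper alongside the object it describes.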
The PREMIS Data Dictionary was originally developed by the Preservation Metadata: Implementation Strategies (PREMIS) Working Group in 2005 and revised in 2008. It is maintained by the PREMIS Editorial Committee and the PREMIS Maintenance Activity is managed by the Library of Congress.
The tutorial aims to develop and spread awareness and knowledge of metadata that supports the long-term preservation of digital objects. It will benefit individuals and institutions interested in implementing PREMIS metadata for the long-term management and preservation of their digital information but who have limited implementation experience. The potential audience includes cultural heritage operators, researchers and technology developers, professional educators, and others involved in the management and preservation of digital resources.
PRESENTERS:
The rapid technological changes in today's information landscape have made the preservation of digital information a pressing challenge. The aim of an institutional repository has evolved over the last decade from the simple need to give material a persistent online home to an infrastructure that facilitates services on complex collections of digital objects.
Digital librarians have long acknowledged the preservation function as a vital back-office service that is central to the role of a repository. However, preservation is often sidelined by the practical constraints of running a repository. Dealing with institutional-scale ingests and quality assurance with minimal staff and investment rarely leaves sufficient capacity for engaging with a preservation agenda. Many different strategies, known as preservation actions, have been proposed to tackle this challenge; migration and emulation are the most prominent. However, choosing a strategy, and subsequently selecting the tools to implement it, poses significant challenges. The creation of a concrete plan for preserving an institution's collection of digital objects requires the evaluation of possible preservation solutions against clearly defined and measurable criteria.
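Evaluating candidate solutions against measurable criteria can be illustrated with a simplified weighted-sum ranking. The criteria, weights, and scores below are invented for illustration; the utility analysis performed in a real planning tool such as Plato is considerably richer:

```python
def rank_alternatives(alternatives, weights):
    """Score each preservation alternative by a weighted sum of
    measured criteria and return them best-first. A simplified form
    of utility analysis; all names and values here are invented."""
    scores = {}
    for name, criteria in alternatives.items():
        scores[name] = sum(weights[c] * v for c, v in criteria.items())
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical measurements on a 0-5 scale for two migration paths
weights = {"openness": 0.5, "tool_support": 0.3, "cost": 0.2}
alternatives = {
    "GIF to PNG": {"openness": 5, "tool_support": 4, "cost": 4},
    "GIF to TIFF": {"openness": 5, "tool_support": 3, "cost": 2},
}
```

The value of making the weights explicit is accountability: the recommendation can be traced back to the institution's own stated priorities.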
This tutorial shows attendees the latest facilities in the EPrints open source repository platform for dealing with preservation tasks in a practical and achievable way, and new mechanisms for integrating the repository with the cloud and the user desktop, in order to be able to offer a trusted and managed storage solution to end users.
Furthermore, attendees will create a preservation plan on the basis of a representative scenario and receive an accountable and informed recommendation for a particular preservation action. The whole preservation planning process will be supported by Plato, a decision support tool that implements a solid preservation planning approach and integrates services for content characterisation, preservation action, and automatic object comparison to provide maximum support for preservation planning endeavours. Attendees will then enact the preservation plan created in Plato by uploading it to the EPrints repository. Upon upload, EPrints automatically carries out the recommended preservation action, e.g. migrating all GIF images in a repository to PNG, and links the plan to both the original and the migrated file.
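What happens when a plan is enacted can be pictured with the following sketch. The plan format and the `convert` callback are hypothetical, not EPrints' actual interface; the essential behaviour shown is that every matching object is migrated and the plan is linked to both original and result:

```python
from pathlib import Path

def enact_plan(repository_dir, plan, convert):
    """Apply a preservation plan to every matching file in the
    repository: run the recommended action and record a link from
    the plan to the original and the migrated copy. Hypothetical
    sketch only; the real EPrints plan enactment differs."""
    links = []
    for original in Path(repository_dir).rglob(f"*{plan['source_ext']}"):
        migrated = original.with_suffix(plan["target_ext"])
        convert(original, migrated)              # e.g. a GIF-to-PNG migration
        links.append({"plan": plan["id"],
                      "original": str(original),
                      "migrated": str(migrated)})
    return links
```

Keeping the plan identifier in every link is what makes the migration auditable later: each derivative can be traced to the decision that produced it.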
The benefit of this tutorial is the grounding of digital curation advice and theory in achievable good practice that delivers helpful services to end users within their familiar personal desktop environments and new cloud services.
PRESENTERS:
Today more and more of our lives are becoming digital. Everything from family photographs, music files, video footage, and correspondence to medical records, bookmarks, documents, and even ideas is now available in electronic form. This makes access quick and convenient, but how do we save all of these digital assets for the long term? Most of us have experienced personal data loss at one time or another due to hard drive failure, file corruption, technology obsolescence, or accidental file deletion. What should we be doing right now to safeguard our digital creations? This hands-on session will explain the process of creating and executing an action plan for archiving personal digital assets: deciding what to store, consolidating multiple file versions, and cataloguing resources. The workshop will explore both local storage media and cloud services as well as institutional and disciplinary repositories. Learn to plan and execute the archiving of your own personal digital assets, and how to teach your patrons to do the same.
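One concrete step, consolidating multiple file versions, can be sketched as grouping files by checksum so that identical copies surface for review. This is a minimal illustration, not a prescribed tool; the choice of storage target is a separate decision:

```python
import hashlib
from pathlib import Path

def catalogue(root):
    """Walk a personal archive and group files by SHA-256 checksum,
    so duplicate copies (identical content under different names)
    can be consolidated before archiving. Minimal sketch of the
    consolidation step only."""
    by_hash = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash.setdefault(digest, []).append(str(path))
    return by_hash
```

Any group with more than one entry is a set of exact duplicates, leaving the owner a short list of genuinely distinct versions to catalogue and store.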
Worldwide research on digital preservation has produced many studies, criteria sets, tools, strategies, standards, and best practices. One of these technology families is the Persistent Identifier (PI), intended to grant stability of digital objects over time. PIs give the things that we use or talk about in information systems a unique and stable name. While the location of a resource may change, its PI remains the same. Persistent identification of Internet resources is a crucial issue for almost all sectors of the future information society. In particular, in cultural and scientific digital library applications, the instability of URLs reduces the credibility of digital resources, which is a serious drawback, especially for researchers.
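The core promise of a PI, that the identifier stays stable while the location it resolves to may change, can be pictured with a toy resolver. This is illustrative only; real systems such as DOI or URN resolvers add distributed resolution, policies, and governance, and the identifier value below is invented:

```python
class PersistentIdentifierRegistry:
    """Toy resolver: the PI is the stable key, the location behind
    it can be updated without the PI ever changing. Not a real PI
    system; shown only to illustrate the indirection."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        self._locations[pid] = url

    def update_location(self, pid, new_url):
        if pid not in self._locations:
            raise KeyError(f"unknown identifier: {pid}")
        self._locations[pid] = new_url

    def resolve(self, pid):
        return self._locations[pid]
```

Everything beyond this indirection, who may update locations, for how long, and under what guarantees, is exactly the organisational territory the tutorial goes on to discuss.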
There are various concepts and schemes for persistent identification that aim to solve this problem: Digital Object Identifier (DOI) [7], Persistent Uniform Resource Locator (PURL) [5], Archival Resource Key (ARK) [2], and Uniform Resource Name (URN) [6], to name a few. They all share common goals, but there are important differences between these approaches with respect to the use cases, communities, and business models toward which they are directed. Recently, the diversity of possible solutions has become even more confusing: the PI systems mentioned above all primarily focus on the identification of web resources that are meant to be available in the long term and are subject to long-term preservation. But with the rise of the Data Web, driven by the success of social networks and the Linking Open Data movement, the identification of non-digital entities (such as real-world objects, events, places, and persons) and abstract concepts is becoming more and more important. Especially in this context, the traditional PI systems compete with lightweight solutions like "Cool URIs" [3] and hashtags.
But the key qualities of a PI service are mostly independent of the scheme it uses. They concern trust and reliability. No technology can guarantee a level of service without a trustworthy organisation and clearly defined policies: it is well known that digital preservation is more an organisational issue than a technical one. European activities, such as the development of the Europeana Resolution Discovery Service [6] and PersID [1], focus on the harmonization of national PI strategies and embed all these existing approaches into a shared infrastructure. The aim is to establish a transparent and trusted service for the cultural and academic sector. The crucial question is: how much, and what kind of, regulation by public authorities does the web of culture and research need?
In this tutorial we explain the importance of trusted Persistent Identifier services for the web's evolution and present a survey of available technologies and current practices. The tutorial starts with an introduction to the problems PI systems try to solve today and those they will have to address in the future. We will then survey the available technologies and the major initiatives worldwide, discuss their commonalities and differences, and highlight the most important issues and problems with the current situation. In more detail, the goals and plans of the Europeana Resolution Discovery Service (ERDS) and PersID will be outlined. The tutorial will close with an open debate or round table on "use cases and user requirements for a PI system".
The tutorial is directed towards people in charge of digital repositories, institutions working in the context of linked data, authors of digital content, software companies developing archival solutions and digital library applications, and researchers and students working on digital libraries for cultural and scientific resources.
References: