Migrating Content in WARC Files

S. Strodl, Peter Beran, A. Rauber:
"Migrating Content in WARC Files";
Talk: The 9th International Web Archiving Workshop (IWAW 2009), Corfu, Greece; 30.09.2009 - 01.10.2009; in: "The 9th International Web Archiving Workshop (IWAW 2009) Proceedings", (2009), pp. 43 - 49.

Abstract:


Heritage institutions all over the world have started harvesting
and preserving resources of the World Wide Web for future
generations as part of our cultural heritage. This task is
non-trivial because of two complex challenges:
(1) crawling the enormous amount of data on the Internet
and (2) applying long-term preservation strategies to
this data. Today, considerable effort goes into the development
of Web crawlers, and there are many years of experience
with bit-level storage of large amounts of data. However, support
for the logical preservation of Internet archives is very limited.
The continuous development of the technologies
used on the Web, and especially the rapid change in the
wide variety of file formats in use, puts the digital
assets in Web archives at risk of becoming inaccessible
and unusable in the near future.
This paper presents a workflow for applying digital preservation
strategies to the content of WARC archives. Migrating
the objects within a WARC archive keeps the information accessible
and usable in the future. The new WARC format,
which is widely used to store Internet crawl results, supports
migration of its content. Moreover, a set of tools is presented
that supports the extraction, migration, and injection of objects
in WARC files.