The Internet Archive [11] is a non-profit organisation located in San Francisco, USA. This pioneering initiative was launched in 1996. As of October 2001, its collections comprise more than 100 terabytes of data.
Using libraries as a model, the Archive's mission is to preserve digital collections and to offer permanent access, preventing "born-digital" materials from disappearing into the past. To this end, it collects Internet sites and other cultural artefacts in digital form and ensures their persistence.
Collection proceeds rather passively, however, as the Internet Archive relies on donations. The approach is nonetheless comprehensive, since no material is deliberately excluded from the archive. A core contributor has been Alexa Internet, a company that provides information about web sites and about products on web pages. To maintain its web navigation services, Alexa gathers 100 gigabytes of publicly available data per day, with no restrictions whatsoever on the scope of the documents. The material is transferred to the Internet Archive's repository only after a period of six months has passed. In this way, 40 terabytes of the openly accessible World Wide Web have been acquired. Aside from this collection documenting the history of the web, other contributors have also donated digital material, e.g. archival movies or a historical documentation of the Arpanet.
These massive amounts of data are stored on tape; as desktop computers have become cheaper, however, connecting several rather small-scale hosts into a cluster has also proved a feasible approach. To ensure the longevity of the repository, the data is copied to new storage media at least every ten years. Copies are maintained at multiple sites to guard against accidents and natural disasters. Additionally, software and emulators are collected to keep the material accessible in the future.
Access to the wealth of information in the archive is provided at no cost to researchers, historians, and scholars within the scope of projects. For the time being, a certain level of technical knowledge and programming skill is required to use the repository. Among others, the Smithsonian Institution, Xerox PARC, AT&T Labs, Cornell, Bellcore, and Rutgers University have made use of this access. Their projects have addressed subjects as diverse as the study of human languages, the growth of the Web, and the development of human information habits.