

Controlling the content

Another important factor that has to be decided up front is the policy for which material is accepted and actually entered into the collections. Two different approaches can be discerned:

In the first approach, the overall consistency and content of the archive are controlled: all digital documents are scrutinised in order to decide whether or not they are worth storing. This selection yields a consistent, carefully sorted collection. In a way, this approach can be seen as traditional librarianship applied to the digital domain.

Naturally, selecting material demands guidelines. Defining such criteria is a matter for the individual project and very much reflects its purpose.

Such a selection procedure necessarily has to be conducted by human personnel. While searching for keywords is easily realised, making programmes understand the content, that is, the actual meaning of documents, is still a visionary task. This holds all the more when it comes to combining the setting of pictures with written descriptions or any other medium, or to assessments that demand a sense of taste. Therefore, no automatic method exists that is capable of applying stringent selection criteria to the material. Tools that facilitate the process may be developed in the future, yet a considerable reduction of the manual labour imposed on the staff cannot be expected in the near future. Both the manpower required to implement the selection and the specification of the policy itself restrict the scope of the archive substantially. Thus, initiatives following this approach will focus on small, specific areas.
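The gap between keyword search and actual understanding can be illustrated with a minimal sketch. It is not a tool described in this text; the function name `keyword_hits`, the keyword set, and the two sample sentences are hypothetical and chosen only for illustration.

```python
import re

def keyword_hits(text, keywords):
    """Return the keywords that occur as whole words in text (case-insensitive)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {k for k in keywords if k.lower() in words}

# Hypothetical selection profile and sample documents (assumptions for this sketch).
keywords = {"archive", "preservation"}
on_topic = "Long-term preservation is the core task of a digital archive."
off_topic = "This page is not about preservation; it sells archive boxes."

# Both documents match the same keywords, although only the first one
# is actually about digital archiving.
print(keyword_hits(on_topic, keywords))
print(keyword_hits(off_topic, keywords))
```

Both sample sentences produce the same set of hits, so a keyword match alone cannot tell a relevant document from an irrelevant one. That decision still requires a human reader.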

On the other hand, all the material within the defined scope can be accepted in an unconstrained manner. Automatic tools are applied to collect the data (cf. Section 2.2.2), which reduces the required manpower considerably.
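As a rough sketch of what accepting everything within a defined scope might look like for web material, the following check assumes that the scope is given as a list of host names. This is only one possible definition of scope; the function `in_scope` and the host `example.org` are hypothetical.

```python
from urllib.parse import urlparse

def in_scope(url, scope_hosts):
    """Accept a URL if its host equals, or is a subdomain of, a host in scope."""
    host = urlparse(url).hostname or ""
    return any(host == s or host.endswith("." + s) for s in scope_hosts)

# Hypothetical scope definition (assumption for this sketch).
scope = ("example.org",)

for url in ("http://www.example.org/a.html", "http://example.com/b.html"):
    # Every in-scope document is accepted; its content is never inspected.
    print(url, "accepted" if in_scope(url, scope) else "skipped")
```

The check looks only at where a document comes from, never at what it contains.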

With no human scrutinising the collection items, however, the consistency of the collections cannot be completely guaranteed. This concerns possible technical flaws of the collection items as well as unfiltered content.

Since automatic tools are incapable of judging the meaning of documents, material not matching the profile of the archive could be accepted. This could even include material that is offensive, disturbing, pornographic, racist, or prohibited by law, such as web sites containing Nazi propaganda or child pornography.

Performing manual selection on the documents to be entered into the archive ensures the quality and the consistency of the collections. However, the legitimacy of any selection criteria is questionable, as we simply do not know what will be important in the future.

A company, for instance, that only stores final documents might lose valuable information in the form of intermediate versions. These could contain information that was deemed unimportant at first but turns out to be crucial at a later point in time. Furthermore, a sequence of unfinished papers depicts the emergence and refinement of specific ideas and the decisions taken in the course of the work. This rich source for analysis, and of information in general, is lost when documents are deliberately deleted, even if they were considered worthless at that point in time.

A very similar situation arises when libraries select which of the publicly available digital material is worth preserving as our cultural heritage. Historians working with newspapers preserved from a hundred years ago find sections very interesting that are commonly considered worthless, such as obituaries or advertisements. If this material had been subject to selection, we would not possess this valuable source of information today. Of course, the Internet comprises loads of "Sex and Crime", but, whether we like it or not, this is part of our present culture.


Andreas Aschenbrenner