CLEF-IP ran for the first time in 2009. Since then, the document corpus has grown, and the number and types of tasks have changed. Some details about the earlier CLEF-IP challenges are listed below.
Much of the information below can be found on the previous CLEF-IP website.
The documents in the patent collection are stored as XML files.
The documents originate from the European Patent Office and have mixed content in English, German, and French.
The files contain bibliographic data as well as descriptive text. The XML files are quite comprehensive, containing detailed
information on inventors, assignees, priority dates etc. From the variety of information in the XML files, these are the elements you should start to look at:
There was only one kind of task: find documents that constitute prior art. 10,000 topics were made available, and participants could choose to submit experiments using subsets of the largest topic set. Accepted subsets had to contain results for the first 500, 1,000, or 5,000 topics of the complete set.
The language of the topic documents was not restricted. The 2009 track also offered optional language tasks for English, German, and French, in which the topics had textual content in only one of the three languages.
Two kinds of tasks were available:
Both tasks contained 2000 topics. Participants in the Prior Art task were allowed to submit results for a smaller set of 500 topics.
There were four tasks in the 2011 track:
Three tasks were organized in 2012:
Relevance judgements are produced by an automatic method using patent citations from seed patents.
In 2009, for a small number of queries, (pooled) search results were reviewed by Intellectual Property experts.
In 2009, relevancy was measured at the patent level, not at the patent-document level.
That is, a relevant item is a patent, not a patent file (or document).
A patent is identified by its patent ID. This means that a valid result is
of the form EP0383071 rather than EP0383071-B1.xml or EP-0383071-B1 (which are document ids).
Note that this patent-level relevancy can be applied to EP patents, where the
patent number/ID appears in every patent document in the data set and identifies
a patent uniquely. This may not be the case for publications from
other patent offices; a typical example is the USPTO.
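The difference between a document id and a patent id can be illustrated with a short sketch. The code below is only an assumption about how a participant might normalise results to patent level; the function names and the regular expression are hypothetical and cover just the two document-id shapes quoted above (EP0383071-B1.xml and EP-0383071-B1), not any official CLEF-IP tooling.

```python
import re

# Hypothetical pattern: "EP", an optional hyphen, the patent number,
# an optional kind code such as "-B1", and an optional ".xml" suffix.
_DOC_ID = re.compile(r"^(EP)-?(\d+)(?:-[A-Z]\d?)?(?:\.xml)?$")


def to_patent_id(doc_id: str) -> str:
    """Reduce an EP document id (or file name) to its patent id."""
    match = _DOC_ID.match(doc_id.strip())
    if match is None:
        raise ValueError(f"not a recognised EP document id: {doc_id!r}")
    return match.group(1) + match.group(2)


def to_patent_level(ranked_doc_ids):
    """Map a ranked list of document ids to patent ids.

    Each patent is kept once, at the rank of its first document.
    """
    seen = set()
    patents = []
    for doc_id in ranked_doc_ids:
        patent_id = to_patent_id(doc_id)
        if patent_id not in seen:
            seen.add(patent_id)
            patents.append(patent_id)
    return patents
```

With this sketch, a ranked list containing both EP0383071-A1.xml and EP-0383071-B1 would yield the single entry EP0383071. As noted above, this kind of reduction is only safe for EP patents; ids from other offices would need their own handling.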
In 2010, relevancy was measured at the document level; in 2009 and 2011, it was measured at the patent level.