Updates on the Claims to Passages Task

May 9, 2012
We have found some errors in the training data qrels. Please download them again.
We have also written down some clarifications about this task. If you have further questions, please send them to us.

May 4, 2012
The training data for the Claims to Passages Task is now available, together with a description of the task.

April 29, 2012
We are finalizing the set of training data. We will make it available in the coming days, together with specific guidelines.

Flowchart training data available

You will find some detailed information about the task involving flowcharts here.
The test data will be made available at the beginning of June.

Data partially available

The corpus of patent documents is available for download.
Note that CLEF-IP 2012 uses the same patent documents as CLEF-IP 2011.

We are working on creating the training and the test topic sets, as well as the guidelines.
A short preview: for the passage retrieval task, we plan to involve XPath in the expected results. We are also very close to finalizing the textual representation of the flowcharts, which will later be used in the evaluation of the flowchart task.
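To illustrate how XPath could point at a passage inside a patent document, here is a minimal sketch in Python. It is not the official result format, which has not been published yet. The sample XML, its element names, and the helper `select_passage` are assumptions made for illustration only; the actual CLEF-IP markup and guidelines may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified patent document. The element names are
# assumptions for illustration; the real CLEF-IP markup may differ.
SAMPLE = """<patent-document>
  <description>
    <p>A first paragraph about the background.</p>
    <p>A second paragraph describing the invention.</p>
  </description>
  <claims>
    <claim><claim-text>A device comprising a sensor.</claim-text></claim>
  </claims>
</patent-document>"""


def select_passage(xml_text, xpath):
    """Return the text of the first element matched by `xpath`, or None.

    ElementTree supports only a subset of XPath, which is enough for
    simple positional paths such as ./description/p[2].
    """
    root = ET.fromstring(xml_text)
    node = root.find(xpath)
    if node is None:
        return None
    return "".join(node.itertext()).strip()


# Address the second paragraph of the description by its position.
passage = select_passage(SAMPLE, "./description/p[2]")
print(passage)
```

A positional path like the one above identifies a passage without copying its text, which is one plausible reason to use XPath in retrieval results.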

Registration is open!

See here how to register for CLEF-IP 2012.

CLEF-IP

Retrieval in the Intellectual Property Domain

The CLEF-IP track was launched in 2009 to investigate IR techniques for patent retrieval and was part of the CLEF 2009 evaluation campaign. In 2010 and 2011, the track was organized as a benchmarking activity of the CLEF 2010 and 2011 conferences.

The project is supported by the PROMISE Network of Excellence (co-funded by the 7th Framework Programme of the European Commission).

The image tasks are supported by the IMPEX project (funded by the Austrian Research Promotion Agency - FFG).


Tasks in 2012

  • Passage retrieval starting from claims (patentability or novelty search): The topics in this task will be based on the claims in patent application documents. Given a claim, the participants will be asked to retrieve relevant documents in the collection and mark out the relevant passages in these documents.
  • Matching claim to description in a single document (Pilot): Given one claim in a patent application document, the participants will be asked to indicate those paragraphs in the description section of the same application document that best explain the contents of the given claim.
  • Flowchart Recognition Task: The topics in this third task are patent images representing flowcharts. Participants in this task will be asked to extract the information in these images and return it in a predefined textual format.
  • Chemical Structure Recognition Task: The topics in this fourth task will be patent pages in TIFF format. Participants will be asked to identify the location of the chemical structures depicted on these pages and, for each of them, return the corresponding structure in a MOL file (a chemical structure file format).

More details about each of the tasks will be made available soon.

Timeline (provisional)

The provisional timeline of the track is:
  • Data release - End of February/March 2012
  • Training data release - April 2012
  • Topic release - April 2012
  • Submission deadline - June 2012
  • Evaluation results release - July 2012
  • Workshop at CLEF - September 2012