Topics in this task are sets of claims extracted from actual patent application documents. Participants are asked to return passages that are relevant to the topic claims. The passages must occur in the documents in the CLEF-IP collection. No other data is allowed to be used in preparing for this task.
These sets of claims were chosen based on existing search reports for the patent applications considered.
The topics (defined below) also contain a pointer to the original patent application file. The content of the XML file (other than the claims selected as topics) can be used as you like.
You can read further clarifications on this task here.
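A topic in the 'Claims to Passage' task contains the following SGML tags:
<tid>topic_id</tid>
<tfile>topic_file.xml</tfile>
<tfam-docs>topic_file.xml</tfam-docs>
<tclaims>xpaths_to_claims</tclaims>
where
   tid holds the identifier of the topic
   tfile points to the original patent application file from which the topic claims were taken
   tfam-docs lists the patent documents that are part of the patent family of the topic document
   tclaims lists the XPaths to the claims selected as the topic
For example:
<tid>tPSG-5</tid>
<tfile>EP-1480263-A1.xml</tfile>
<tfam-docs>JP-2003224099-A.xml WO-2003065434-A1.xml</tfam-docs>
<tclaims>/patent-document/claims/claim[1] /patent-document/claims/claim[2] /patent-document/claims/claim[3] /patent-document/claims/claim[16] /patent-document/claims/claim[17] /patent-document/claims/claim[18] </tclaims>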
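The topic fields can be read with a few lines of code. The following is only a minimal sketch: it assumes a single topic is available as one string and that the fields never nest, and the function name parse_topic is made up for this illustration. How topics are delimited inside the topics file is not covered here.

```python
import re

# Field names as they appear in the example topic above.
TOPIC_FIELDS = ("tid", "tfile", "tfam-docs", "tclaims")


def parse_topic(text):
    """Return the topic id, topic file, family documents and claim XPaths.

    Hypothetical helper: assumes `text` holds exactly one topic.
    """
    raw = {}
    for name in TOPIC_FIELDS:
        match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
        raw[name] = match.group(1).strip() if match else ""
    return {
        "tid": raw["tid"],
        "tfile": raw["tfile"],
        # Both fields are whitespace-separated lists in the example.
        "tfam_docs": raw["tfam-docs"].split(),
        "tclaims": raw["tclaims"].split(),
    }


example = (
    "<tid>tPSG-5</tid> <tfile>EP-1480263-A1.xml</tfile> "
    "<tfam-docs>JP-2003224099-A.xml WO-2003065434-A1.xml</tfam-docs> "
    "<tclaims>/patent-document/claims/claim[1] "
    "/patent-document/claims/claim[2]</tclaims>"
)
topic = parse_topic(example)
print(topic["tid"], topic["tfam_docs"], len(topic["tclaims"]))
```

A regular expression is used instead of an XML parser because the topic is a flat list of tags without a single root element.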
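The retrieval results should be returned in a text file with 6 columns, as described below (based on the TREC format):
topic_id Q0 doc_id rel_psg_xpath psg_rank psg_score
where:
   topic_id is the identifier of the topic
   Q0 is a constant string
   doc_id is the identifier of the retrieved document
   rel_psg_xpath is the XPath of the passage considered relevant
   psg_rank is the rank of the passage
   psg_score is the score of the passage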
We allow only one xpath per line in the result files. If more passages are considered relevant for a topic, these have to be placed on separate lines.
The number of lines in a result file is limited as follows: when the XPaths are ignored, the lines may contain at most 100 doc_ids.
...
tPSG-5 Q0 WO-2002015251-A1 /patent-document/claims/claim 5 1.34
tPSG-5 Q0 WO-2002015251-A1 /patent-document/description/p[22] 6 1.11
tPSG-5 Q0 WO-2002015251-A1 /patent-document/description/p[23] 7 0.87
tPSG-5 Q0 WO-2002015251-A1 /patent-document/description/p[34] 8 0.80
...
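Before submitting, a run file can be checked against the format above. The sketch below is one possible check, not an official validator. It assumes that columns are separated by whitespace, that XPaths contain no whitespace, and that the limit of 100 doc_ids applies per topic (the text does not say so explicitly). The names parse_run_line and check_run are made up for this illustration.

```python
from collections import defaultdict

MAX_DOC_IDS = 100  # limit stated above; applying it per topic is an assumption


def parse_run_line(line):
    """Split one result line into its six whitespace-separated columns."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 columns, got {len(parts)}: {line!r}")
    topic_id, q0, doc_id, xpath, rank, score = parts
    if q0 != "Q0":
        raise ValueError(f"second column must be Q0: {line!r}")
    return topic_id, doc_id, xpath, int(rank), float(score)


def check_run(lines):
    """Parse all non-empty lines and enforce the doc_id limit."""
    docs_per_topic = defaultdict(set)
    rows = []
    for line in lines:
        if not line.strip():
            continue
        row = parse_run_line(line)
        docs_per_topic[row[0]].add(row[1])
        rows.append(row)
    for topic_id, docs in docs_per_topic.items():
        if len(docs) > MAX_DOC_IDS:
            raise ValueError(f"{topic_id}: {len(docs)} doc_ids exceed {MAX_DOC_IDS}")
    return rows


rows = check_run([
    "tPSG-5 Q0 WO-2002015251-A1 /patent-document/claims/claim 5 1.34",
    "tPSG-5 Q0 WO-2002015251-A1 /patent-document/description/p[22] 6 1.11",
])
print(rows[0])
```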
Each participant is allowed to submit up to 8 run files. Each run should be submitted compressed. As in previous years, the run files should be named using the following schema: participantID-runID-taskID.extension, where
   participantID will identify your institution/group
   runID identifies the different runs you submit
   taskID should be PSG
   extension is either tgz, gz, zip or other extension used by compressing programs.
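The naming schema can be applied mechanically. The helper below is a small sketch. The restriction of participantID and runID to letters, digits and underscores is our own assumption, made so that the hyphen separators stay unambiguous. The values "mygroup" and "run1" are placeholders.

```python
import re

# Assumption: the task description sets no character rules for the IDs.
ALLOWED_ID = re.compile(r"^[A-Za-z0-9_]+$")


def run_file_name(participant_id, run_id, extension, task_id="PSG"):
    """Build participantID-runID-taskID.extension for a compressed run file."""
    for part in (participant_id, run_id):
        if not ALLOWED_ID.match(part):
            raise ValueError(f"unexpected characters in {part!r}")
    return f"{participant_id}-{run_id}-{task_id}.{extension}"


print(run_file_name("mygroup", "run1", "tgz"))  # mygroup-run1-PSG.tgz
```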
As seen above, the topics also contain a pointer to the original patent application file. Participants in the task are allowed to use the content of this file as they need and see fit, as well as the content of the files in the 'tfam-docs' field.
There are two types of measurements we can compute on the submitted runs: at the document level and at the passage level.
The main measure we will report is PRES at cut-offs 20 and 100. PRES rewards systems that return relevant documents earlier in the retrieval list.
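To make the measure concrete, here is a sketch of PRES for a single topic, following the definition by Magdy and Jones as we understand it: PRES = 1 - (sum(r_i)/n - (n+1)/2) / N_max, where r_i are the ranks of the n relevant documents and N_max is the cut-off. Relevant documents not found within the cut-off are placed directly after it. This is an illustration only. The script used for the official evaluation may differ in details, and the function name pres and the document identifiers d1 to d4 are made up.

```python
def pres(ranked_doc_ids, relevant_doc_ids, n_max):
    """PRES at cut-off n_max for one topic (illustrative sketch).

    Assumes ranked_doc_ids contains no duplicates and is ordered best first.
    """
    relevant = set(relevant_doc_ids)
    n = len(relevant)
    if n == 0:
        raise ValueError("topic has no relevant documents")
    # 1-based ranks of the relevant documents found within the cut-off.
    found_ranks = [
        rank
        for rank, doc_id in enumerate(ranked_doc_ids[:n_max], start=1)
        if doc_id in relevant
    ]
    # Relevant documents that were not found are assumed to sit right
    # after the cut-off, at ranks n_max + i for i = found + 1 .. n.
    missing_ranks = [n_max + i for i in range(len(found_ranks) + 1, n + 1)]
    rank_sum = sum(found_ranks) + sum(missing_ranks)
    return 1.0 - (rank_sum / n - (n + 1) / 2) / n_max


print(pres(["d1", "d2", "d3", "d4"], {"d1", "d3"}, n_max=4))  # 0.875
```

With this definition a topic scores 1 when all relevant documents lead the list and 0 when none is found within the cut-off.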
To apply PRES to the submitted runs, the passage information will be stripped from them while the ranking is kept. For instance, the example run
...
tPSG-16 Q0 WO-2000078185-A2 /patent-document/abstract[1]/p 1 2.53
tPSG-16 Q0 WO-2000078185-A2 /patent-document/abstract[2]/p 2 2.2
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[41] 3 1.89
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[42] 4 1.75
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[43] 5 1.5
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[44] 6 1.02
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[45] 7 0.9
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[46] 8 0.8
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[47] 9 0.7
tPSG-16 Q0 WO-1997007715-A1 /patent-document/abstract[1]/p 10 0.66
tPSG-16 Q0 WO-1997007715-A1 /patent-document/abstract[2]/p 11 0.60
tPSG-16 Q0 WO-1997007715-A1 /patent-document/description/p[43] 12 0.5
tPSG-16 Q0 WO-1997007715-A1 /patent-document/description/p[44] 13 0.42
tPSG-16 Q0 WO-1997007715-A1 /patent-document/description/p[45] 14 0.42
tPSG-16 Q0 WO-1997007715-A1 /patent-document/description/p[46] 15 0.42
...
will be processed into the following:
...
tPSG-16 Q0 WO-2000078185-A2 1 2.53
tPSG-16 Q0 WO-1997007715-A1 2 0.66
...
and given as input to the script computing the PRES score. (Note that the psg_score column, the last one, is ignored in the PRES computation.)
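The reduction from a passage-level run to a document-level run can be sketched as follows. The sketch assumes that the lines of each topic are already sorted by psg_rank. It keeps the first line seen for each document and renumbers the ranks. The function name to_document_run is made up, and the actual processing script may work differently.

```python
def to_document_run(lines):
    """Drop the XPath column, keeping the best-ranked line per document."""
    seen = set()     # (topic_id, doc_id) pairs already emitted
    next_rank = {}   # last document rank assigned per topic
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        topic_id, q0, doc_id, _xpath, _psg_rank, score = parts
        if (topic_id, doc_id) in seen:
            continue
        seen.add((topic_id, doc_id))
        rank = next_rank.get(topic_id, 0) + 1
        next_rank[topic_id] = rank
        out.append(f"{topic_id} {q0} {doc_id} {rank} {score}")
    return out


passage_run = [
    "tPSG-16 Q0 WO-2000078185-A2 /patent-document/abstract[1]/p 1 2.53",
    "tPSG-16 Q0 WO-2000078185-A2 /patent-document/abstract[2]/p 2 2.2",
    "tPSG-16 Q0 WO-1997007715-A1 /patent-document/abstract[1]/p 10 0.66",
    "tPSG-16 Q0 WO-1997007715-A1 /patent-document/abstract[2]/p 11 0.60",
]
for row in to_document_run(passage_run):
    print(row)
```

The score of the first passage is carried over unchanged, which matches the example above.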
The evaluations were made public to the participants. We will soon post the results here as well.
As training data for 2013 we have prepared the training topics and the test topics used in 2012. To the topics used in 2012 we have added the 'tfam-docs' field, with pointers to the patent documents that are part of the patent family of the topic document. Where available, we also provide these files.
Download the training data here.
The set of test topics contains 145 topics, 50 in English, 50 in German and 49 in French. You can download it here.
The set of relevance assessments can be downloaded here.
When looking at the topics in the training set, you will have noticed that all XPaths are relative to A-level documents (i.e. application documents). The same is true for the topics in the test set. This is because search reports (almost) always refer to application documents as relevant citations.
For questions, suggestions and anything else regarding this task, please contact Mihai Lupu (lupu at ifs.tuwien.ac.at) or Florina Piroi (piroi at ifs.tuwien.ac.at).