TRIPOD Initial Evaluation Plan
http://tripod.shef.ac.uk/outcomes/public_deliverables/Tripod_D5.1.pdf
Author | Emma J. Barker (primary author), Mark Sanderson, Ross S. Purves, Robert J. Gaizauskas, Gareth Jones, Andrew Salway |
Domain | evaluation, plan, test collections, users, use scenarios, caption quality, retrieval effectiveness, usability, tasks, system, functionality, evaluation resources |
Task | |
Publisher | |
Event | |
Project | TRIPOD |
Dataset Used | |
Published | 31.05.2009 |
Copyright | Project co-funded by the European Commission within the Sixth Framework Programme (2002-2006) under GA nr. 045335 |
DOI | |
Abstract
In this document we present an initial plan for evaluation in Tripod, focusing on three key evaluation criteria: caption quality, retrieval effectiveness and usability. The plan elaborates on the nature of these criteria and then outlines a set of experiments in which they may be used to evaluate the Tripod services. The plan is organised around a functional analysis of the sub-tasks, inputs and outputs in the four use scenarios for Tripod services: 1) Tripod caption augmentation; 2) Tripod automated caption content creation; 3) Tripod image retrieval; 4) Tripod multi-lingual caption summary creation. These scenarios are a slightly modified version of the original four use scenarios presented in the Description of Work, WP1, and the document discusses this new arrangement.
We have characterised 12 individual evaluations, each of which forms part of the overall project evaluation. The evaluations are organised by use scenario as follows:
• Scenario 1: Evaluation of Toponym Recognition and Spatial Preposition Analysis; Evaluation of Location Keywords for Images; Evaluation of Location Keywords for Captions; Evaluation of Simple Image Content Analysis.
• Scenario 2: Evaluation of Location Keywords for Images; Evaluation of Multi-document Summarisation; Evaluation of Location Keywords for Captions.
• Scenario 3: Evaluating Retrieval Effectiveness in Tripod; User Evaluation of Tripod Image Retrieval.
• Scenario 4: Evaluating Information Extraction and Integration from Multiple Multi-lingual Sources; Evaluating the Selection of Content for a Caption; Evaluating Caption Summaries.
Some of the evaluation experiments are designed at the level of a sub-task in a Tripod service (e.g. toponym recognition) and will be used to evaluate the performance of Tripod system components. Others are designed at the level of a Tripod use scenario, e.g. Tripod multi-lingual caption summary creation, and will be used to evaluate a Tripod service (i.e. a full system pipeline). For each evaluation we present a brief outline of the relevant research question(s) and methodology, and we specify values for a number of attributes defined in the plan, including: “evaluation points”; whether the evaluation is “user visible” or “user transparent”; required evaluation “resources”; relevant Tripod work package “tasks”; “user groups”; and “participants”. The set of evaluations included in the initial evaluation plan reflects the scientific and pragmatic priorities of the investigators, and may be modified if new questions arise in the course of research, thus providing a solid yet flexible framework for evaluation within Tripod.
Authors
Main Author(s): Emma J. Barker (primary author), Mark Sanderson, Robert J. Gaizauskas (University of Sheffield); Ross S. Purves (University of Zurich); Gareth Jones, Andrew Salway (Dublin City University).
Participants: USFD, UZH, DCU, UBA, CU, Ordnance Survey, Centrica, GEODAN, Alinari, TILDE
Citations
Links
Link to the Deliverable: http://tripod.shef.ac.uk/outcomes/public_deliverables/Tripod_D5.1.pdf
Project Website: http://tripod.shef.ac.uk/