TRIPOD Initial Evaluation Plan

http://tripod.shef.ac.uk/outcomes/public_deliverables/Tripod_D5.1.pdf
Author Emma J. Barker (primary author), Mark Sanderson, Ross S. Purves, Robert J. Gaizauskas, Gareth Jones, Andrew Salway

Domain evaluation, plan, test collections, users, use scenarios, caption quality, retrieval effectiveness, usability, tasks, system, functionality, evaluation resources

Task
Publisher
Event
Project TRIPOD
Dataset Used
Published 31.05.2009
Copyright Project co-funded by the European Commission within the Sixth Framework Programme (2002-2006) under GA no. 045335
DOI


Abstract

In this document we present an initial plan for evaluation in Tripod, focusing on three key evaluation criteria: caption quality, retrieval effectiveness and usability. The plan elaborates on the nature of these criteria and then outlines a set of experiments in which they may be used to evaluate the Tripod services. The plan is organised around a functional analysis of the sub-tasks, inputs and outputs in the four use scenarios for Tripod Services: 1) Tripod caption augmentation; 2) Tripod automated caption content creation; 3) Tripod image retrieval; 4) Tripod multi-lingual caption summary creation. These scenarios represent a slightly modified version of the original four use scenarios presented in the Description of Work, WP1, and the document discusses this new arrangement.

We have characterised 12 individual evaluations, each of which forms a part of the overall project evaluation. The evaluations are organised on the basis of the four use scenarios and entitled as follows:

• Scenario 1: Evaluation of Toponym Recognition and Spatial Preposition Analysis; Evaluation of Location Keywords for Images; Evaluation of Location Keywords for Captions; Evaluation of Simple Image Content Analysis.
• Scenario 2: Evaluation of Location Keywords for Images; Evaluation of Multi-document Summarisation; Evaluation of Location Keywords for Captions.
• Scenario 3: Evaluating Retrieval Effectiveness in Tripod; User Evaluation of Tripod Image Retrieval.
• Scenario 4: Evaluating Information Extraction and Integration from Multiple Multi-lingual Sources; Evaluating the Selection of Content for a Caption; Evaluating Caption Summaries.

Some of the evaluation experiments are designed at the level of a sub-task in a Tripod Service (e.g. toponym recognition) and will be used to evaluate the performance of Tripod system components. Others are designed at the level of a Tripod use scenario (e.g. Tripod multi-lingual caption summary creation) and will be used to evaluate a Tripod service, i.e. a full system pipeline. For each evaluation we present a brief outline of the relevant research question(s) and methodology, and we specify values for a number of attributes defined in the plan, including: “evaluation points”; whether the evaluation is “user visible” or “user transparent”; required evaluation “resources”; relevant Tripod work package “tasks”; “user groups” and “participants”. The set of evaluations included in the initial evaluation plan has been specified on the basis of the scientific and pragmatic priorities of the investigators, and may be modified if new questions arise in the course of research, thus providing a solid yet flexible framework for evaluation within Tripod.
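As an illustration of what the Scenario 3 retrieval effectiveness evaluations could measure (the deliverable itself does not prescribe particular metrics in this summary), the following Python sketch computes two conventional test-collection measures, Precision@k and Average Precision, for a single query; all identifiers and data below are hypothetical:

def precision_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the top-k retrieved images that are judged relevant.
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    # Mean of the precision values at each rank where a relevant image appears,
    # normalised by the total number of relevant images for the query.
    if not relevant_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)

# Hypothetical ranked result list and relevance judgments for one query.
ranked = ["img_07", "img_42", "img_13", "img_99", "img_05"]
relevant = {"img_42", "img_05", "img_77"}
print(precision_at_k(ranked, relevant, 5))   # 0.4
print(average_precision(ranked, relevant))   # (1/2 + 2/5) / 3 = 0.3

In practice such measures would be averaged over all queries in a test collection; the specific collections, queries and judgments to be used are defined in the deliverable itself.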

Authors

Main Author(s): Emma J. Barker (primary author), Mark Sanderson, Ross S. Purves, Robert J. Gaizauskas, Gareth Jones, Andrew Salway. Affiliations: University of Sheffield, University of Zurich, Dublin City University.

Participants: USFD, UZH, DCU, UBA, CU, Ordnance Survey, Centrica, GEODAN, Alinari, TILDE

Citations

Links

Link to the Deliverable: http://tripod.shef.ac.uk/outcomes/public_deliverables/Tripod_D5.1.pdf
Project Website: http://tripod.shef.ac.uk/
