SAPIR Deliverable 4.4 - Design of the process of Collaborative Crawling

Revision as of 14:39, 11 July 2011 by Afoncubierta (Talk | contribs)
Author Claudio Lucchese
Domain
Task
Publisher
Event
Project SAPIR
Dataset Used
Published 05/05/2008
Copyright The research leading to these results has received funding from the European Community’s Sixth Framework Programme (FP6) under grant agreement n° 45128
DOI


Abstract

This report presents the activities conducted within T4.4 of the SAPIR project. It discusses and motivates the push-based crawling framework. Push-based crawling has received attention in the Web search context, but it is much more relevant in the context of this project. Because centralized pull-based crawling of multimedia objects is costly in terms of both network traffic and computation, we devised tools that allow an object to be processed locally at the publisher site, so that only its metadata is published into the SAPIR distributed indices.

These tools are based on the MPEG-XM reference feature extraction software. We crawled a large portion of the Flickr website in order to obtain high-quality photographic images that are also annotated by users with tags, comments and so on. We thus built a very large collection of image metadata that will be used by all the partners throughout the project. The collection contains more than 50 million objects, and we plan to grow it to 100 million images. Large collections of this kind are not publicly available, so the creation of this collection is fundamental for testing and experimenting with the SAPIR approach.
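
The push-based idea in the abstract (process the object at the publisher site, publish only metadata) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the coarse colour histogram is only a toy stand-in for the descriptors produced by the feature extraction software, and `extract_features`, `build_metadata_record`, `push` and the in-memory `index` are hypothetical names, not part of the SAPIR tools.

```python
import json
from collections import Counter


def extract_features(pixels, bins=4):
    """Toy stand-in for a visual descriptor: a normalized coarse RGB histogram."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {f"{r}-{g}-{b}": n / total for (r, g, b), n in sorted(counts.items())}


def build_metadata_record(object_id, pixels, tags):
    """Runs at the publisher site: only this record is meant to leave it."""
    return {
        "id": object_id,
        "tags": list(tags),
        "features": extract_features(pixels),
    }


def push(record, index):
    """Hypothetical publish step: the 'distributed index' is just a dict here."""
    index[record["id"]] = json.dumps(record)


# A tiny 2x2 "image" given as a list of RGB pixels (assumed input format).
image = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 250)]
index = {}
record = build_metadata_record("photo-1", image, ["sunset", "sea"])
push(record, index)
```

In this sketch the image pixels never reach the index; only the identifier, the user tags and the extracted features are transmitted, which is the property that makes the push-based approach cheaper than centrally pulling and processing each object.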

Authors

Main Author(s): Claudio Lucchese

Participants: Raffaele Perego, Fausto Rabitti, Fabrizio Falchi, Maristella Agosti


Citations

Links

Link to Deliverable: http://www.sapir.eu/papers/deliverables/sapir-d4-4.pdf
Project Website: http://www.sapir.eu/
