BlogData

Revision as of 15:14, 5 November 2010 by Cimpaniulia (Talk | contribs)
BlogData
Domain: Blog Posts
Media: Text
Size: 27 GB
Instances:
File Format:
Creation Date:
Task: retrieval
Copyright:
URL: http://groups.google.com/group/icwsm-data

Domain

  • A set of blog posts, including the posted text and metadata such as the blog's homepage and timestamps

Comments

Media (image, video, mixed, …)

Size (no images, in GB, …)

  • 27 GB compressed (142 GB uncompressed)

Source (FlickR, Corel)

Annotation type (free text, structured, …)

Ground truth

Event or project

Task (retrieval, recognition, …)

Format

  • 14 tiers of XML documents (44 million blog posts)
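
Because the uncompressed XML is far too large to load into memory at once, a streaming parser is a natural fit. The following is only a minimal sketch: the element name "item" and the helper count_posts are assumptions made for illustration, not taken from the dataset documentation, so check the actual schema (including any XML namespaces) before relying on it.

```python
import io
import xml.etree.ElementTree as ET


def count_posts(source, post_tag="item"):
    """Count elements named post_tag in an XML stream without building the full tree.

    post_tag is a hypothetical element name; replace it with the tag
    the dataset actually uses for a single blog post.
    """
    count = 0
    # iterparse yields each element once its closing tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == post_tag:
            count += 1
            # Drop the element's text and children to keep memory use low.
            elem.clear()
    return count


# Tiny in-memory document standing in for one of the dataset's XML files.
sample = io.BytesIO(
    b"<dataset><item><title>a</title></item><item><title>b</title></item></dataset>"
)
print(count_posts(sample))
```

The same function accepts a file name or any binary file object, so a decompressed tier file could be passed in directly.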

Quality (resolution)

Creation date

Copyright

  • Permitted uses are described in the file icwsm-spinn3r.pdf in the dataset path

URL
