BlogData
From Chorus
Revision as of 15:14, 5 November 2010 by Cimpaniulia (Talk | contribs)
Domain | Blog Posts |
Media | Text |
Size | 27 GB |
Instances | 44 million blog posts |
File Format | XML |
Creation Date | |
Task | retrieval |
Copyright | |
URL | http://groups.google.com/group/icwsm-data |
Domain
- A set of blog posts, including the posted text as well as metadata such as the blog's homepage, timestamps, etc.
Comments
- The dataset is split into tiers, which are organized to roughly approximate search engine ranking
- Google group: http://groups.google.com/group/icwsm-data
- Google code: http://code.google.com/p/icwsm-data/
Media (image, video, mixed, …)
Size (no images, in GB, …)
- 27 GB compressed (142 GB uncompressed)
Source (FlickR, Corel)
Annotation type (free text, structured, …)
Ground truth
Event or project
Task (retrieval, recognition, …)
Format
- 14 tiers of XML documents (44 million blog posts)
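With 44 million posts spread over the tier files, loading a whole XML document into memory is unlikely to be practical, so a streaming parser is the safer choice. The following is a minimal sketch in Python, not the dataset's official reader. The element name `item` and the child fields `title` and `link` are assumptions made for illustration, and the inline sample is invented; check the actual tier files for the real schema.

```python
import io
import xml.etree.ElementTree as ET


def iter_posts(source, item_tag="item"):
    """Yield one dict per post element from an XML file or file-like object.

    item_tag is a hypothetical element name; adjust it to the real schema.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == item_tag:
            # Map each direct child tag to its stripped text content.
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Release this element's children to keep memory use low.
            elem.clear()


# Invented sample standing in for a (much larger) tier file.
SAMPLE = b"""<dataset>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</dataset>"""

posts = list(iter_posts(io.BytesIO(SAMPLE)))
print(len(posts), posts[0]["title"])
```

For a real tier file, `iter_posts` can be given a path or an open binary file handle instead of the in-memory sample, and the posts can be consumed one at a time rather than collected into a list.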
Quality (resolution)
Creation date
Copyright
- Permitted uses are described in the file icwsm-spinn3r.pdf, located in the dataset path