A System that Learns to Tag Videos by Watching YouTube
The automatic detection of semantic concepts such as objects, actions, or locations in video is usually tackled by manually annotating a set of training videos. Instead, our prototype downloads its training set automatically from online video portals (in this case, YouTube). This avoids the need to manually annotate a high-quality dataset: the only thing the user has to do is type in the concept to be learned.
This application demonstrates the performance of our system on a dataset of 2,200 online videos.
How-to
Dataset
For this demo, 22 tags were chosen (see below), and 100 videos were downloaded for each tag. Of these, 50 videos per tag were used for learning the tags (they are hidden from you), and the other 50 form the test set (you can browse them in the demo). In total, this amounts to 194 hours of online video material.
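The split and the resulting totals can be checked with a short Python sketch. It assumes that the 50 training videos are simply the first 50 in each tag's list (the selection rule is not stated here) and that the 194 hours cover all 2,200 videos; the tag names and video IDs are hypothetical placeholders.

```python
NUM_TAGS = 22          # tags in the demo
VIDEOS_PER_TAG = 100   # videos downloaded per tag
TRAIN_PER_TAG = 50     # training videos per tag (hidden in the demo)
TOTAL_HOURS = 194      # assumed to be the duration of all downloaded videos


def split_per_tag(videos_by_tag, n_train=TRAIN_PER_TAG):
    """Split each tag's video list into a training part and a test part.

    Assumption: the first n_train videos go to training, the rest to testing.
    """
    train, test = {}, {}
    for tag, videos in videos_by_tag.items():
        train[tag] = videos[:n_train]
        test[tag] = videos[n_train:]
    return train, test


# Hypothetical placeholder IDs standing in for the downloaded videos.
videos_by_tag = {
    f"tag{t:02d}": [f"tag{t:02d}_vid{v:03d}" for v in range(VIDEOS_PER_TAG)]
    for t in range(NUM_TAGS)
}
train, test = split_per_tag(videos_by_tag)

n_train = sum(len(v) for v in train.values())
n_test = sum(len(v) for v in test.values())
print(n_train, n_test, n_train + n_test)  # 1100 1100 2200

# Average video length under the assumption above: 194 h * 60 / 2200 videos.
avg_minutes = TOTAL_HOURS * 60 / (n_train + n_test)
print(f"{avg_minutes:.1f} min per video on average")  # 5.3 min per video on average
```

Splitting per tag (rather than over the pooled 2,200 videos) keeps every tag equally represented in both the training set and the test set.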
Tags
We chose a set of 22 representative tags that includes activities (“riot”), objects (“eiffeltower”), and locations (“desert”). Since videos on YouTube are organized in categories, we downloaded the videos for each tag from a canonical category (e.g., the category “sports” was used for “soccer”). Here is the full list of all tags, sorted by their associated category:
How it works
The system uses only the visual content of the videos and learns statistical models of the relationships between tags and features. To this end, it integrates multiple features in so-called feature pipelines:
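How the pipelines are combined is not spelled out on this page. As a rough illustration only, the Python sketch below assumes that each pipeline scores every keyframe of a video for each tag, that these scores are averaged over the keyframes, and that the pipelines are then fused by plain averaging (late fusion). The pipeline functions `color` and `motion` and all score values are hypothetical; this is not the authors' exact method, which is described in the publication referenced below.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical pipeline type: maps one keyframe to per-tag scores in [0, 1].
Pipeline = Callable[[object], Dict[str, float]]


def video_scores(keyframes: Sequence[object], pipeline: Pipeline) -> Dict[str, float]:
    """Average one pipeline's per-keyframe tag scores over all keyframes."""
    totals: Dict[str, float] = {}
    for frame in keyframes:
        for tag, score in pipeline(frame).items():
            totals[tag] = totals.get(tag, 0.0) + score
    return {tag: s / len(keyframes) for tag, s in totals.items()}


def fuse(keyframes: Sequence[object], pipelines: List[Pipeline]) -> Dict[str, float]:
    """Late fusion (assumed): average the video-level scores of all pipelines."""
    per_pipeline = [video_scores(keyframes, p) for p in pipelines]
    tags = set().union(*per_pipeline)
    return {t: sum(s.get(t, 0.0) for s in per_pipeline) / len(per_pipeline) for t in tags}


def rank_tags(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k highest-scoring tags, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy usage: two hypothetical pipelines that ignore the frame content.
color = lambda frame: {"desert": 0.8, "soccer": 0.1, "riot": 0.1}
motion = lambda frame: {"desert": 0.2, "soccer": 0.6, "riot": 0.2}
frames = [0, 1, 2]  # stand-ins for extracted keyframes

scores = fuse(frames, [color, motion])
print(rank_tags(scores))  # ['desert', 'soccer', 'riot']
```

Plain averaging is the simplest fusion rule; weighted sums or a second-level classifier trained on the pipeline outputs are common alternatives.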
For a complete overview of the system, its design rationale, and experimental results, please refer to our publication: A. Ulges, C. Schulze, D. Keysers, T. M. Breuel: “Content-based Video Tagging for Online Video Portals”, Third MUSCLE / ImageCLEF Workshop on Image and Video Retrieval Evaluation, 2007.
schulze@iupr.dfki.de
Last modified: Wed Aug 1 10:54:11 CEST 2007