The mm2015cmrf dataset was developed as part of our research on cross-media retrieval. It is derived from the Clickture-Lite dataset and consists of two disjoint subsets: a training set of 1 million images (called msr2013train) and a dev set of 79,665 images (called msr2013dev).
- Dataset without visual features (1.7 GB): text data of msr2013train and msr2013dev, and a 200-dimensional word2vec model pre-trained on Flickr tags (link)
- Required visual features (5.0 GB): fc7 and prob features of CaffeNet (link)
- Optional visual features (7.9 GB): fc7 and prob features of PlaceCNN (link)
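Since msr2013train and msr2013dev are described as disjoint, a quick sanity check after downloading is to confirm that no image ID appears in both subsets. The sketch below is only illustrative: the file layout is not documented here, so it assumes (hypothetically) that each line of an ID listing starts with an image ID followed by whitespace. The helper names `read_ids` and `overlap` and the toy IDs are made up for this example.

```python
from typing import Iterable, Set


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-separated token of each non-empty line.

    Assumption: the image ID is the first token on each line.
    """
    ids: Set[str] = set()
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            ids.add(parts[0])
    return ids


def overlap(train_ids: Set[str], dev_ids: Set[str]) -> Set[str]:
    """Return the image IDs that occur in both subsets."""
    return train_ids & dev_ids


# Toy stand-ins for the real listings (hypothetical IDs and text).
train = read_ids(["img001 a query", "img002 another query"])
dev = read_ids(["img101 a query", ""])

shared = overlap(train, dev)
print(len(shared))
```

With real data, the lists passed to `read_ids` could be replaced by open file handles; an empty result from `overlap` is what the disjointness claim would lead one to expect.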
Code
https://github.com/danieljf24/cmrf
Reference
Image Retrieval by Cross-Media Relevance Fusion. In: ACM Multimedia, pp. 173–176, 2015.