The mm2016vsd dataset was developed as part of our research on violence detection in video. It is derived from the MediaEval 2015 Violent Scenes Detection dataset, which consists of two disjoint subsets: a development set of 6,144 video clips (called vsd2015dev in mm2016vsd) and a test set of 4,756 video clips (called vsd2015test). In mm2016vsd the development set is further split into vsd2015devtrain (4,300 clips) and vsd2015devval (1,844 clips).
Subclass Annotations
We enrich the MediaEval dataset by manually labeling violent videos with respect to subclasses visually related to violence:
- Subclass annotations of vsd2015dev
- Subclass annotations of vsd2015test
- Full annotations (vsd2015dev, vsd2015devtrain, vsd2015devval, vsd2015test)
Number of positive videos per concept in each subset:

concept | dev set | devtrain set | devval set | test set |
---|---|---|---|---|
violence | 272 | 190 | 82 | 230 |
blood | 80 | 50 | 30 | 58 |
gun | 57 | 41 | 16 | 39 |
force | 43 | 28 | 15 | 27 |
death | 43 | 25 | 18 | 16 |
weapon | 38 | 31 | 7 | 8 |
rope | 30 | 21 | 9 | 51 |
fight | 29 | 21 | 8 | 45 |
hit | 26 | 18 | 8 | 39 |
bind | 25 | 17 | 8 | 69 |
aim | 22 | 15 | 7 | 20 |
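
The counts above can be sanity-checked with a few lines of Python. The following is a minimal sketch, not part of the released code: it copies the table by hand, takes the clip totals from the dataset description and the video-level rows of the feature table, and uses illustrative names (`TOTAL_CLIPS`, `POSITIVES`, `split_is_consistent`, `positive_rate`) that are assumptions of this example.

```python
# Sketch only: numbers are copied from this page, names are illustrative.
SUBSETS = ("dev", "devtrain", "devval", "test")

# Total number of clips per subset.
TOTAL_CLIPS = {"dev": 6144, "devtrain": 4300, "devval": 1844, "test": 4756}

# concept -> positive-video counts in the order given by SUBSETS.
POSITIVES = {
    "violence": (272, 190, 82, 230),
    "blood": (80, 50, 30, 58),
    "gun": (57, 41, 16, 39),
    "force": (43, 28, 15, 27),
    "death": (43, 25, 18, 16),
    "weapon": (38, 31, 7, 8),
    "rope": (30, 21, 9, 51),
    "fight": (29, 21, 8, 45),
    "hit": (26, 18, 8, 39),
    "bind": (25, 17, 8, 69),
    "aim": (22, 15, 7, 20),
}


def split_is_consistent(counts):
    """True if the devtrain and devval counts add up to the dev count."""
    dev, devtrain, devval, _test = counts
    return devtrain + devval == dev


def positive_rate(concept, subset):
    """Fraction of clips in `subset` that are positive for `concept`."""
    index = SUBSETS.index(subset)
    return POSITIVES[concept][index] / TOTAL_CLIPS[subset]
```

Every row passes `split_is_consistent`, and `positive_rate("violence", "dev")` is about 0.044, so violent clips make up under 5% of the development set. Any classifier trained on these labels has to cope with that imbalance.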
Feature table
The last four columns give the number of instances in each subset.

Modality | Feature | Dimension | dev set | devtrain set | devval set | test set |
---|---|---|---|---|---|---|
Image | frame-level vggnet (2.2 GB) | 4,096 | 131,441 | 91,930 | 39,511 | 101,587 |
Image | video-level vggnet (189 MB) | 4,096 | 6,144 | 4,300 | 1,844 | 4,756 |
Image | frame-level googlenet (1.1 GB) | 1,024 | 131,441 | 91,930 | 39,511 | 101,587 |
Image | video-level googlenet (60 MB) | 1,024 | 6,144 | 4,300 | 1,844 | 4,756 |
Image | frame-level googlenet4k (959 MB) | 1,024 | 131,441 | 91,930 | 39,511 | 101,587 |
Image | video-level googlenet4k (59 MB) | 1,024 | 6,144 | 4,300 | 1,844 | 4,756 |
Audio | mfcc + bow (203 MB) | 4,096 | 50,543 | 35,365 | 15,178 | 39,415 |
Audio | mfcc + fisher vector (7.2 GB) | 19,968 | 50,543 | 35,365 | 15,178 | 39,415 |
Motion | mbh + bow (181 MB) | 4,000 | 6,143 | 4,300 | 1,843 | 4,755 |
Motion | mbh + fisher vector (5.3 GB) | 98,304 | 6,143 | 4,300 | 1,843 | 4,755 |
Motion | hog + bow (210 MB) | 4,000 | 6,143 | 4,300 | 1,843 | 4,755 |
Motion | hog + fisher vector (2.8 GB) | 49,152 | 6,143 | 4,300 | 1,843 | 4,755 |
Motion | hof + bow (206 MB) | 4,000 | 6,143 | 4,300 | 1,843 | 4,755 |
Motion | hof + fisher vector (3.1 GB) | 55,296 | 6,143 | 4,300 | 1,843 | 4,755 |
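
Before loading a feature it can help to estimate how much memory the full matrix would need. The sketch below assumes the features are held as dense arrays of 4-byte floats, which this page does not state. The download sizes listed above may differ because of the on-disk format or compression. The function name `dense_size_gib` is illustrative.

```python
def dense_size_gib(num_instances, dimension, bytes_per_value=4):
    """Size in GiB of a dense num_instances x dimension matrix.

    bytes_per_value=4 corresponds to 32-bit floats, which is an assumption
    of this sketch and not a documented property of the released files.
    """
    return num_instances * dimension * bytes_per_value / 2**30


# Instance counts and dimensions taken from the dev-set column of the table.
mbh_fv_dev = dense_size_gib(6143, 98304)   # mbh + fisher vector
vgg_video_dev = dense_size_gib(6144, 4096)  # video-level vggnet
```

Under this assumption the mbh + fisher vector matrix for the dev set needs roughly 2.25 GiB, while the video-level vggnet matrix needs under 0.1 GiB, so the Fisher vector features are the ones most likely to need batch-wise loading.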
Download all 14 features in two lines (this will take quite a while):
wget http://lixirong.net/data/mm2016vsd/feature_urls.txt
wget -i feature_urls.txt
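
Where wget is not available, the same list can be processed from Python. This is a rough sketch using only the standard library. It assumes feature_urls.txt contains one URL per line, which is the format `wget -i` reads. The helper names (`parse_url_list`, `local_name`, `download_missing`) are hypothetical and not part of the dataset.

```python
import os
import urllib.request
from urllib.parse import urlparse


def parse_url_list(text):
    """Return the non-empty lines of a URL list, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def local_name(url):
    """Derive a local file name from the last path component of the URL."""
    return os.path.basename(urlparse(url).path)


def download_missing(list_path="feature_urls.txt", out_dir="."):
    """Download every listed file that is not already present in out_dir.

    Note: a file that exists is skipped even if an earlier download was
    interrupted, so partial files must be deleted by hand.
    """
    with open(list_path, encoding="utf-8") as handle:
        urls = parse_url_list(handle.read())
    for url in urls:
        target = os.path.join(out_dir, local_name(url))
        if os.path.exists(target):
            continue
        urllib.request.urlretrieve(url, target)
```

Calling `download_missing()` in the directory that holds feature_urls.txt fetches the files one after another. Unlike `wget -c`, this sketch does not resume partial downloads.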
Code
Work in progress.
Reference
Detecting Violence in Video using Subclasses. In: ACM Multimedia, 2016.