mm2016vsd

The mm2016vsd dataset grew out of our research on violence detection in video. It is derived from the MediaEval 2015 Violent Scenes Detection dataset, which consists of two disjoint subsets: a development set of 6,144 video clips (called vsd2015dev in mm2016vsd) and a test set of 4,756 video clips (called vsd2015test).

Subclass Annotations

We enrich the MediaEval dataset by manually labeling the violent videos with subclasses that are visually related to violence:

Concept occurrence (number of positive videos)

concept   dev set  devtrain set  devval set  test set
violence      272           190          82       230
blood          80            50          30        58
gun            57            41          16        39
force          43            28          15        27
death          43            25          18        16
weapon         38            31           7         8
rope           30            21           9        51
fight          29            21           8        45
hit            26            18           8        39
bind           25            17           8        69
aim            22            15           7        20
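
The counts above are consistent with devtrain and devval being a split of the dev set, although that is inferred from the numbers rather than stated. A minimal sketch of the check, with the rows copied from the table (the variable names rows and mismatches are illustrative only):

```shell
# Rows copied from the table above: concept dev devtrain devval test
rows='violence 272 190 82 230
blood 80 50 30 58
gun 57 41 16 39
force 43 28 15 27
death 43 25 18 16
weapon 38 31 7 8
rope 30 21 9 51
fight 29 21 8 45
hit 26 18 8 39
bind 25 17 8 69
aim 22 15 7 20'

# Count the rows where devtrain + devval differs from dev.
mismatches=$(printf '%s\n' "$rows" |
    awk '$3 + $4 != $2 { n++ } END { print n + 0 }')
echo "rows where devtrain + devval != dev: $mismatches"
```

For the table as given, the check reports zero mismatching rows.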

Feature table

Fourteen features were used in our experiments, describing video content from various aspects. What counts as an instance depends on the feature: a video for video-level features, a frame for frame-level features, and a video segment for audio features. Hence, the number of instances per set varies across features.
Number of instances per set

Modality  Feature                           Dimension  dev set  devtrain set  devval set  test set
Image     frame-level vggnet (2.2 GB)           4,096  131,441        91,930      39,511   101,587
Image     video-level vggnet (189 MB)           4,096    6,144         4,300       1,844     4,756
Image     frame-level googlenet (1.1 GB)        1,024  131,441        91,930      39,511   101,587
Image     video-level googlenet (60 MB)         1,024    6,144         4,300       1,844     4,756
Image     frame-level googlenet4k (959 MB)      1,024  131,441        91,930      39,511   101,587
Image     video-level googlenet4k (59 MB)       1,024    6,144         4,300       1,844     4,756
Audio     mfcc + bow (203 MB)                   4,096   50,543        35,365      15,178    39,415
Audio     mfcc + fisher vector (7.2 GB)        19,968   50,543        35,365      15,178    39,415
Motion    mbh + bow (181 MB)                    4,000    6,143         4,300       1,843     4,755
Motion    mbh + fisher vector (5.3 GB)         98,304    6,143         4,300       1,843     4,755
Motion    hog + bow (210 MB)                    4,000    6,143         4,300       1,843     4,755
Motion    hog + fisher vector (2.8 GB)         49,152    6,143         4,300       1,843     4,755
Motion    hof + bow (206 MB)                    4,000    6,143         4,300       1,843     4,755
Motion    hof + fisher vector (3.1 GB)         55,296    6,143         4,300       1,843     4,755
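
To plan disk space, the file sizes listed in the table can be summed. The sketch below assumes that the listed sizes are the download sizes and that 1 GB = 1000 MB; neither is stated here, so treat the result as an estimate. The variable names are illustrative.

```shell
# Sizes copied from the feature table above, one "value unit" pair per line.
sizes='2.2 GB
189 MB
1.1 GB
60 MB
959 MB
59 MB
203 MB
7.2 GB
181 MB
5.3 GB
210 MB
2.8 GB
206 MB
3.1 GB'

# Convert everything to MB (assuming 1 GB = 1000 MB), sum, and report in GB.
total_gb=$(printf '%s\n' "$sizes" |
    awk '{ mb += ($2 == "GB") ? $1 * 1000 : $1 } END { printf "%.1f", mb / 1000 }')
echo "total size of the 14 features: about $total_gb GB"
```

With the sizes listed, the total comes to about 23.8 GB.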

Download all 14 features with two commands (this will take quite a while):

wget http://lixirong.net/data/mm2016vsd/feature_urls.txt
wget -i feature_urls.txt
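
Because the download is long, an interrupted transfer is likely. The sketch below wraps the second command in a helper that resumes partial files and keeps the features in their own directory. The function name fetch_features, the directory name, and the WGET override (handy for a dry run) are hypothetical and not part of the dataset's tooling; -c, -i and -P are standard wget options.

```shell
# Hypothetical helper: fetch every URL listed in a file into a target directory.
# Usage: fetch_features URL_LIST TARGET_DIR
fetch_features() {
    url_list=$1
    target_dir=$2
    mkdir -p "$target_dir" || return 1
    # -c resumes partially downloaded files instead of starting over;
    # -P sets the directory the files are saved to.
    # WGET can be set to another command (e.g. echo) for a dry run.
    ${WGET:-wget} -c -i "$url_list" -P "$target_dir"
}
```

After fetching feature_urls.txt as shown above, running fetch_features feature_urls.txt features would download into a features directory; re-running the same command after an interruption continues where it left off.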

Code

Work in progress.

Reference

Xirong Li, Yujia Huo, Qin Jin, Jieping Xu (2016): Detecting Violence in Video using Subclasses. In: ACM Multimedia, 2016.