mm2016vsd

The mm2016vsd dataset grew out of our research on violence detection in video. It is derived from the MediaEval 2015 Violent Scenes Detection dataset, which consists of two disjoint subsets: a development set of 6,144 video clips (called vsd2015dev in mm2016vsd) and a test set of 4,756 video clips (called vsd2015test).

Subclass Annotations

We enrich the MediaEval dataset by manually labeling violent videos with respect to subclasses visually related to violence:

Subclass annotations of vsd2015dev
Subclass annotations of vsd2015test
Full annotations (vsd2015dev, vsd2015devtrain, vsd2015devval, vsd2015test)

Concept occurrence
	# positive videos
concept	dev set	devtrain set	devval set	test set
violence	272	190	82	230
blood	80	50	30	58
gun	57	41	16	39
force	43	28	15	27
death	43	25	18	16
weapon	38	31	7	8
rope	30	21	9	51
fight	29	21	8	45
hit	26	18	8	39
bind	25	17	8	69
aim	22	15	7	20
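
To put these counts in perspective, the sketch below computes the share of positive clips per concept and checks that the devtrain and devval counts add up to the dev count. It assumes that the clip totals given in the introduction (6,144 for dev, 4,756 for test) are the right denominators; the variable names are illustrative only.

```shell
# Clip totals from the dataset description: 6,144 dev clips, 4,756 test clips.
dev_total=6144
test_total=4756

# Columns: concept, dev, devtrain, devval, test (copied from the table above).
table='violence 272 190 82 230
blood 80 50 30 58
gun 57 41 16 39
force 43 28 15 27
death 43 25 18 16
weapon 38 31 7 8
rope 30 21 9 51
fight 29 21 8 45
hit 26 18 8 39
bind 25 17 8 69
aim 22 15 7 20'

report=$(printf '%s\n' "$table" | awk -v dev="$dev_total" -v test="$test_total" '
  {
    # Flag rows where devtrain + devval does not add up to dev.
    status = ($3 + $4 == $2) ? "ok" : "MISMATCH"
    printf "%s dev=%.2f%% test=%.2f%% split=%s\n", $1, 100 * $2 / dev, 100 * $5 / test, status
  }
')
printf '%s\n' "$report"
```

Under that assumption, fewer than 5% of the clips in either set are positive for violence, so any classifier trained on these labels has to cope with strong class imbalance.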

Feature table

Fourteen features were used in our experiments, describing video content from varied aspects. Depending on the feature, an instance is a video (video-level features), a frame (frame-level features), or a video segment (audio features). Hence, the number of instances per dataset varies across features.
			Number of instances
Modality	Feature	Dimension	dev set	devtrain set	devval set	test set
Image	frame-level vggnet (2.2 GB)	4,096	131,441	91,930	39,511	101,587
	video-level vggnet (189 MB)	4,096	6,144	4,300	1,844	4,756
	frame-level googlenet (1.1 GB)	1,024	131,441	91,930	39,511	101,587
	video-level googlenet (60 MB)	1,024	6,144	4,300	1,844	4,756
	frame-level googlenet4k (959 MB)	1,024	131,441	91,930	39,511	101,587
	video-level googlenet4k (59 MB)	1,024	6,144	4,300	1,844	4,756
Audio	mfcc + bow (203 MB)	4,096	50,543	35,365	15,178	39,415
	mfcc + fisher vector (7.2 GB)	19,968	50,543	35,365	15,178	39,415
Motion	mbh + bow (181 MB)	4,000	6,143	4,300	1,843	4,755
	mbh + fisher vector (5.3 GB)	98,304	6,143	4,300	1,843	4,755
	hog + bow (210 MB)	4,000	6,143	4,300	1,843	4,755
	hog + fisher vector (2.8 GB)	49,152	6,143	4,300	1,843	4,755
	hof + bow (206 MB)	4,000	6,143	4,300	1,843	4,755
	hof + fisher vector (3.1 GB)	55,296	6,143	4,300	1,843	4,755
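
As a rough illustration of how the instance granularity differs between features, the sketch below divides the frame-level and audio instance counts from the table by the number of clips in each set. The labels `frame-level` and `audio-segment` are ours, and the result is only an average; the actual number of frames or segments per clip is not stated here.

```shell
# Columns: instance granularity, dev instances, test instances (from the table above).
counts='frame-level 131441 101587
audio-segment 50543 39415'

# Divide by the clip totals from the dataset description (6,144 dev, 4,756 test).
per_video=$(printf '%s\n' "$counts" | awk -v dev=6144 -v test=4756 '
  { printf "%s dev=%.2f test=%.2f\n", $1, $2 / dev, $3 / test }
')
printf '%s\n' "$per_video"
```

On these numbers, a clip contributes about 21 frame-level instances and about 8 audio segments on average, in both the dev and the test set.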

Download all 14 features with two commands (this will take quite a while, since the files listed above total roughly 24 GB):

wget http://lixirong.net/data/mm2016vsd/feature_urls.txt
wget -i feature_urls.txt
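
Because the download is large, it can help to check free disk space first and to make the transfer resumable. The following is a minimal sketch, not part of the official instructions: the helper names `has_space` and `download_features` are hypothetical, and the 24 GB figure is an approximation obtained by summing the sizes in the feature table.

```shell
# Approximate total size of the 14 feature files, in kilobytes (~24 GB).
required_kb=$((24 * 1024 * 1024))

# has_space KB DIR: succeeds if the filesystem holding DIR has at least KB kilobytes free.
has_space() {
  avail_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge "$1" ]
}

# download_features DIR: fetch the URL list, then every feature file, into DIR.
download_features() {
  has_space "$required_kb" "$1" || { echo "not enough free space in $1" >&2; return 1; }
  wget -O "$1/feature_urls.txt" http://lixirong.net/data/mm2016vsd/feature_urls.txt &&
  # -c resumes partially downloaded files, so an interrupted run can be restarted.
  wget -c -P "$1" -i "$1/feature_urls.txt"
}
```

Calling `download_features .` then behaves like the two commands above, except that it stops early when space is short and picks up where it left off after an interruption.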

Code

Work in progress.

Reference

Xirong Li, Yujia Huo, Qin Jin, Jieping Xu: Detecting Violence in Video using Subclasses. In: ACM Multimedia, 2016.