deep learning – Xirong Li

TPAMI 2023: MVSS-Net: Multi-View Multi-Scale Supervised Networks for Image Manipulation Detection

Our paper on deep learning based image manipulation detection has been published online as a regular paper in the March 2023 Issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence journal (impact factor: 24.314). Source code is available at https://github.com/dong03/MVSS-Net

As manipulating images by copy-move, splicing and/or inpainting may lead to misinterpretation of the visual content, detecting these sorts of manipulations is crucial for media forensics. Given the variety of possible attacks on the content, devising a generic method is nontrivial. Current deep learning based methods are promising when training and test data are well aligned, but perform poorly on independent tests. Moreover, due to the absence of authentic test images, their image-level detection specificity is in doubt. The key question is how to design and train a deep neural network capable of learning generalizable features sensitive to manipulations in novel data, whilst specific to prevent false alarms on the authentic. We propose multi-view feature learning to jointly exploit tampering boundary artifacts and the noise view of the input image. As both clues are meant to be semantic-agnostic, the learned features are thus generalizable. For effectively learning from authentic images, we train with multi-scale (pixel / edge / image) supervision. We term the new network MVSS-Net and its enhanced version MVSS-Net++. Experiments are conducted in both within-dataset and cross-dataset scenarios, showing that MVSS-Net++ performs the best, and exhibits better robustness against JPEG compression, Gaussian blur and screenshot based image re-capturing.

Chengbo Dong, Xinru Chen, Ruohan Hu, Juan Cao, Xirong Li: MVSS-Net: Multi-View Multi-Scale Supervised Networks for Image Manipulation Detection. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.

ICMR2020: iCap: Interactive Image Captioning with Predictive Text

Our ICMR’20 paper on interactive image captioning is online.

In this paper we study a brand new topic of interactive image captioning with human in the loop. Different from automated image captioning where a given test image is the sole input in the inference stage, we have access to both the test image and a sequence of (incomplete) user-input sentences in the interactive scenario. We formulate the problem as Visually Conditioned Sentence Completion (VCSC). For VCSC, we propose ABD-Cap, asynchronous bidirectional decoding for image caption completion. With ABD-Cap as the core module, we build iCap, a web-based interactive image captioning system capable of predicting new text with respect to live input from a user. A number of experiments covering both automated evaluations and real user studies show the viability of our proposals.

Zhengxiong Jia, Xirong Li: iCap: Interactive Image Captioning with Predictive Text. In: ACM International Conference on Multimedia Retrieval (ICMR), 2020.

MICCAI2019: Fully Deep Learning for Slit-lamp Photo based Nuclear Cataract Grading

Our MICCAI2019 paper on automated nuclear cataract grading is online.

Age-related cataract is a priority eye disease, with nuclear cataract as its most common type. This paper aims for automated nuclear cataract grading based on slit-lamp photos. Different from previous efforts which rely on traditional feature extraction and grade modeling techniques, we propose in this paper a fully deep learning based solution. Given a slit-lamp photo, we localize its nuclear region by Faster R-CNN, followed by a ResNet-101 based grading model. In order to alleviate the issue of imbalanced data, a simple batch balancing strategy is introduced for improving the training of the grading network. Tested on a clinical dataset of 157 slit-lamp photos from 39 female and 31 male patients, the proposed solution outperforms the state-of-the-art, reducing the mean absolute error from 0.357 to 0.313. In addition, our solution processes a slit-lamp photo in approximately 0.1 second, which is two order faster than the state-of-the-art. With its effectiveness and efficiency, the new solution is promising for automated nuclear cataract grading.

miccai2019-nuclear-cataract-grading

Chaoxi Xu, Xiangjia Zhu, Wenwen He, Yi Lu, Xixi He, Zongjiang Shang, Jun Wu, Keke Zhang, Yinglei Zhang, Xianfang Rong, Zhennan Zhao, Lei Cai, Dayong Ding, Xirong Li: Fully Deep Learning for Slit-lamp Photo based Nuclear Cataract Grading. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019, (early accept).

MMM2019: Four Models for Automatic Recognition of Left and Right Eye in Fundus Images

Our MMM2019 paper on recognizing Left / Right Eye in Fundus Images is online.

Fundus image analysis is crucial for eye condition screening and diagnosis and consequently personalized health management in a long term. This paper targets at left and right eye recognition, a basic module for fundus image analysis. We study how to automatically assign left-eye/right-eye labels to fundus images of posterior pole. For this under-explored task, four models are developed. Two of them are based on optic disc localization, using extremely simple max intensity and more advanced Faster R-CNN, respectively. The other two models require no localization, but perform holistic image classification using classical Local Binary Patterns (LBP) features and fine-tuned ResNet18, respectively. The four models are tested on a real-world set of 1,633 fundus images from 834 subjects. Fine-tuned ResNet-18 has the highest accuracy of 0.9847. Interestingly, the LBP based model, with the trick of left-right contrastive classification, performs closely to the deep model, with an accuracy of 0.9718.

Xin Lai, Xirong Li, Rui Qian, Dayong Ding, Jun Wu, Jieping Xu: Four Models for Automatic Recognition of Left and Right Eye in Fundus Images. the 25th International Conference on MultiMedia Modeling (MMM), 2019.

ACCV2018: Laser Scar Detection in Fundus Images using Convolutional Neural Networks

We are going to present our work on detecting laser scars in color fundus images at the 14th Asian Conference on Computer Vision (ACCV 2018) at Perth, Australia. This is a joint work with Vistel Inc. and Peking Union Medical College Hospital.

In diabetic eye screening programme, a special pathway is designed for those who have received laser photocoagulation treatment. The treatment leaves behind circular or irregular scars in the retina. Laser scar detection in fundus images is thus important for automated DR screening. Despite its importance, the problem is understudied in terms of both datasets and methods. This paper makes the first attempt to detect laser-scar images by deep learning. To that end, we contribute to the community Fundus10K, a large-scale expert-labeled dataset for training and evaluating laser scar detectors. We study in this new context major design choices of state-of-the-art Convolutional Neural Networks including Inception-v3, ResNet and DenseNet. For more effective training we exploit transfer learning that passes on trained weights of ImageNet models to their laser-scar countcerparts. Experiments on the new dataset shows that our best model detects laser-scar images with sensitivity of 0.962, specificity of 0.999, precision of 0.974 and AP of 0.988 and AUC of 0.999. The same model is tested on the public LMD-BAPT test set, obtaining sensitivity of 0.765, specificity of 1, precision of 1, AP of 0.975 and AUC of 0.991, outperforming the state-of-the-art with a large margin. Data is available at https://github.com/li-xirong/fundus10k/

Qijie Wei, Xirong Li, Hao Wang, Dayong Ding, Weihong Yu, Youxin Chen: Laser Scar Detection in Fundus Images using Convolutional Neural Networks. Asian Conference on Computer Vision (ACCV), 2018.