xirong

Two papers accepted at ICPR2020

Two of our papers have been accepted at ICPR2020. Now in its 25th edition, the International Conference on Pattern Recognition (ICPR) is the premier international conference on pattern recognition.

Wei et al., Learn to Segment Retinal Lesions and Beyond, ICPR 2020
Li et al., Deep Multiple Instance Learning with Spatial Attention for ROP Case Classification, Instance Selection and Abnormality Localization, ICPR 2020

AE for ACM TOMM

I am delighted to serve as an Associate Editor of the ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM). TOMM focuses on multimedia computing (I/O devices, OS, storage systems, streaming media middleware, continuous media representations, media coding, media processing, etc.), multimedia communications (real-time protocols, end-to-end streaming media, resource allocation, multicast protocols, etc.), and multimedia applications (databases, distributed collaboration, video conferencing, 3D virtual environments, etc.).

ICMR2020: iCap: Interactive Image Captioning with Predictive Text

Our ICMR’20 paper on interactive image captioning is online.

In this paper we study a brand new topic: interactive image captioning with a human in the loop. Unlike automated image captioning, where a given test image is the sole input at the inference stage, the interactive scenario gives us access to both the test image and a sequence of (incomplete) user-input sentences. We formulate the problem as Visually Conditioned Sentence Completion (VCSC). For VCSC, we propose ABD-Cap, asynchronous bidirectional decoding for image caption completion. With ABD-Cap as the core module, we build iCap, a web-based interactive image captioning system capable of predicting new text in response to live input from a user. A number of experiments covering both automated evaluations and real user studies show the viability of our proposals.

Zhengxiong Jia, Xirong Li: iCap: Interactive Image Captioning with Predictive Text. In: ACM International Conference on Multimedia Retrieval (ICMR), 2020.

AE for the Multimedia Systems journal

I am delighted to serve as an Associate Editor of the Multimedia Systems journal. This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications.

MICCAI2019: Fully Deep Learning for Slit-lamp Photo based Nuclear Cataract Grading

Our MICCAI2019 paper on automated nuclear cataract grading is online.

Age-related cataract is a priority eye disease, with nuclear cataract as its most common type. This paper aims for automated nuclear cataract grading based on slit-lamp photos. Unlike previous efforts, which rely on traditional feature extraction and grade modeling techniques, we propose a fully deep learning based solution. Given a slit-lamp photo, we localize its nuclear region with Faster R-CNN and then grade that region with a ResNet-101 based model. To alleviate the issue of imbalanced data, a simple batch balancing strategy is introduced to improve the training of the grading network. Tested on a clinical dataset of 157 slit-lamp photos from 39 female and 31 male patients, the proposed solution outperforms the state-of-the-art, reducing the mean absolute error from 0.357 to 0.313. In addition, our solution processes a slit-lamp photo in approximately 0.1 second, which is two orders of magnitude faster than the state-of-the-art. With its effectiveness and efficiency, the new solution is promising for automated nuclear cataract grading.
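The abstract does not spell out the batch balancing strategy, so the following is only a minimal sketch of one common form of the idea, not the method used in the paper: every training batch gives each grade an equal share of slots, and rare grades are filled by sampling with replacement. The function name `balanced_batches`, its parameters, and the toy label distribution are all hypothetical.

```python
import random
from collections import defaultdict


def balanced_batches(labels, batch_size, num_batches, seed=0):
    """Yield lists of sample indices in which classes appear about equally often.

    ASSUMPTION: this is a generic class-balanced sampler, not the exact
    strategy from the paper, which the abstract does not describe.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)

    for _ in range(num_batches):
        batch = []
        for slot in range(batch_size):
            # Cycle through the classes so each one gets an equal share of
            # slots; sampling with replacement lets rare classes repeat.
            cls = classes[slot % len(classes)]
            batch.append(rng.choice(by_class[cls]))
        rng.shuffle(batch)
        yield batch


# Toy, made-up label distribution: class 0 dominates, class 2 is rare.
toy_labels = [0] * 90 + [1] * 8 + [2] * 2
toy_batches = list(balanced_batches(toy_labels, batch_size=6, num_batches=5))
```

With three classes and a batch size of 6, each batch in this sketch holds two samples per class, regardless of how skewed the full dataset is.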


Chaoxi Xu, Xiangjia Zhu, Wenwen He, Yi Lu, Xixi He, Zongjiang Shang, Jun Wu, Keke Zhang, Yinglei Zhang, Xianfang Rong, Zhennan Zhao, Lei Cai, Dayong Ding, Xirong Li: Fully Deep Learning for Slit-lamp Photo based Nuclear Cataract Grading. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019, (early accept).

MMM2019: Four Models for Automatic Recognition of Left and Right Eye in Fundus Images

Our MMM2019 paper on recognizing left and right eyes in fundus images is online.

Fundus image analysis is crucial for eye condition screening and diagnosis and, consequently, for personalized health management in the long term. This paper targets left and right eye recognition, a basic module for fundus image analysis. We study how to automatically assign left-eye/right-eye labels to fundus images of the posterior pole. For this under-explored task, four models are developed. Two of them are based on optic disc localization, using an extremely simple max-intensity method and the more advanced Faster R-CNN, respectively. The other two models require no localization, but perform holistic image classification using classical Local Binary Patterns (LBP) features and a fine-tuned ResNet-18, respectively. The four models are tested on a real-world set of 1,633 fundus images from 834 subjects. The fine-tuned ResNet-18 has the highest accuracy of 0.9847. Interestingly, the LBP based model, with the trick of left-right contrastive classification, performs close to the deep model, with an accuracy of 0.9718.
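To make the simplest of the four models more concrete, here is a rough sketch of how a max-intensity rule could work. It is an illustration under stated assumptions, not the paper's implementation: the image is taken to be a grayscale grid given as a list of rows, the brightest pixel is assumed to fall on the optic disc, and the disc is assumed to appear in the right half of a posterior-pole image for a right eye and in the left half for a left eye. The function names are hypothetical.

```python
def brightest_column(gray):
    """Return the column index of the brightest pixel in a grayscale image.

    `gray` is a list of rows, each a list of intensity values.
    """
    best_value, best_col = None, None
    for row in gray:
        for col, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_col = value, col
    return best_col


def guess_eye_side(gray):
    """Guess 'left' or 'right' eye from the horizontal position of the
    brightest pixel.

    ASSUMPTIONS (not taken from the paper): the brightest pixel lies on the
    optic disc, and the disc sits in the right half of the image for a right
    eye and in the left half for a left eye.
    """
    width = len(gray[0])
    col = brightest_column(gray)
    return "right" if col >= width / 2 else "left"


# Tiny synthetic images with a single bright spot standing in for the disc.
disc_on_right = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 250],
    [10, 10, 10, 10, 10, 10],
]
disc_on_left = [
    [10, 10, 10, 10, 10, 10],
    [250, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
]
```

A rule like this is fragile in practice, since bright lesions or imaging artifacts can outshine the optic disc, which is one plausible reason to also consider a learned detector or holistic classification as the abstract describes.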

Xin Lai, Xirong Li, Rui Qian, Dayong Ding, Jun Wu, Jieping Xu: Four Models for Automatic Recognition of Left and Right Eye in Fundus Images. In: International Conference on MultiMedia Modeling (MMM), 2019.