Multi-scale deep learning for the imbalanced multi-label protein subcellular localization prediction based on immunohistochemistry images.

Affiliations

School of Software, Shandong University, Jinan, China.

Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.

Publication information

Bioinformatics. 2022 Apr 28;38(9):2602-2611. doi: 10.1093/bioinformatics/btac123.

DOI:10.1093/bioinformatics/btac123

PMID:35212728

Abstract

MOTIVATION

Advances in microscopic imaging allow protein subcellular locations to be studied from the tissue level down to the cell level, which has driven the rapid development of image-based methods for predicting protein subcellular location. However, existing methods have intrinsic limitations, such as weak feature representation and difficulty handling imbalanced data and multi-label classification, that substantially degrade model performance and generalization.

RESULTS

In this study, we propose MSTLoc, a novel multi-scale end-to-end deep learning model that identifies protein subcellular locations in imbalanced multi-label immunohistochemistry (IHC) image datasets. MSTLoc uses a deep convolutional neural network to extract multi-scale features from IHC images, aggregates high-level and low-level features via feature fusion to exploit the dependencies among subcellular locations, and applies a Vision Transformer (ViT) to model the relationships among the features and strengthen the feature representation. We demonstrate that MSTLoc outperforms current state-of-the-art models in multi-label subcellular location prediction. Feature visualization and interpretation analysis show that, compared with hand-crafted features, the multi-scale deep features learnt by our model better capture the discriminative patterns underlying protein subcellular locations, and that features from different scales are complementary in improving performance. Finally, case study results indicate that MSTLoc can successfully identify some biomarkers among proteins that are closely involved in cancer development.
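
The pipeline described above (multi-scale features, feature fusion, attention across features, multi-label output) can be illustrated with a toy NumPy sketch. This is not the MSTLoc implementation: the abstract gives no layer sizes or fusion details, so average pooling stands in for the convolutional stages, a single self-attention layer stands in for the ViT, the weights are random and untrained, and every name and constant below (NUM_LABELS, SCALES, DIM, avg_pool, self_attention, init_params, predict) is a hypothetical chosen for illustration.

```python
import numpy as np

NUM_LABELS = 6      # hypothetical number of subcellular location classes
SCALES = (2, 4, 8)  # hypothetical pooling factors standing in for CNN stages
DIM = 16            # hypothetical token width


def avg_pool(x, k):
    """Average-pool an (H, W, C) array with a k x k window and stride k."""
    h, w, c = x.shape
    x = x[: h - h % k, : w - w % k]  # crop so both sides divide by k
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over (N, D) tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v


def init_params(channels, rng):
    """Random, untrained weights; a real model would learn these."""
    return {
        "proj": [rng.normal(0, 0.1, (channels, DIM)) for _ in SCALES],
        "wq": rng.normal(0, 0.1, (DIM, DIM)),
        "wk": rng.normal(0, 0.1, (DIM, DIM)),
        "wv": rng.normal(0, 0.1, (DIM, DIM)),
        "head": rng.normal(0, 0.1, (DIM, NUM_LABELS)),
    }


def predict(image, params):
    """Return one independent probability per label for an (H, W, C) image."""
    # One token per scale: pool locally, summarize spatially, project to DIM.
    tokens = np.stack([
        avg_pool(image, k).max(axis=(0, 1)) @ proj
        for k, proj in zip(SCALES, params["proj"])
    ])
    # Attention lets each scale's token attend to the others (residual added).
    mixed = tokens + self_attention(
        tokens, params["wq"], params["wk"], params["wv"]
    )
    fused = mixed.mean(axis=0)  # simple fusion across scales
    logits = fused @ params["head"]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid, not softmax: multi-label


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # random stand-in for an IHC image patch
params = init_params(3, rng)
probs = predict(image, params)
labels = probs >= 0.5            # a protein may receive several locations
print(probs.round(3), labels)
```

The sigmoid output is the one design choice that follows directly from the multi-label setting: each location gets its own probability, so the scores need not sum to one and several labels can be predicted for the same image.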

AVAILABILITY AND IMPLEMENTATION

To make our method easy to use, we provide a user-friendly web server at http://server.wei-group.net/MSTLoc.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
