School of Software, Shandong University, Jinan, China.
Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
Bioinformatics. 2022 Apr 28;38(9):2602-2611. doi: 10.1093/bioinformatics/btac123.
Advances in microscopic imaging make it possible to study protein subcellular locations from the tissue level down to the cell level, and have driven the rapid development of image-based approaches to protein subcellular location prediction. However, existing methods suffer from intrinsic limitations, such as poor feature representation, data imbalance, and the multi-label nature of the classification problem, all of which greatly degrade model performance and generalization.
In this study, we propose MSTLoc, a novel multi-scale end-to-end deep learning model that identifies protein subcellular locations in imbalanced multi-label immunohistochemistry (IHC) image datasets. MSTLoc uses a deep convolutional neural network to extract multi-scale features from IHC images, aggregates high-level and low-level features via feature fusion to exploit the dependencies among subcellular locations, and uses a Vision Transformer (ViT) to model the relationships among the features and strengthen the feature representation. We demonstrate that MSTLoc outperforms current state-of-the-art models in multi-label subcellular location prediction. Feature visualization and interpretation analysis show that, compared with hand-crafted features, the multi-scale deep features learnt by our model capture the discriminative patterns underlying protein subcellular locations more effectively, and that features from different scales are complementary in improving performance. Finally, case study results indicate that MSTLoc can successfully identify some biomarkers among proteins that are closely involved in cancer development.
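The multi-scale fusion and multi-label output described above can be illustrated with a minimal, dependency-free sketch. It is not the MSTLoc implementation: the ViT stage is omitted, the feature maps are toy nested lists rather than CNN outputs, and the function names, label names, weights, and the 0.5 threshold are all hypothetical choices made for illustration. The sketch only shows the general idea of pooling feature maps from different scales, concatenating them, and scoring each location independently so that one image can receive several labels.

```python
import math

def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a length-C vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

def fuse_scales(feature_maps):
    """Concatenate the pooled vectors of every scale (low-level first)."""
    fused = []
    for fmap in feature_maps:
        fused.extend(global_average_pool(fmap))
    return fused

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(fused, weights, biases, labels, threshold=0.5):
    """Score each label with its own sigmoid; keep every label above threshold."""
    scores = {}
    for label, w, b in zip(labels, weights, biases):
        logit = sum(wi * xi for wi, xi in zip(w, fused)) + b
        scores[label] = sigmoid(logit)
    predicted = [label for label, s in scores.items() if s >= threshold]
    return predicted, scores

# Toy inputs (assumed): a low-level map with 2 channels of 4x4 and a
# high-level map with 1 channel of 2x2, standing in for CNN outputs.
low_level = [[[1.0] * 4 for _ in range(4)], [[0.0] * 4 for _ in range(4)]]
high_level = [[[2.0, 2.0], [2.0, 2.0]]]
fused = fuse_scales([low_level, high_level])

# Hypothetical label set and hand-picked classifier parameters.
labels = ["nucleus", "cytosol", "mitochondria"]
weights = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0], [0.5, 0.0, 0.5]]
biases = [0.0, 0.0, -1.0]

predicted, scores = predict_labels(fused, weights, biases, labels)
print(fused)
print(predicted)
```

Using one sigmoid per label, instead of a single softmax over all labels, is what allows a protein to be assigned to more than one location. In a trained model the weights would be learnt from data, and the imbalance between locations would have to be handled during training; the abstract does not say how MSTLoc does this.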
To make the method convenient to use, we provide a user-friendly webserver at http://server.wei-group.net/MSTLoc.
Supplementary data are available at Bioinformatics online.