Zeng Guo-Hua, Zhu Xing-Zheng, Yang Hong-Rui, Liang Yong-Jia, Zhai Yu-Jia, Xu Ying-Ying
School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China.
Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China.
Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf331.
Pinpointing the subcellular location of proteins is essential for studying protein function and related diseases. Advances in spatial proteomics have shown that automatically recognizing protein subcellular localization from images can greatly facilitate protein translocation analysis and biomarker discovery, but existing machine-learning methods have mostly been limited to 2D images. By contrast, 3D images have higher spatial resolution and allow researchers to observe cellular structures in their natural context, yet few studies have applied 3D image processing to protein distribution analysis, owing to the scarcity of data and the complexity of modeling.
We developed a knowledge-enhanced protein subcellular localization model, KE3DLoc, which uses deep learning to recognize protein distribution patterns in 3D fluorescence microscopy images. Its image feature extraction module combines information from 3D cells and their 2D projections, and the model applies an asymmetric loss and confidence weights to address data imbalance and weak cell-level annotation. In addition, because biological knowledge in the Gene Ontology (GO) database can support the understanding of protein localization, KE3DLoc incorporates a novel knowledge enhancement module that refines protein representations using related knowledge graphs derived from GO. Because the image module and the knowledge module compute features at different levels, KE3DLoc uses protein ID aggregation to make protein features more consistent across different cells. Experimental results on three public datasets demonstrate that KE3DLoc significantly outperforms existing methods and provides valuable insights for spatial proteomics research.
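The abstract names several training mechanisms without giving their formulas. The sketch below illustrates three of them under explicit assumptions; it is not the KE3DLoc implementation (the GitHub repository is the authoritative source). Assumptions: the 2D projection is a maximum-intensity projection along z; the asymmetric loss follows the ASL formulation of Ridnik et al. (2021) with its common defaults (`gamma_pos=0`, `gamma_neg=4`, `clip=0.05`); confidence weights are per-cell scalars that scale each cell's loss; and protein ID aggregation is a simple mean of cell-level features sharing a protein ID. All function names are hypothetical.

```python
import math
from collections import defaultdict


def max_intensity_projection(volume):
    """Collapse a 3D stack volume[z][y][x] into a 2D image by taking the
    maximum intensity along z (assumed projection type, hypothetical helper)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [[max(volume[z][y][x] for z in range(depth)) for x in range(width)]
            for y in range(height)]


def asymmetric_loss(probs, labels, gamma_pos=0.0, gamma_neg=4.0, clip=0.05, eps=1e-8):
    """ASL-style multi-label loss for one cell (assumed formulation).
    probs: per-location sigmoid probabilities; labels: 0/1 targets."""
    loss = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            # Positive term with optional focusing (gamma_pos = 0 gives plain BCE).
            loss -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Probability shifting: negatives with p <= clip contribute nothing,
            # which down-weights the many easy negatives in imbalanced data.
            p_m = max(p - clip, 0.0)
            loss -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return loss


def confidence_weighted_loss(batch_probs, batch_labels, confidences):
    """Average per-cell losses, scaling each by a confidence weight that
    reflects how far the (weak) label is trusted for that cell (assumption)."""
    total = sum(c * asymmetric_loss(p, y)
                for p, y, c in zip(batch_probs, batch_labels, confidences))
    norm = sum(confidences)
    return total / norm if norm > 0 else 0.0


def aggregate_by_protein(cell_features, protein_ids):
    """Mean-pool cell-level feature vectors that share a protein ID, yielding
    one protein-level vector per ID (assumed aggregation rule)."""
    sums = {}
    counts = defaultdict(int)
    for feat, pid in zip(cell_features, protein_ids):
        if pid not in sums:
            sums[pid] = [0.0] * len(feat)
        sums[pid] = [s + f for s, f in zip(sums[pid], feat)]
        counts[pid] += 1
    return {pid: [s / counts[pid] for s in vec] for pid, vec in sums.items()}


if __name__ == "__main__":
    volume = [[[1, 5], [2, 0]], [[3, 1], [0, 4]]]  # 2 z-slices of a 2x2 image
    print(max_intensity_projection(volume))
    print(confidence_weighted_loss([[0.9, 0.02], [0.5, 0.6]],
                                   [[1, 0], [1, 0]], [1.0, 0.3]))
    print(aggregate_by_protein([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                               ["P1", "P1", "P2"]))
```

Mean pooling is the simplest choice that makes every cell of the same protein map to a single protein-level vector, which is what allows cell-level image features to be aligned with protein-level GO knowledge; learned or attention-based pooling would be a natural alternative.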
All datasets and codes used in this study are available at GitHub: https://github.com/PRBioimages/KE3DLoc.