Xue Min-Qi, Zhu Xi-Liang, Wang Ge, Xu Ying-Ying
School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China.
Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China.
Bioinformatics. 2022 Jan 12;38(3):827-833. doi: 10.1093/bioinformatics/btab730.
Knowledge of the subcellular locations of proteins is of great significance for understanding their functions. Multi-label proteins, which simultaneously reside in or move between more than one subcellular structure, are usually involved in complex cellular processes. Currently, the subcellular location annotations of proteins in most studies and databases are descriptive terms, which fail to capture the amount or fraction of a protein in each location. This severely limits the understanding of the complex spatial distribution and functional mechanisms of multi-label proteins. Thus, quantitatively analyzing the multiplex location patterns of proteins is an urgent and challenging task.
In this study, we developed a deep-learning-based pattern unmixing pipeline for protein subcellular localization (DULoc) to quantitatively estimate the fractions of proteins localizing in different subcellular compartments from immunofluorescence images. The model uses a deep convolutional neural network to construct feature representations and combines multiple nonlinear decomposition algorithms as the pattern unmixing method. Our experimental results showed that DULoc achieves a correlation of over 0.93 between estimated and true fractions on both real and synthetic datasets. In addition, we applied DULoc on a large scale to images in the Human Protein Atlas database and found that, for 70.52% of proteins, the estimated order of locations was consistent with the database annotations.
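To make the idea of pattern unmixing and its evaluation concrete, the following is a minimal sketch and not the DULoc implementation. It assumes a simplified linear two-compartment model in place of the CNN features and nonlinear decomposition algorithms described above: a mixed feature vector is treated as f * pattern_a + (1 - f) * pattern_b, and f is recovered by least squares. All function names, feature vectors, and the noise level are hypothetical and chosen only for illustration.

```python
import math
import random


def unmix_two_patterns(mixed, pattern_a, pattern_b):
    """Estimate the fractions of two reference patterns in `mixed`.

    Assumes (hypothetically) a linear model:
        mixed ~= f * pattern_a + (1 - f) * pattern_b
    and returns (f, 1 - f) with f clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(pattern_a, pattern_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two reference patterns are identical")
    num = sum((m - b) * d for m, b, d in zip(mixed, pattern_b, diff))
    f = min(1.0, max(0.0, num / denom))
    return f, 1.0 - f


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def location_order(fractions, names):
    """Return compartment names sorted from largest to smallest fraction."""
    return [name for _, name in sorted(zip(fractions, names), reverse=True)]


# Hypothetical feature vectors for two "pure" compartment patterns.
pattern_a = [0.9, 0.1, 0.4, 0.2]
pattern_b = [0.1, 0.8, 0.3, 0.7]

# Synthetic mixtures with known fractions plus small Gaussian noise.
rng = random.Random(0)
true_fractions = [0.1, 0.3, 0.5, 0.7, 0.9]
estimated = []
for f_true in true_fractions:
    mixed = [
        f_true * a + (1.0 - f_true) * b + rng.gauss(0.0, 0.01)
        for a, b in zip(pattern_a, pattern_b)
    ]
    f_est, _ = unmix_two_patterns(mixed, pattern_a, pattern_b)
    estimated.append(f_est)

# Agreement between estimated and true fractions, as a correlation.
correlation = pearson(true_fractions, estimated)
print("correlation:", round(correlation, 4))
print("order:", location_order([estimated[0], 1 - estimated[0]], ["A", "B"]))
```

The correlation computed here plays the same role as the agreement measure reported above between estimated and true fractions, and `location_order` mirrors the comparison of estimated location orders with database annotations. A real pipeline would need learned image features and a decomposition that handles more than two compartments and nonlinear mixing.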
The datasets and code are available at: https://github.com/PRBioimages/DULoc.
Supplementary data are available at Bioinformatics online.