Dong Hao, Liu Yi, Zeng Wen-Feng, Shu Kunxian, Zhu Yunping, Chang Cheng
State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China.
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
Proteomics. 2020 Nov;20(21-22):e1900344. doi: 10.1002/pmic.201900344. Epub 2020 Jul 26.
Since the launch of the Chinese Human Proteome Project (CNHPP) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), large-scale mass spectrometry (MS)-based proteomic profiling of different kinds of human tumor samples has provided a huge amount of valuable data for both basic and clinical researchers. Accurately distinguishing tumor from non-tumor samples, and predicting tumor types, has become a key step in biological and medical research, such as biomarker discovery, diagnosis, and disease monitoring. The traditional MS-based classification strategy depends mainly on the identification and quantification results of MS data, which has inherent limitations, such as the low identification rate of MS data. Here, a deep learning-based tumor classifier that uses MS raw data directly is proposed; it is independent of the identification and quantification results of MS data. First, potential precursors, together with their intensities and retention times, are detected and extracted from the MS data as input. Then, a deep learning-based classifier is trained that can accurately distinguish between tumor and non-tumor samples. Finally, it is demonstrated that the deep learning-based classifier performs well compared with other machine learning methods and may help researchers find potential biomarkers that are likely to be missed by the traditional strategy.
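The abstract does not say how the extracted precursors are encoded before they reach the classifier. As a minimal sketch of one plausible encoding, assumed here purely for illustration and not taken from the paper, each sample's precursors (m/z, retention time, intensity) can be summed into a fixed-size m/z by retention-time grid, so that every sample yields an input of the same shape regardless of how many precursors were detected. The function name `bin_precursors`, the m/z and retention-time ranges, the grid size, and the log scaling are all hypothetical choices.

```python
import numpy as np

def bin_precursors(mz, rt, intensity,
                   mz_range=(300.0, 1500.0),   # assumed m/z window
                   rt_range=(0.0, 120.0),      # assumed gradient length in minutes
                   shape=(64, 64)):            # assumed grid size
    """Sum precursor intensities into a fixed-size m/z x RT grid.

    Hypothetical helper: precursors outside the given ranges are ignored.
    """
    mz = np.asarray(mz, dtype=float)
    rt = np.asarray(rt, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # Weighted 2D histogram: each cell holds the summed intensity
    # of the precursors falling into that (m/z, RT) bin.
    grid, _, _ = np.histogram2d(
        mz, rt, bins=shape, range=[mz_range, rt_range], weights=intensity
    )

    # Compress the dynamic range, then scale to [0, 1] per sample.
    grid = np.log1p(grid)
    peak = grid.max()
    return grid / peak if peak > 0 else grid

# Toy example with three made-up precursors on a coarse 4 x 4 grid.
grid = bin_precursors(
    mz=[400.0, 400.5, 1300.0],
    rt=[10.0, 10.2, 100.0],
    intensity=[1e5, 2e5, 5e4],
    shape=(4, 4),
)
```

A grid like this could be fed to any classifier that accepts fixed-size numeric input. Whether the authors use a representation of this kind is not stated in the abstract.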