J Proteome Res. 2019 Aug 2;18(8):3195-3202. doi: 10.1021/acs.jproteome.9b00268. Epub 2019 Jul 17.
Deep learning (DL), a class of machine learning methods, is a powerful tool for analyzing large biomedical data sets. However, it remains unknown whether DL is suitable for identifying contributing factors, such as biomarkers, in quantitative proteomics data. In this study, we describe an optimized DL-based analytical approach and test its classification power on a data set generated by selected reaction monitoring-mass spectrometry (SRM-MS), comprising data from 1008 samples for the diagnosis of pancreatic cancer. Its performance was compared with that of five conventional multivariate and machine learning methods: random forest (RF), support vector machine (SVM), logistic regression (LR), k-nearest neighbors (k-NN), and naïve Bayes (NB). The DL method yielded the best classification of all approaches (area under the ROC curve, AUC, of 0.9472 for the test data set). We also optimized the parameters of DL individually to determine which factors were the most significant. In summary, the DL method has advantages in classifying the quantitative proteomics data of pancreatic cancer patients, and our results suggest that its implementation can improve the performance of diagnostic assays in clinical settings.
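The comparison above ranks classifiers by AUC on a held-out test set. As a rough illustration of that metric only, the following sketch computes AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, then ranks models by it. The labels, scores, and model names (`model_a`, `model_b`) are invented for illustration and are not data or methods from the study; the helper names `roc_auc` and `rank_models` are likewise hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive sample scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_models(labels, scores_by_model):
    """Return (model name, AUC) pairs sorted from best to worst AUC."""
    results = [(name, roc_auc(labels, s)) for name, s in scores_by_model.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy test set (assumed values): 1 = cancer, 0 = control.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical predicted probabilities from two classifiers.
scores_by_model = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.3, 0.1],
    "model_b": [0.6, 0.4, 0.5, 0.7, 0.2, 0.3],
}

for name, auc in rank_models(labels, scores_by_model):
    print(f"{name}: AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a data set of roughly a thousand samples; a rank-based formulation would be preferable for much larger sets.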