Zhang Fan, Chen Jake, Wang Mu, Drabier Renee
BMC Proc. 2013 Dec 20;7(Suppl 7):S10. doi: 10.1186/1753-6561-7-S7-S10.
In the past several years, interest has grown in molecular biomarkers as tools for early detection of cancer. Plasma proteomics profiling based on liquid chromatography tandem mass spectrometry (LC/MS/MS) is a promising technology platform for studying candidate protein biomarkers for early cancer detection. However, factors such as inherent variability, limited protein detectability, and peptide discovery biases among LC/MS/MS platforms make the classification and prediction of proteomics profiles challenging. Neural network-based proteomics data analysis methods that identify multi-protein biomarker panels for breast cancer diagnosis offer hope of improving both the sensitivity and the specificity of candidate cancer biomarkers for early detection.
In our previous work, we developed a Feed Forward Neural Network (FFNN)-based method to build a classifier for plasma samples of breast cancer and then applied the classifier to predict a blind breast cancer dataset. However, the optimal marker combination C* in that method was actually determined by applying the trained FFNN, with each candidate combination, to the testing set, so the testing set also served for model selection. Therefore, in this paper, we apply a three-way data split to the FFNN, with separate sets for training, validation, and testing. We found that the FFNN model based on the three-way data split outperforms our previous method: prediction performance on the testing set improved from AUC = 0.8706, precision = 82.5%, accuracy = 82.5%, sensitivity = 82.5%, and specificity = 82.5% to AUC = 0.895, precision = 86.84%, accuracy = 85%, sensitivity = 82.5%, and specificity = 87.5%.
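The selection step and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation. The callback `train_and_validate` is a hypothetical placeholder for training an FFNN on the training set with a given panel and scoring it on the validation set. The confusion-matrix counts assume a balanced testing set of 40 cancer and 40 control samples, which the abstract does not state; they were chosen only because they reproduce the reported percentages.

```python
from itertools import combinations


def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics, returned as fractions."""
    return {
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def select_panel(markers, panel_size, train_and_validate):
    """Return the marker combination with the highest validation score.

    train_and_validate(panel) is a hypothetical callback: it trains a model
    on the training set using only `panel` and returns its score on the
    validation set. The testing set is never consulted here, so it stays
    independent of the choice of panel.
    """
    return max(combinations(markers, panel_size), key=train_and_validate)


# Assumed counts (40 cancer / 40 control), not taken from the paper.
previous = metrics(tp=33, fp=7, tn=33, fn=7)    # every metric equals 0.825
three_way = metrics(tp=33, fp=5, tn=35, fn=7)   # precision = 33/38, about 0.8684
```

Under these assumed counts, the gain in precision, accuracy, and specificity corresponds to two fewer false positives, while sensitivity is unchanged because the number of false negatives stays the same.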
Further pathway analysis showed that the top three five-marker panels are associated with complement and coagulation cascades, signaling, activation, and hemostasis, which is consistent with previous findings. We believe the new approach is a better solution for multi-biomarker panel discovery and that it can be applied to other clinical proteomics studies.