Lilien Ryan H, Farid Hany, Donald Bruce R
Dartmouth Computer Science Department, Hanover, NH 03755, USA.
J Comput Biol. 2003;10(6):925-46. doi: 10.1089/106652703322756159.
We have developed an algorithm called Q5 for probabilistic classification of healthy versus disease whole serum samples using mass spectrometry. The algorithm employs principal components analysis (PCA) followed by linear discriminant analysis (LDA) on whole spectrum surface-enhanced laser desorption/ionization time of flight (SELDI-TOF) mass spectrometry (MS) data and is demonstrated on four real datasets from complete, complex SELDI spectra of human blood serum. Q5 is a closed-form, exact solution to the problem of classification of complete mass spectra of a complex protein mixture. Q5 employs a probabilistic classification algorithm built upon a dimension-reduced linear discriminant analysis. Our solution is computationally efficient; it is noniterative and computes the optimal linear discriminant using closed-form equations. The optimal discriminant is computed and verified for datasets of complete, complex SELDI spectra of human blood serum. Replicate experiments of different training/testing splits of each dataset are employed to verify robustness of the algorithm. The probabilistic classification method achieves excellent performance. We achieve sensitivity, specificity, and positive predictive values above 97% on three ovarian cancer datasets and one prostate cancer dataset. The Q5 method outperforms previous full-spectrum complex sample spectral classification techniques and can provide clues as to the molecular identities of differentially expressed proteins and peptides.
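The pipeline described in the abstract (PCA for dimension reduction, a closed-form linear discriminant, then a probabilistic class assignment) can be illustrated with a minimal NumPy sketch. This is not the authors' Q5 implementation. The function names (`make_synthetic`, `fit_pca_lda`, `predict_proba`), the synthetic "spectra", the number of retained components, the equal class priors, and the one-dimensional Gaussian model on the projected scores are all assumptions made for illustration only.

```python
import numpy as np


def make_synthetic(n_per_class, n_features, rng):
    """Hypothetical stand-in for spectra: Gaussian noise, with a raised
    band of intensities (features 50..59) in the 'disease' class."""
    X0 = rng.normal(size=(n_per_class, n_features))
    X1 = rng.normal(size=(n_per_class, n_features))
    X1[:, 50:60] += 3.0
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


def fit_pca_lda(X, y, n_components):
    """PCA followed by two-class Fisher LDA, both noniterative.

    X: (n_samples, n_features) array, y: array of 0/1 labels.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via thin SVD: rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T              # (n_features, k) projection
    Z = Xc @ P                           # samples in the reduced space
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    # Within-class scatter matrix in the reduced space.
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)
    # Closed-form Fisher discriminant direction: Sw^{-1} (m1 - m0).
    w = np.linalg.solve(Sw, m1 - m0)
    w /= np.linalg.norm(w)
    # Fit a 1-D Gaussian to each class's projected scores (an assumption).
    s0, s1 = Z0 @ w, Z1 @ w
    stats = [(s0.mean(), s0.std(ddof=1)), (s1.mean(), s1.std(ddof=1))]
    return {"mean": mean, "P": P, "w": w, "stats": stats}


def predict_proba(model, X):
    """P(class 1 | x) under equal priors and Gaussian projected scores."""
    s = (X - model["mean"]) @ model["P"] @ model["w"]

    def log_gauss(s, mu, sd):
        # Log density up to a constant shared by both classes.
        return -0.5 * ((s - mu) / sd) ** 2 - np.log(sd)

    (mu0, sd0), (mu1, sd1) = model["stats"]
    diff = log_gauss(s, mu0, sd0) - log_gauss(s, mu1, sd1)
    # Clip the log-odds so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(np.clip(diff, -700, 700)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, y_train = make_synthetic(20, 200, rng)
    X_test, y_test = make_synthetic(10, 200, rng)
    model = fit_pca_lda(X_train, y_train, n_components=5)
    proba = predict_proba(model, X_test)
    accuracy = np.mean((proba > 0.5) == (y_test == 1))
    print("held-out accuracy:", accuracy)
```

Reducing the dimension with PCA before the discriminant step keeps the within-class scatter matrix small and invertible even when there are far more spectral features than samples, which is why the discriminant can be obtained by a single linear solve. Results on this synthetic data say nothing about performance on real SELDI-TOF spectra.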