Department of Mathematics and Bioinformatics, Eastern Michigan University, Ypsilanti, MI 48109, USA.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2105-11-S1-S1.
As a novel cancer diagnostic paradigm, mass spectroscopic serum proteomic pattern diagnostics has been reported to be superior to conventional serologic cancer biomarkers. However, its clinical use is not yet fully validated. An important factor preventing this young technology from becoming a mainstream cancer diagnostic paradigm is that robustly identifying cancer molecular patterns from high-dimensional protein expression data remains a challenge in machine learning and oncology research. As a well-established dimension reduction technique, PCA is widely integrated into pattern recognition analysis to discover cancer molecular patterns. However, its global feature selection mechanism prevents it from capturing local features. This can make high-performance proteomic pattern discovery difficult, because only features that describe global data behavior are used to train a learning machine.
In this study, we develop a nonnegative principal component analysis algorithm and present a nonnegative principal component analysis based support vector machine algorithm with sparse coding for high-performance proteomic pattern classification. We also propose a nonnegative principal component analysis based filter-wrapper biomarker capturing algorithm for mass spectral serum profiles.
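To illustrate how a nonnegativity constraint can turn a global component into a local one, the following is a minimal sketch and not the algorithm developed in this study. It swaps in a simple projected power iteration (multiply by the covariance matrix, clip negative entries to zero, renormalize) to approximate one nonnegative component. The function names `covariance` and `nonnegative_component` and the toy data are assumptions made for this example only.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def nonnegative_component(cov, iters=200, seed=0):
    """Approximate one unit-norm loading vector with no negative entries.

    Heuristic projected power iteration: w <- max(0, cov @ w), then normalize.
    It can stop at a local optimum that depends on the starting point.
    """
    rng = random.Random(seed)
    d = len(cov)
    w = [rng.random() + 1e-3 for _ in range(d)]  # strictly positive start
    norm = math.sqrt(sum(x * x for x in w))
    w = [x / norm for x in w]
    for _ in range(iters):
        v = [max(0.0, sum(cov[i][j] * w[j] for j in range(d))) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # every entry was clipped; keep the previous estimate
            break
        w = [x / norm for x in v]
    return w


# Hypothetical toy "spectra": features 0 and 1 rise together, feature 2 falls,
# and feature 3 is low-amplitude noise.
toy_rows = [
    [1.0, 1.1, 5.0, 0.2],
    [2.0, 2.1, 4.0, 0.1],
    [3.0, 2.9, 3.1, 0.3],
    [4.0, 4.2, 2.0, 0.2],
    [5.0, 5.0, 0.9, 0.1],
]
loadings = nonnegative_component(covariance(toy_rows))
print(loadings)
```

On this toy data, an unconstrained leading principal component would place weights of opposite sign on the rising and falling features, so every feature contributes. The clipped version instead assigns zero weight to the anticorrelated feature, leaving a sparse loading vector concentrated on the co-varying pair. A sparse, part-based loading of this kind is what "local feature" means in this setting.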
We demonstrate the superiority of the proposed algorithm by comparison with six peer algorithms on four benchmark datasets. Moreover, we illustrate that nonnegative principal component analysis can be effectively used to capture meaningful biomarkers.
Our analysis suggests that nonnegative principal component analysis effectively conducts local feature selection for mass spectral profiles and contributes to improved sensitivity and specificity in the subsequent classification, as well as to meaningful biomarker discovery.