Kirk Paul, Witkover Aviva, Bangham Charles R M, Richardson Sylvia, Lewin Alexandra M, Stumpf Michael P H
Division of Molecular Biosciences, Imperial College London, London, United Kingdom.
J Comput Biol. 2013 Dec;20(12):979-89. doi: 10.1089/cmb.2013.0018. Epub 2013 Aug 2.
Recent studies have highlighted the importance of assessing the robustness of putative biomarkers identified from experimental data. This has given rise to the concept of stable biomarkers: those that are consistently identified despite small perturbations to the data. Since stability is not by itself a useful objective, we present a number of strategies that combine assessments of stability and predictive performance in order to identify biomarkers that are both robust and diagnostically useful. Moreover, by wrapping these strategies around logistic regression classifiers regularized by the elastic net penalty, we are able to assess the effects of correlations between biomarkers upon their perceived stability. We use a synthetic example to illustrate the properties of our proposed strategies. In this example, we find that: (i) assessments of stability can help to reduce the number of false-positive biomarkers, although potentially at the cost of missing some true positives; (ii) combining assessments of stability with assessments of predictive performance can improve the true positive rate; and (iii) correlations between biomarkers can have adverse effects on their stability and hence must be carefully taken into account when undertaking biomarker discovery. We then apply our strategies in a proteomics context to identify a number of robust candidate biomarkers for the human disease HTLV1-associated myelopathy/tropical spastic paraparesis (HAM/TSP).