Edmund Mach Foundation, Research and Innovation Center, Via Edmund Mach 1, 38010 San Michele all'Adige (TN), Italy.
Anal Chim Acta. 2011 Oct 31;705(1-2):15-23. doi: 10.1016/j.aca.2011.01.039. Epub 2011 Feb 1.
Biomarker identification, i.e., finding those variables that indicate true differences between two or more populations, is an increasingly important topic in the omics sciences. In most cases, the number of variables far exceeds the number of samples, making biomarker identification extremely difficult. We present a strategy based on the stability of putative biomarkers under perturbation of the data, and show that in several cases important gains can be achieved. The strategy is very general and can be applied with all common biomarker identification methods; it also has the advantage that it does not rely on error estimates from cross-validation, which in this setting tend to be highly variable.
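The idea of stability under perturbation can be illustrated with a minimal sketch. It is not the implementation used in the paper. It assumes a two-class problem, perturbs the data by keeping a random subset of the samples in each class, ranks variables by the absolute Welch t-statistic, and calls a variable stable when it appears among the top-ranked variables in a large fraction of the perturbed data sets. The function names and the parameters `keep_frac`, `top_k` and `min_freq` are hypothetical choices made for this illustration; any other ranking method could replace `welch_t`.

```python
import random


def welch_t(a, b):
    """Absolute Welch t-statistic for two lists of values (illustrative ranking score).

    Assumes each list has at least two values and the data are not constant.
    """
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return abs(ma - mb) / (va / len(a) + vb / len(b)) ** 0.5


def selection_frequency(X, y, n_perturb=100, keep_frac=0.7, top_k=10, seed=0):
    """Fraction of perturbed data sets in which each variable ranks in the top_k.

    X: list of samples, each a list of variable values; y: 0/1 class labels.
    All parameter names and defaults are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    n_vars = len(X[0])
    groups = [[i for i, c in enumerate(y) if c == k] for k in (0, 1)]
    counts = [0] * n_vars
    for _ in range(n_perturb):
        # Perturbation: keep a random subset of the samples in each class.
        g0, g1 = (rng.sample(g, max(2, int(keep_frac * len(g)))) for g in groups)
        scores = [
            welch_t([X[i][j] for i in g0], [X[i][j] for i in g1])
            for j in range(n_vars)
        ]
        # Count the variables ranked in the top_k on this perturbed data set.
        ranked = sorted(range(n_vars), key=scores.__getitem__, reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_perturb for c in counts]


def stable_biomarkers(X, y, min_freq=0.8, **kwargs):
    """Indices of variables selected in at least min_freq of the perturbations."""
    freq = selection_frequency(X, y, **kwargs)
    return [j for j, f in enumerate(freq) if f >= min_freq]
```

Calling `stable_biomarkers(X, y)` on a samples-by-variables matrix returns the indices of the variables that are selected consistently. No cross-validated error estimate is involved: the only quantity used is how often a variable is selected.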