Gaining Confidence on Molecular Classification through Consensus Modeling and Validation.

Affiliation

Center for Toxicoinformatics, National Center for Toxicological Research (NCTR), U.S. Food and Drug Administration, Jefferson, AR, 72079, USA.

Publication information

Toxicol Mech Methods. 2006;16(2-3):59-68. doi: 10.1080/15376520600558259.

DOI:10.1080/15376520600558259

PMID:20020998

Abstract

Current advances in genomics, proteomics, and metabonomics are expected to bring a constellation of benefits to human health. Applying supervised learning methods to omics data, one of the molecular classification approaches, is playing a growing role in clinical applications. However, the utility of a molecular classifier will not be fully appreciated unless its quality is carefully validated. Clinical omics data are usually noisy, with far more independent variables than subjects and, possibly, a skewed subject distribution. Under these conditions, a consensus approach holds an advantage over a single classifier. The focus of this review is therefore on how to validate a molecular classifier using Decision Forest (DF), a robust consensus approach. We recommend that a molecular classifier be assessed with respect to overall prediction accuracy, prediction confidence, and chance correlation, all of which can be readily assessed in DF. The commonalities and differences between external validation and cross-validation are also discussed, with a view to using these methods to validate a DF classifier. In addition, the advantages of using consensus approaches to identify potential biomarkers are explained. Although specific DF examples are used in this review, the rationales and recommendations provided should be equally applicable to other consensus methods.
