Pittman Jennifer, Huang Erich, Dressman Holly, Horng Cheng-Fang, Cheng Skye H, Tsou Mei-Hua, Chen Chii-Ming, Bild Andrea, Iversen Edwin S, Huang Andrew T, Nevins Joseph R, West Mike
Institute of Statistics and Decision Sciences, Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA.
Proc Natl Acad Sci U S A. 2004 Jun 1;101(22):8431-6. doi: 10.1073/pnas.0401736101. Epub 2004 May 19.
We describe a comprehensive modeling approach to combining genomic and clinical data for personalized prediction in disease outcome studies. This integrated clinicogenomic modeling framework is based on statistical classification tree models that evaluate the contributions of multiple forms of data, both clinical and genomic, to define interactions of multiple risk factors that associate with the clinical outcome and derive predictions customized to the individual patient level. Gene expression data from DNA microarrays are represented by multiple summary measures that we term metagenes; each metagene characterizes the dominant common expression pattern within a cluster of genes. A case study of primary breast cancer recurrence demonstrates that models using multiple metagenes combined with traditional clinical risk factors improve prediction accuracy at the individual patient level, delivering predictions more accurate than those made by using a single genomic predictor or clinical data alone. The analysis also highlights issues of communicating uncertainty in prediction and identifies combinations of clinical and genomic risk factors playing predictive roles. Implicated metagenes identify gene subsets with the potential to aid biological interpretation. This framework will extend to incorporate any form of data, including emerging forms of genomic data, and provides a platform for development of models for personalized prognosis.
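To make the metagene idea concrete, the following is a minimal sketch of one way to summarize a gene cluster by its dominant common expression pattern. It assumes the summary is the first principal component of the centered cluster, found here by power iteration. The abstract does not state this choice, so it is an illustration rather than the authors' exact procedure. The function name `metagene` and the input layout (one row per gene, one column per sample) are hypothetical.

```python
import math


def metagene(cluster, n_iter=200):
    """Summarize a gene cluster as one value per sample.

    cluster: list of gene rows, each a list of expression values
             (one per sample). Assumed layout, not from the source.
    Returns the per-sample scores on the first principal component
    of the gene-centered data (an assumed definition of "dominant
    common expression pattern").
    """
    n_genes = len(cluster)
    n_samples = len(cluster[0])

    # Center each gene across samples so the pattern reflects variation.
    centered = []
    for row in cluster:
        mean = sum(row) / n_samples
        centered.append([x - mean for x in row])

    def scores(weights):
        # Weighted sum of genes for each sample: X^T w.
        return [
            sum(weights[g] * centered[g][j] for g in range(n_genes))
            for j in range(n_samples)
        ]

    # Power iteration on X X^T to find the dominant gene loadings.
    # Starting from equal positive weights fixes the sign of the result.
    w = [1.0 / math.sqrt(n_genes)] * n_genes
    for _ in range(n_iter):
        s = scores(w)
        w_new = [
            sum(centered[g][j] * s[j] for j in range(n_samples))
            for g in range(n_genes)
        ]
        norm = math.sqrt(sum(v * v for v in w_new))
        if norm == 0.0:
            break  # no variation along the current direction
        w = [v / norm for v in w_new]

    return scores(w)


# Hypothetical toy cluster: three genes sharing one rising pattern.
example_cluster = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.5, 1.0, 1.5, 2.0],
]
example_metagene = metagene(example_cluster)
```

In a pipeline of the kind the abstract describes, one such vector per cluster would sit alongside the clinical risk factors as a candidate predictor for the classification trees. In practice a linear algebra library's SVD routine would normally replace the hand-written iteration.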