Wheeler Heather E, Aquino-Michaels Keston, Gamazon Eric R, Trubetskoy Vassily V, Dolan M Eileen, Huang R Stephanie, Cox Nancy J, Im Hae Kyung
Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America.
Genet Epidemiol. 2014 Jul;38(5):402-15. doi: 10.1002/gepi.21808. Epub 2014 May 2.
High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic, or other omics-level data. We translate the omic similarity into phenotypic similarity using a method called Kriging, commonly used in geostatistics and machine learning. Our method, called OmicKriging, emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome, and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components, yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. Using clinical statin response, we show improved prediction over existing methods. We provide an R package to implement OmicKriging (http://www.scandb.org/newinterface/tools/OmicKriging.html).
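The core idea in the abstract, translating omic similarity into phenotypic similarity through Kriging, can be illustrated with a minimal sketch. This is not the OmicKriging R package or its API. The function names (`similarity_matrix`, `krige`), the equal weighting of the two layers, the variance share `h2`, and the simulated data are all assumptions made for illustration. The sketch builds one similarity matrix per omic layer, combines them as a weighted sum, and predicts held-out phenotypes as the training mean plus a similarity-weighted combination of training residuals.

```python
import numpy as np


def similarity_matrix(features):
    """Similarity between individuals (rows) computed from one omic layer.

    Hypothetical helper: standardizes each feature (column) and averages
    the cross-products, giving a correlation-style matrix.
    """
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    return z @ z.T / features.shape[1]


def krige(sim, y_train, train_idx, test_idx, h2=0.5):
    """Predict test phenotypes from training phenotypes by simple kriging.

    h2 is an assumed share of phenotypic variance attributed to the
    similarity matrix; the remainder is treated as independent noise.
    """
    n_train = len(train_idx)
    # Covariance among training individuals: structured part plus noise.
    cov_train = h2 * sim[np.ix_(train_idx, train_idx)] + (1 - h2) * np.eye(n_train)
    # Covariance between test and training individuals.
    cov_cross = h2 * sim[np.ix_(test_idx, train_idx)]
    mu = y_train.mean()
    weights = np.linalg.solve(cov_train, y_train - mu)
    return mu + cov_cross @ weights


# Simulated stand-ins for two omic layers (assumption: not real data).
rng = np.random.default_rng(0)
n = 60
genotypes = rng.normal(size=(n, 200))
expression = rng.normal(size=(n, 50))
phenotype = (
    genotypes[:, :5].sum(axis=1)
    + expression[:, :3].sum(axis=1)
    + rng.normal(size=n)
)

# Integrate the layers as a weighted sum of their similarity matrices.
# The 0.5/0.5 weights are arbitrary here; in practice they would be tuned.
combined = 0.5 * similarity_matrix(genotypes) + 0.5 * similarity_matrix(expression)

train_idx = np.arange(0, 45)
test_idx = np.arange(45, n)
predicted = krige(combined, phenotype[train_idx], train_idx, test_idx, h2=0.5)
```

Because integration is only a weighted sum of matrices, adding another data layer means adding one more term to `combined`. In a real analysis the weights and `h2` would typically be chosen by cross-validation rather than fixed in advance.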