Pathak Jyotishman, Pan Helen, Wang Janey, Kashyap Sudha, Schad Peter A, Hamilton Carol M, Masys Daniel R, Chute Christopher G
Mayo Clinic, Rochester, MN
AMIA Jt Summits Transl Sci Proc. 2011;2011:41-5. Epub 2011 Mar 7.
Combining genome-wide association studies (GWAS) data with clinical information from the electronic medical record (EMR) provides unprecedented opportunities to identify genetic variants that influence susceptibility to common, complex diseases. While mining the vast volume of EMR data greatly expands the potential for conducting GWAS, the non-standardized representation and wide variability of clinical data and phenotypes pose a major challenge to data integration and analysis. To address this challenge, we present experiences and methods developed to map phenotypic data elements from eMERGE (Electronic Medical Records and Genomics) to PhenX (Consensus Measures for Phenotypes and eXposures) and to NCI's Cancer Data Standards Registry and Repository (caDSR). Our results suggest that adopting multiple standards and biomedical terminologies will expose studies to a broader user community and enhance interoperability with a wider range of studies, in turn promoting cross-study pooling of data to detect both more subtle and more complex genotype-phenotype associations.