Opening the Door to the Large Scale Use of Clinical Lab Measures for Association Testing: Exploring Different Methods for Defining Phenotypes.

Author information

Bauer Christopher R, Lavage Daniel, Snyder John, Leader Joseph, Mahoney J Matthew, Pendergrass Sarah A

Affiliation

Biomedical & Translational Informatics, Geisinger Health System, 100 N. Academy Ave., Danville, PA 17821, USA.

Publication information

Pac Symp Biocomput. 2017;22:356-367. doi: 10.1142/9789813207813_0034.

DOI:10.1142/9789813207813_0034

PMID:27896989

Abstract

The past decade has seen exponential growth in the numbers of sequenced and genotyped individuals and a corresponding increase in our ability to collect and catalogue phenotypic data for use in the clinic. We now face the challenge of integrating these diverse data in new ways that can provide useful diagnostics and precise medical interventions for individual patients. One of the first steps in this process is to accurately map the phenotypic consequences of the genetic variation in human populations. The most common approach for this is the genome-wide association study (GWAS). While this technique is relatively simple to implement for a given phenotype, the choice of how to define a phenotype is critical. It is becoming increasingly common for each individual in a GWAS cohort to have a large profile of quantitative measures. The standard approach is to test for associations with one measure at a time; however, there are many justifiable ways to define a set of phenotypes, and the genetic associations that are revealed will vary based on these definitions. Some phenotypes may only show a significant genetic association signal when considered together, such as through principal component analysis (PCA). Combining correlated measures may increase the power to detect association by reducing the noise present in individual variables, and may also reduce the multiple hypothesis testing burden. Here we show that PCA and k-means clustering are two complementary methods for identifying novel genotype-phenotype relationships within a set of quantitative human traits derived from the Geisinger Health System electronic health record (EHR). Using a diverse set of approaches for defining phenotypes may yield more insights into the genetic architecture of complex traits, and the findings presented here highlight a clear need for further investigation into other methods for defining the most relevant phenotypes in a set of variables. As EHR data continue to grow, addressing these issues will become increasingly important in our efforts to use genomic data effectively in medicine.
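
The general idea of deriving composite phenotypes from a panel of quantitative measures can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the helper names (`standardize`, `pca_scores`, `kmeans_labels`, `association_slope`) are hypothetical, and a plain least-squares slope stands in for a full association test. It assumes only NumPy.

```python
import numpy as np

def standardize(X):
    """Z-score each column so measures on different scales are comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_scores(X, n_components):
    """Project standardized data onto its top principal components via SVD."""
    Z = standardize(X)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per individual."""
    Z = standardize(X)
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every individual to every cluster center.
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as-is
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def association_slope(genotype, phenotype):
    """Least-squares slope of a phenotype on allele dosage (0, 1, or 2)."""
    g = genotype - genotype.mean()
    return float(g @ (phenotype - phenotype.mean()) / (g @ g))

# Synthetic cohort (assumption): one variant shifts a latent trait that
# three noisy lab measures share; two further measures are unrelated noise.
rng = np.random.default_rng(42)
n = 500
genotype = rng.integers(0, 3, size=n).astype(float)
latent = 0.5 * genotype + rng.normal(size=n)
labs = np.column_stack(
    [latent + rng.normal(size=n) for _ in range(3)]
    + [rng.normal(size=n) for _ in range(2)]
)

pcs = pca_scores(labs, n_components=2)   # continuous composite phenotypes
clusters = kmeans_labels(labs, k=3)      # categorical composite phenotype

# One measure at a time versus the first principal component.
single_slopes = [association_slope(genotype, labs[:, j]) for j in range(labs.shape[1])]
pc1_slope = association_slope(genotype, pcs[:, 0])
```

In this sketch, each column of `pcs` is a continuous phenotype that pools correlated measures, while `clusters` assigns each individual to a group that could be tested as a categorical trait. The sign of a principal component is arbitrary, so only the magnitude of `pc1_slope` is meaningful.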
