Identifying Genetic Associations with Variability in Metabolic Health and Blood Count Laboratory Values: Diving into the Quantitative Traits by Leveraging Longitudinal Data from an EHR.

Author information

Verma Shefali S, Lucas Anastasia M, Lavage Daniel R, Leader Joseph B, Metpally Raghu, Krishnamurthy Sarathbabu, Dewey Frederick, Borecki Ingrid, Lopez Alexander, Overton John, Penn John, Reid Jeffrey, Pendergrass Sarah A, Breitwieser Gerda, Ritchie Marylyn D

Affiliation

Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA.

Publication information

Pac Symp Biocomput. 2017;22:533-544. doi: 10.1142/9789813207813_0049.

Abstract

Electronic health records (EHRs) capture a wide range of patient health data, including diagnoses, surgical procedures, clinical laboratory measurements, and medication information. Together, this information reflects the patient's medical history. Many studies have used EHR data effectively to find clinically relevant associations, either by using International Classification of Diseases, version 9 (ICD-9) codes or laboratory measurements, or by designing phenotype algorithms that accurately extract case and control status from the EHR. Here we developed a strategy to use longitudinal quantitative trait data from the EHR at Geisinger Health System, focusing on outpatient metabolic and complete blood panel data as a starting point. The Comprehensive Metabolic Panel (CMP) and the Complete Blood Count (CBC) are part of routine care and provide a high-level screen of patients' overall health and disease. We randomly split our data into two datasets to allow for discovery and replication. We first conducted a genome-wide association study (GWAS) on the median values of 25 different clinical laboratory measurements to identify variants from Human Omni Express Exome BeadChip data that are associated with these measurements. We identified 687 variants that were associated with the tested clinical measurements and replicated at p < 5×10⁻⁸. Because longitudinal data from the EHR provide a record of a patient's medical history, we used this information to investigate the ICD-9 codes that might be associated with differences in the variability of the measurements in the longitudinal dataset. We identified low- and high-variance patients by looking at changes within their individual longitudinal EHR laboratory results for each of the 25 clinical lab values (thus creating 50 groups: a high-variance and a low-variance group for each lab variable).
We then performed a phenome-wide association study (PheWAS) with ICD-9 diagnosis codes, separately in the high-variance group and the low-variance group for each lab variable. We found 717 PheWAS associations that replicated at a p-value less than 0.001. Next, we evaluated the results of this study by comparing the association results between the high- and low-variance groups. For example, we found 39 SNPs (in multiple genes) associated with ICD-9 250.01 (Type I diabetes) in patients with high variance in plasma glucose levels, but not in patients with low variance in plasma glucose levels. Another example is the association of 4 SNPs in UMOD with chronic kidney disease in patients with high variance in aspartate aminotransferase (discovery p-value: 8.71×10⁻⁹; replication p-value: 2.03×10⁻⁶). In general, we see many more statistically significant associations in patients with high variance in the quantitative lab variables than in the low-variance group, across all 25 laboratory measurements. This study is one of the first of its kind to use quantitative trait variance from longitudinal laboratory data to find associations between genetic variants and clinical phenotypes obtained from an EHR, integrating laboratory values and diagnosis codes to understand the genetic complexities of common diseases.
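
The data-preparation steps described in the abstract (a random discovery/replication split, per-patient median lab values, and high- and low-variance groups) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The abstract does not state how the split was sized or how "high" and "low" variance were defined, so the equal-halves split, the cohort-median variance cutoff, the record layout `(patient_id, lab, value)`, and all function names below are assumptions made for illustration only.

```python
import random
import statistics
from collections import defaultdict


def summarize_patients(records):
    """Collapse longitudinal (patient_id, lab, value) rows into one
    median and one sample variance per patient and lab."""
    values = defaultdict(list)
    for patient_id, lab, value in records:
        values[(patient_id, lab)].append(value)

    summary = {}
    for key, vals in values.items():
        if len(vals) < 2:
            continue  # sample variance needs at least two measurements
        summary[key] = {
            "median": statistics.median(vals),
            "variance": statistics.variance(vals),
        }
    return summary


def split_discovery_replication(patient_ids, seed=0):
    """Randomly split patients into two sets.
    Assumption: equal halves; the abstract only says 'randomly split'."""
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])


def variance_groups(summary, lab):
    """Assign patients to high- and low-variance groups for one lab.
    Assumption: the cutoff is the cohort median of per-patient variance;
    the abstract does not give the actual definition."""
    per_patient = {
        pid: stats["variance"]
        for (pid, lab_name), stats in summary.items()
        if lab_name == lab
    }
    cutoff = statistics.median(per_patient.values())
    high = {pid for pid, var in per_patient.items() if var > cutoff}
    low = set(per_patient) - high
    return high, low


if __name__ == "__main__":
    # Made-up measurements, purely to show the call pattern.
    demo = [
        ("p1", "glucose", 90.0), ("p1", "glucose", 92.0),
        ("p2", "glucose", 80.0), ("p2", "glucose", 160.0),
    ]
    demo_summary = summarize_patients(demo)
    print(variance_groups(demo_summary, "glucose"))
```

Under these assumptions, the per-patient medians would serve as the quantitative trait for the GWAS, and the two sets returned by `variance_groups` would each be analyzed separately in the PheWAS, once per lab variable.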
