The effects of electronic medical record phenotyping details on genetic association studies: HDL-C as a case study.

Authors

Dumitrescu Logan, Goodloe Robert, Bradford Yukiko, Farber-Eger Eric, Boston Jonathan, Crawford Dana C

Affiliations

Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232 USA; Department of Molecular Physiology and Biophysics, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232 USA.

Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 512 Wartik Laboratory, University Park, PA 16802 USA.

Publication information

BioData Min. 2015 May 6;8:15. doi: 10.1186/s13040-015-0048-2. eCollection 2015.

Abstract

BACKGROUND

Biorepositories linked to de-identified electronic medical records (EMRs) have the potential to complement traditional epidemiologic studies in genotype-phenotype studies of complex human diseases and traits. A major challenge in meeting this potential is the use of EMR-derived data to extract phenotypes and covariates for genetic association studies. Unlike traditional epidemiologic data, EMR-derived data are collected for clinical care and are therefore highly variable across patients. The variability of clinical data, coupled with the challenges associated with searching unstructured clinical notes, requires the development of algorithms to extract phenotypes for analysis. Given the number of possible algorithms that could be developed for any one EMR-derived phenotype, we explored here the impact that algorithm decision logic has on genetic association study results for a single quantitative trait, high-density lipoprotein cholesterol (HDL-C).

RESULTS

We used five different algorithms to extract HDL-C from African American subjects genotyped on the Illumina Metabochip (n = 11,519) as part of Epidemiologic Architecture for Genes Linked to Environment (EAGLE). Tests of association between HDL-C and genetic risk scores for HDL-C associated variants suggest that the genetic effect size does not vary substantially across the five HDL-C definitions.
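
The comparison described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not say what the five algorithms were or how the association tests were modeled, so the five phenotype definitions below (first, last, median, mean, and maximum lab value), the unweighted risk score, the simulated patients, and the unadjusted least-squares slope are all hypothetical stand-ins, not the authors' method.

```python
import random
import statistics


def genetic_risk_score(genotypes, weights=None):
    """Sum risk-allele counts (0, 1, or 2 per variant), optionally weighted."""
    if weights is None:
        weights = [1.0] * len(genotypes)
    return sum(w * g for w, g in zip(weights, genotypes))


# Hypothetical phenotype definitions (not taken from the paper): each one
# collapses a patient's repeated HDL-C lab values (mg/dL) into one number.
PHENOTYPE_DEFINITIONS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "median": statistics.median,
    "mean": statistics.fmean,
    "max": max,
}


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_patient(rng, n_variants=10, true_effect=1.5):
    """Simulated patient: a risk score plus 1 to 6 noisy HDL-C lab values."""
    genotypes = [rng.choice((0, 1, 2)) for _ in range(n_variants)]
    score = genetic_risk_score(genotypes)
    baseline = 40.0 + true_effect * score + rng.gauss(0, 8)
    labs = [baseline + rng.gauss(0, 4) for _ in range(rng.randint(1, 6))]
    return score, labs


def effect_sizes_by_definition(patients):
    """Estimate the risk-score effect under each phenotype definition."""
    scores = [score for score, _ in patients]
    return {
        name: ols_slope(scores, [collapse(labs) for _, labs in patients])
        for name, collapse in PHENOTYPE_DEFINITIONS.items()
    }


rng = random.Random(0)  # fixed seed so the simulation is reproducible
patients = [simulate_patient(rng) for _ in range(2000)]
effects = effect_sizes_by_definition(patients)
for name, beta in effects.items():
    print(f"{name:>6}: beta = {beta:.2f} mg/dL per risk allele")
```

In this toy setting the estimated slopes stay close to one another, because each definition changes only how within-patient measurement noise is summarized, not how the trait relates to the risk score. A real analysis would also adjust for covariates such as age and sex, which this sketch omits.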

CONCLUSIONS

These data collectively suggest that, at least for this quantitative trait, algorithm decision logic and phenotyping details do not appreciably impact genetic association study test statistics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/966d/4428098/10dd31824b29/13040_2015_48_Fig1_HTML.jpg