Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
Sci Transl Med. 2011 Apr 20;3(79):79re1. doi: 10.1126/scitranslmed.3001807.
Electronic medical records (EMRs) are a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network (eMERGE) investigates whether data captured in EMRs during routine clinical care can identify disease phenotypes with positive and negative predictive values high enough for use in genome-wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73 to 98% and negative predictive values of 98 to 100%. Most EMRs captured the key information used to define phenotypes (diagnoses, medications, laboratory tests) in a structured format. We identified natural language processing as an important tool for improving case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
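The positive and negative predictive values reported in the abstract follow the standard definitions: PPV = TP / (TP + FP) and NPV = TN / (TN + FN), where the counts come from comparing an EMR phenotype algorithm's labels against a reference standard such as manual chart review. The sketch below is illustrative only. The class name, function names, and counts are hypothetical and are not taken from the eMERGE study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Hypothetical tallies from checking algorithm labels against chart review."""
    true_pos: int   # algorithm says case, review confirms case
    false_pos: int  # algorithm says case, review says non-case
    true_neg: int   # algorithm says non-case, review confirms non-case
    false_neg: int  # algorithm says non-case, review says case


def positive_predictive_value(c: ValidationCounts) -> float:
    """Fraction of algorithm-flagged cases that are confirmed cases."""
    flagged = c.true_pos + c.false_pos
    if flagged == 0:
        raise ValueError("no records were flagged as cases")
    return c.true_pos / flagged


def negative_predictive_value(c: ValidationCounts) -> float:
    """Fraction of algorithm-flagged non-cases that are confirmed non-cases."""
    cleared = c.true_neg + c.false_neg
    if cleared == 0:
        raise ValueError("no records were flagged as non-cases")
    return c.true_neg / cleared


if __name__ == "__main__":
    # Made-up counts for one phenotype at one site (assumption, not study data).
    counts = ValidationCounts(true_pos=73, false_pos=27, true_neg=98, false_neg=2)
    print(f"PPV = {positive_predictive_value(counts):.2f}")
    print(f"NPV = {negative_predictive_value(counts):.2f}")
```

Both values depend on how common the phenotype is in the reviewed sample, so figures computed this way at one site do not automatically transfer to a population with a different prevalence.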