Northwestern University, Chicago, Illinois, USA.
J Am Med Inform Assoc. 2012 Mar-Apr;19(2):212-8. doi: 10.1136/amiajnl-2011-000439. Epub 2011 Nov 19.
Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype-phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS from data captured through routine clinical care at five institutions that use different electronic medical record (EMR) systems.
An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated against clinician review at three of the five participating institutions. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions.
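The abstract does not give the algorithm's actual criteria, so the following is only a minimal sketch of how a rule combining the three evidence types might be structured. The `PatientRecord` fields, the `classify` function, and the "two of three" rule are all hypothetical illustrations, not the published method.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary derived from EMR data."""
    has_t2d_diagnosis: bool      # e.g. a T2D diagnosis code on record
    on_t2d_medication: bool      # e.g. a prescription for a T2D drug
    abnormal_glucose_lab: bool   # e.g. a lab result in the diabetic range


def classify(record: PatientRecord) -> str:
    """Return 'case', 'control', or 'excluded'.

    Illustrative rule only (an assumption, not the study's criteria):
    - case: at least two of the three evidence types are present
    - control: none of the three evidence types is present
    - excluded: ambiguous records with exactly one evidence type
    """
    evidence = sum([
        record.has_t2d_diagnosis,
        record.on_t2d_medication,
        record.abnormal_glucose_lab,
    ])
    if evidence >= 2:
        return "case"
    if evidence == 0:
        return "control"
    return "excluded"


if __name__ == "__main__":
    print(classify(PatientRecord(True, True, False)))    # case
    print(classify(PatientRecord(False, False, False)))  # control
    print(classify(PatientRecord(True, False, False)))   # excluded
```

Excluding ambiguous records rather than forcing them into a group is one way to trade sample size for the high specificity that a GWAS needs.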
The algorithm achieved 98% and 100% positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D.
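For readers unfamiliar with the metric, positive predictive value is the fraction of algorithm-flagged subjects that the reviewer confirms. The sketch below shows the arithmetic; the function name and the review counts are hypothetical and are not taken from the study.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = confirmed positives / all subjects the algorithm flagged."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no subjects were flagged")
    return true_positives / flagged


if __name__ == "__main__":
    # Hypothetical chart-review counts chosen for illustration only.
    ppv = positive_predictive_value(true_positives=49, false_positives=1)
    print(f"{ppv:.0%}")  # 98%
```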
By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS.
An algorithm using commonly available data from five different EMR systems can accurately identify T2D cases and controls for genetic studies across multiple institutions.