Knight Kathryn E, Honerlaw Jacqueline, Danciu Ioana, Linares Franciel, Ho Yuk-Lam, Gagnon David R, Rush Everett, Gaziano J Michael, Begoli Edmon, Cho Kelly
Oak Ridge National Laboratory, Oak Ridge, TN.
Division of Population Health and Data Science, Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:326-334. eCollection 2020.
Electronic health records (EHRs) provide a wealth of data for phenotype development in population health studies, and researchers invest considerable time to curate data elements and validate disease definitions. The ability to reproduce well-defined phenotypes increases data quality and comparability of results, and expedites research. In this paper, we present a standardized approach to organize and capture phenotype definitions, resulting in the creation of an open, online repository of phenotypes. This resource captures phenotype development, provenance and process from the Million Veteran Program, a national mega-biobank embedded in the Veterans Health Administration (VHA). To ensure that the repository is searchable, extendable, and sustainable, it is necessary to develop both a proper digital catalog architecture and an underlying metadata infrastructure to enable effective management of the data fields required to define each phenotype. Our methods provide a resource for VHA investigators and a roadmap for researchers interested in standardizing their phenotype definitions to increase portability.