Institute of Cardiovascular Sciences, University College London, London, United Kingdom.
NIHR University College London Biomedical Research Centre, University College London and University College London Hospitals NHS Foundation Trust, London, United Kingdom.
PLoS One. 2022 Apr 5;17(4):e0264828. doi: 10.1371/journal.pone.0264828. eCollection 2022.
A lack of internationally agreed standards for combining available data sources at scale risks inconsistent disease phenotyping, limiting research reproducibility.
To develop a rules-based algorithm and evaluate whether it can identify coronary artery disease (CAD) sub-phenotypes using electronic health records (EHR) and questionnaire data from UK Biobank (UKB).
Case-control and cohort study.
Prospective cohort study of about 502,000 individuals aged 40-69 years, recruited into UK Biobank between 2006 and 2010, with linked hospitalization and mortality data and genotyping.
All individuals were classified into six predefined CAD phenotypes using hospital admission and procedure codes, mortality records and baseline survey data. A polygenic risk score (PRS) for CAD was estimated for the 408,470 unrelated individuals of European descent among them.
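To make the idea of a rules-based phenotyping algorithm concrete, the Python sketch below assigns each participant to at most one of the six phenotypes from dated EHR evidence and baseline self-report. It is illustrative only: the field names (`mi_dates`, `cad_dates`, `self_report_mi`, `self_report_cad`) and the precedence order (EHR evidence at baseline, then self-report, then incident EHR events, with MI ranked above CAD without MI) are assumptions, not the published algorithm, whose code lists and rules are held in the phenotype library.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Participant:
    baseline: date  # date of the UK Biobank baseline assessment
    # Hypothetical fields: dates of MI and of non-MI CAD evidence drawn from
    # hospital admission/procedure codes or death records.
    mi_dates: List[date] = field(default_factory=list)
    cad_dates: List[date] = field(default_factory=list)
    # Hypothetical flags from the baseline questionnaire.
    self_report_mi: bool = False
    self_report_cad: bool = False


def classify(p: Participant) -> Optional[str]:
    """Return one of six CAD phenotypes, or None for a disease-free participant.

    The precedence order is an illustrative assumption: EHR evidence on or
    before baseline, then self-report, then EHR events after baseline;
    within each tier, MI takes priority over CAD without MI.
    """
    prevalent_mi = any(d <= p.baseline for d in p.mi_dates)
    prevalent_cad = any(d <= p.baseline for d in p.cad_dates)
    incident_mi = any(d > p.baseline for d in p.mi_dates)
    incident_cad = any(d > p.baseline for d in p.cad_dates)

    if prevalent_mi:
        return "prevalent MI"
    if prevalent_cad:
        return "prevalent CAD without MI"
    if p.self_report_mi:
        return "prevalent self-reported MI"
    if p.self_report_cad:
        return "prevalent self-reported CAD without MI"
    if incident_mi:
        return "incident MI"
    if incident_cad:
        return "incident CAD without MI"
    return None
```

Applying `classify` to every participant and counting the labels yields mutually exclusive group sizes; under a different precedence order, some individuals (for example, those with prevalent CAD and a later MI) would move between groups.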
CAD Phenotypes.
Association with baseline risk factors, mortality (n = 14,419 deaths over a median follow-up of 7.8 years), and a PRS for CAD.
The algorithm classified individuals with CAD as prevalent MI (n = 4,900), incident MI (n = 4,621), prevalent CAD without MI (n = 10,910), incident CAD without MI (n = 8,668), prevalent self-reported MI (n = 2,754) and prevalent self-reported CAD without MI (n = 5,623), yielding 37,476 individuals with any type of CAD. Risk factors were similar across the six CAD phenotypes, except that the self-reported CAD without MI group included fewer men (46.7% vs 70.1% for the overall group). In age- and sex-adjusted survival analyses compared with disease-free controls, mortality was highest following incident MI (HR 6.66, 95% CI 6.07-7.31) and lowest for prevalent self-reported CAD without MI at baseline (HR 1.31, 95% CI 1.15-1.50). The PRS showed similar graded associations across the six phenotypes per SD increase, with the strongest association for prevalent MI (OR 1.50, 95% CI 1.46-1.55) and the weakest for prevalent self-reported CAD without MI (OR 1.08, 95% CI 1.05-1.12). The algorithm is available in the open HDR UK Phenotype Library (https://portal.caliberresearch.org/).
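Effect sizes "per SD increase in PRS" are usually obtained by z-scoring the score before regression, so that exp(β) of the fitted coefficient is the odds ratio (or, from a Cox model, the hazard ratio) for a one-SD difference. The Python sketch below shows only that bookkeeping. It assumes a PRS built as a weighted sum of effect-allele dosages and a Wald 95% confidence interval; the function names are hypothetical, the regression fit itself is not shown, and nothing here reproduces the study's actual score or models.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores: Sequence[float]) -> List[float]:
    """Z-score the PRS so that a one-unit change corresponds to one SD."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


def ratio_with_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds (or log-hazard) coefficient and its standard error
    into a ratio with a Wald confidence interval: exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Because the interval is symmetric on the log scale, the reported ratio is the geometric mean of its lower and upper limits, which is a quick consistency check on published ORs and HRs.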
An algorithmic, EHR-based approach distinguished six CAD phenotypes with distinct survival and PRS associations. This supports the adoption of open approaches to standardize CAD phenotyping and suggests wider potential value for reproducible research in other conditions.