Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.
Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, Massachusetts, USA.
J Am Med Inform Assoc. 2021 Nov 25;28(12):2582-2592. doi: 10.1093/jamia/ocab187.
Large amounts of health data are becoming available for biomedical research. Synthesizing information across databases may capture more comprehensive pictures of patient health and enable novel research studies. When no gold standard mappings between patient records are available, researchers may probabilistically link records from separate databases and analyze the linked data. However, previous linked data inference methods are constrained to certain linkage settings and exhibit low power. Here, we present ATLAS, an automated, flexible, and robust association testing algorithm for probabilistically linked data.
Missing variables are imputed at various thresholds using a weighted average method that propagates uncertainty from probabilistic linkage. Next, estimated effect sizes are obtained using a generalized linear model. ATLAS then conducts the threshold combination test by optimally combining P values obtained from data imputed at varying thresholds using Fisher's method and perturbation resampling.
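The combination step can be illustrated with a minimal sketch of Fisher's method. The abstract does not describe the perturbation resampling scheme, so the code below makes an assumption: it takes the resampled null P values as an input array (the hypothetical `null_p_values`, one row per resample and one column per threshold) rather than generating them. All function names are illustrative and are not part of ATLAS. Because P values computed from the same data at different thresholds are correlated, the usual chi-square reference distribution for Fisher's statistic does not apply directly, which is one reason a resampling-based null is useful.

```python
import numpy as np
from scipy import stats


def fisher_statistic(p_values):
    """Fisher's combined statistic: -2 * sum(log p_k)."""
    p = np.clip(np.asarray(p_values, dtype=float), 1e-300, 1.0)
    return -2.0 * np.sum(np.log(p), axis=-1)


def fisher_pvalue_independent(p_values):
    """Combined P value, valid only if the K input P values are independent.

    Under independence the statistic follows a chi-square distribution
    with 2K degrees of freedom.
    """
    k = len(p_values)
    return stats.chi2.sf(fisher_statistic(p_values), df=2 * k)


def fisher_pvalue_resampled(p_values, null_p_values):
    """Empirical combined P value against a resampled null.

    null_p_values: array of shape (B, K) holding the K threshold-specific
    P values from each of B null resamples (assumed to be supplied by
    some resampling procedure not shown here).
    """
    observed = fisher_statistic(p_values)
    null_stats = fisher_statistic(null_p_values)  # shape (B,)
    # Add-one correction keeps the empirical P value strictly positive.
    return (1 + np.sum(null_stats >= observed)) / (1 + len(null_stats))
```

With independent inputs, `fisher_pvalue_independent` is the textbook test; `fisher_pvalue_resampled` replaces the chi-square reference with the empirical distribution of the statistic under the supplied null draws.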
In simulations, ATLAS controls type I error and exhibits higher power than previous methods. In a real-world genetic association study, meta-analyzing ATLAS-enabled analyses of a linked cohort together with analyses of an existing cohort yielded additional significant associations between a rheumatoid arthritis genetic risk score and laboratory biomarkers.
Weighted average imputation is robust to false matches and increases the contribution of true matches, mitigating the bias induced by linkage error. The threshold combination test avoids choosing an arbitrary threshold for declaring a match, thus automating linked data-enabled analyses and preserving power.
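As a rough illustration of weighted average imputation at a given threshold, the sketch below assumes a matrix of match probabilities between records in two databases and imputes each missing value as the probability-weighted mean of the candidate values whose match probability meets the threshold. The exact weighting used by ATLAS is not specified in this abstract, so the function `weighted_average_impute` and its inputs are hypothetical.

```python
import numpy as np


def weighted_average_impute(match_prob, y, threshold):
    """Impute a variable for database A from candidate matches in database B.

    match_prob: (n_a, n_b) array; entry [i, j] is the assumed probability
                that record i in A and record j in B are the same patient.
    y:          (n_b,) array of the variable observed only in B.
    threshold:  candidates with match probability below this are ignored.

    Returns an (n_a,) array; records with no candidate at or above the
    threshold are left as NaN.
    """
    match_prob = np.asarray(match_prob, dtype=float)
    y = np.asarray(y, dtype=float)
    # Zero out candidates below the threshold, keep probabilities as weights.
    weights = np.where(match_prob >= threshold, match_prob, 0.0)
    totals = weights.sum(axis=1)
    imputed = np.full(match_prob.shape[0], np.nan)
    has_candidate = totals > 0
    imputed[has_candidate] = (weights[has_candidate] @ y) / totals[has_candidate]
    return imputed


match_prob = [[0.9, 0.1],
              [0.4, 0.3]]
y = [10.0, 20.0]
low = weighted_average_impute(match_prob, y, threshold=0.0)
high = weighted_average_impute(match_prob, y, threshold=0.5)
```

In this toy input, a low threshold lets every candidate contribute in proportion to its match probability, while a higher threshold keeps only the confident match for the first record and leaves the second record unimputed. Repeating the imputation over a grid of thresholds gives the set of data sets whose P values are later combined.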
ATLAS promises to enable novel and powerful research studies using linked data to capitalize on all available data sources.