Systematic review of approaches to use of neighborhood-level risk factors with clinical data to predict clinical risk and recommend interventions.
Author information
Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65212, United States; School of Medicine, University of Missouri, Columbia, MO 65212, United States.
Publication information
J Biomed Inform. 2021 Apr;116:103713. doi: 10.1016/j.jbi.2021.103713. Epub 2021 Feb 18.
BACKGROUND
Despite a large body of literature investigating how the environment influences health outcomes, most published work to date includes only a limited subset of the rich clinical and environmental data that is available and does not address how these data might best be used to predict clinical risk or expected impact of clinical interventions.
OBJECTIVE
Identify existing approaches to inclusion of a broad set of neighborhood-level risk factors with clinical data to predict clinical risk and recommend interventions.
METHODS
A systematic review of scientific literature published and indexed in PubMed, Web of Science, Association for Computing Machinery (ACM), and Scopus from 2010 through October 2020 was performed. To be included, articles had to include search terms related to Electronic Health Record (EHR) data, Neighborhood-Level Risk Factors (NLRFs), and Machine Learning (ML) methods. Citations of relevant articles were also reviewed for additional articles for inclusion. Articles were reviewed and coded by two independent reviewers to capture key information, including data sources, linkage of EHR data to NLRFs, methods, and results. Articles were assessed for quality using a modified Quality Assessment Tool for Systematic Reviews of Observational Studies (QATSO).
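To make "linkage of EHR data to NLRFs" concrete, the following is a minimal sketch of one plausible linkage step. It assumes that patient records carry a geographic key such as a census tract identifier; the abstract does not say how the reviewed studies performed linkage. All field names, identifiers, and values are hypothetical.

```python
# Hypothetical patient-level EHR rows; field names and values are illustrative only.
ehr_records = [
    {"patient_id": "p1", "tract_id": "T001", "age": 64, "outcome": 1},
    {"patient_id": "p2", "tract_id": "T002", "age": 41, "outcome": 0},
    {"patient_id": "p3", "tract_id": "T999", "age": 57, "outcome": 0},
]

# Hypothetical neighborhood-level risk factors keyed by the same geographic identifier.
nlrf_by_tract = {
    "T001": {"median_income": 38000, "pct_uninsured": 14.2},
    "T002": {"median_income": 61000, "pct_uninsured": 6.5},
}


def link_ehr_to_nlrfs(records, nlrfs, key="tract_id"):
    """Left-join each patient record to the NLRFs of its geographic unit.

    Records with no matching NLRF row are kept and flagged with linked=False,
    so unlinked patients can be counted rather than silently dropped.
    """
    linked = []
    for record in records:
        factors = nlrfs.get(record[key])
        row = dict(record)  # copy so the input rows are not mutated
        row["linked"] = factors is not None
        if factors is not None:
            row.update(factors)
        linked.append(row)
    return linked


linked_rows = link_ehr_to_nlrfs(ehr_records, nlrf_by_tract)
unlinked = [row["patient_id"] for row in linked_rows if not row["linked"]]
```

Keeping unlinked patients visible is a design choice in this sketch: if linkage failure correlates with where patients live, dropping those rows could bias any downstream model.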
RESULTS
A total of 334 articles were identified for abstract review. Of these, 36 were selected for full review, and 19 were included in the final analysis. All but two of the articles included socio-demographic data derived from the U.S. Census, and we found great variability in sources of NLRFs beyond the Census. The majority of the articles (14 of 19) included broader clinical (e.g., medications, labs, and co-morbidities) and demographic information about the individual from the EHR in addition to the clinical outcome variable. About half of the articles (10 of 19) had a stated goal to predict the outcome(s) of interest. While results of the studies reinforced the correlative association of NLRFs with clinical outcomes, only one article found that adding NLRFs to a model with other data added predictive power; the remainder concluded either that NLRFs were of mixed value depending on the model and outcome or that NLRFs added no predictive power over other data in the model. Only one article scored high on the quality assessment, with 13 scoring moderate and 4 scoring low.
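One common way to quantify whether NLRFs "add predictive power" is to compare a discrimination metric for a model fit with and without them on the same held-out patients. The sketch below uses the area under the ROC curve (AUC) as that metric, which is an assumption; the abstract does not state which metrics the reviewed studies used. The labels and risk scores are made up purely to exercise the function and say nothing about the real effect of NLRFs.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up held-out outcomes and predicted risks from two hypothetical models.
labels = [1, 0, 1, 0, 0, 1]
risk_ehr_only = [0.70, 0.40, 0.35, 0.30, 0.50, 0.80]
risk_ehr_plus_nlrf = [0.72, 0.38, 0.45, 0.28, 0.50, 0.81]

auc_ehr_only = auc(labels, risk_ehr_only)
auc_ehr_plus_nlrf = auc(labels, risk_ehr_plus_nlrf)

# Positive values mean the model with NLRFs ranked patients better on this sample.
delta_auc = auc_ehr_plus_nlrf - auc_ehr_only
```

In practice such a difference would need a confidence interval (for example, from bootstrapping the held-out set) and, as the review's conclusions suggest, sub-group analysis before it could be read as evidence for or against NLRFs.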
CONCLUSIONS
Despite growing interest in combining NLRFs with EHR data for clinical prediction, we found limited evidence that NLRFs improve predictive power in clinical risk models. We found that these data and methods are being used in four ways. First, early approaches to including broad NLRFs to predict clinical risk primarily used dimension reduction, either for feature selection or as a data preparation step before regression analysis. Second, more recent work incorporates NLRFs into more advanced predictive models, such as Neural Networks, Random Forest, and Penalized Lasso, to predict clinical outcomes or the value of interventions. Third, studies that test whether including NLRFs improves prediction of clinical risk have shown mixed results regarding the value of these data over EHR or claims data alone, and this review surfaced evidence of potential quality challenges and biases inherent to this approach. Finally, NLRFs were used with unsupervised learning to identify underlying patterns in patient populations and to recommend targeted interventions. Further access to computable, high-quality data is needed, along with careful study design, including sub-group analysis, to better determine how these data and methods can be used to support decision making in a clinical setting.