He Ting, Belouali Anas, Patricoski Jessica, Lehmann Harold, Ball Robert, Anagnostou Valsamo, Kreimeyer Kory, Botsis Taxiarchis
Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
J Biomed Inform. 2023 Apr;140:104335. doi: 10.1016/j.jbi.2023.104335. Epub 2023 Mar 16.
Identifying patient cohorts that meet the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and produce high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to conduct a thorough scoping review of computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Four reviewers then screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and the portability of the developed solutions. Most studies supported patient cohort selection without discussing its application to specific use cases, such as precision medicine. Electronic health records (EHRs) were the primary data source in 87.1% (N = 121) of all studies, and International Classification of Diseases (ICD) codes were heavily used in 55.4% (N = 77); however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of methods, traditional machine learning (ML) was dominant, often combined with natural language processing (NLP) and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings reveal that precisely defining target use cases, moving away from ML-only strategies, and evaluating the proposed solutions in real-world settings are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
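All of the reported percentages are shares of the 139 included studies, not of the 7960 screened records. A minimal Python check, using only the counts stated in this abstract (the dictionary labels are illustrative names, not terms from the review), reproduces the reported figures:

```python
# Number of studies that satisfied the inclusion criteria; every
# percentage in the abstract uses this as its denominator.
INCLUDED_STUDIES = 139

# Study counts reported in the abstract (labels are illustrative).
counts = {
    "EHR as primary source": 121,
    "ICD codes used": 77,
    "common data model compliance": 36,
}


def percent_of_included(n: int, total: int = INCLUDED_STUDIES) -> float:
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


percentages = {label: percent_of_included(n) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

Running it prints 87.1%, 55.4%, and 25.9%, matching the values reported above.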