Department of Population Health Sciences, King's College London, London, Greater London, SE1 1UL, United Kingdom.
Department of Biostatistics and Health Informatics, King's College London, London, Greater London, SE5 8AB, United Kingdom.
J Am Med Inform Assoc. 2024 Apr 3;31(4):1009-1024. doi: 10.1093/jamia/ocae015.
OBJECTIVES: Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence. However, QA systems have not been widely adopted. This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement.
MATERIALS AND METHODS: We searched PubMed, IEEE Xplore, ACM Digital Library, ACL Anthology, and forward and backward citations on February 7, 2023. We included peer-reviewed journal and conference papers describing the design and evaluation of biomedical QA systems. Two reviewers screened titles, abstracts, and full-text articles. We conducted a narrative synthesis and risk of bias assessment for each study. We assessed the utility of biomedical QA systems.
RESULTS: We included 79 studies and identified themes, including question realism, answer reliability, answer utility, clinical specialism, systems, usability, and evaluation methods. Clinicians' questions used to train and evaluate QA systems were restricted to certain sources, types and complexity levels. No system communicated confidence levels in the answers or sources. Many studies suffered from high risks of bias and applicability concerns. Only 8 studies completely satisfied any criterion for clinical utility, and only 7 reported user evaluations. Most systems were built with limited input from clinicians.
DISCUSSION: While machine learning methods have led to increased accuracy, most studies imperfectly reflected real-world healthcare information needs. Key research priorities include developing more realistic healthcare QA datasets and considering the reliability of answer sources, rather than merely focusing on accuracy.