Quttainah Majdi, Mishra Vinaytosh, Madakam Somayya, Lurie Yotam, Mark Shlomo
College of Business Administration, Kuwait University, Kuwait City, Kuwait.
College of Healthcare Management and Economics, Gulf Medical University, Ajman, United Arab Emirates.
JMIR AI. 2024 Apr 23;3:e51834. doi: 10.2196/51834.
The world has witnessed increased adoption of large language models (LLMs) over the past year. Although products developed using LLMs have the potential to solve accessibility and efficiency problems in health care, guidelines for developing LLMs for health care, and especially for medical education, are lacking.
The aim of this study was to identify and prioritize the enablers for developing successful LLMs for medical education. We further evaluated the relationships among these identified enablers.
A narrative review of the extant literature was first performed to identify the key enablers for LLM development. We additionally gathered the opinions of LLM users to determine the relative importance of these enablers using the analytic hierarchy process (AHP), a multicriteria decision-making method. Further, total interpretive structural modeling (TISM) was used to analyze the perspectives of product developers and ascertain the relationships and hierarchy among these enablers. Finally, the cross-impact matrix multiplication applied to classification (MICMAC) approach was used to determine the relative driving and dependence powers of these enablers. A nonprobabilistic purposive sampling approach was used to recruit the focus group participants.
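To illustrate the AHP step, the following minimal Python sketch shows how priority weights and the consistency ratio are typically derived from a pairwise comparison matrix. The matrix, enabler subset, and judgments below are hypothetical assumptions for illustration only; they do not reproduce the study's focus group data.

```python
import numpy as np

# Hypothetical pairwise comparison matrix over four of the enablers,
# using Saaty's 1-9 scale; A[i, j] is the importance of enabler i
# relative to enabler j, and A[j, i] = 1 / A[i, j].
labels = ["Credibility", "Accountability", "Fairness", "Usability"]
A = np.array([
    [1.0, 2.0, 4.0, 7.0],
    [1/2, 1.0, 3.0, 6.0],
    [1/4, 1/3, 1.0, 3.0],
    [1/7, 1/6, 1/3, 1.0],
])

# Priority weights: normalized principal eigenvector of A.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = eigvecs[:, k].real
weights = weights / weights.sum()

# Consistency ratio: CR = CI / RI, where CI = (lambda_max - n) / (n - 1)
# and RI is Saaty's random index (0.90 for n = 4). CR < 0.1 is the usual
# acceptability threshold applied in the study.
n = A.shape[0]
lambda_max = eigvals.real[k]
CI = (lambda_max - n) / (n - 1)
RI = 0.90
CR = CI / RI

for name, w in zip(labels, weights):
    print(f"{name}: {w:.3f}")
print(f"lambda_max = {lambda_max:.3f}, CI = {CI:.3f}, CR = {CR:.3f}")
```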
The AHP demonstrated that the most important enabler of LLMs was credibility, with a priority weight of 0.37, followed by accountability (0.27642) and fairness (0.10572). In contrast, usability, with a priority weight of 0.04, was of negligible importance. The TISM results concurred with the AHP findings. The only striking difference between the expert perspectives and the user preference evaluation was that product developers rated cost as the least important enabler, whereas the MICMAC analysis suggested that cost strongly influences the other enablers. The focus group inputs were found to be reliable, with a consistency ratio below the 0.1 threshold (0.084).
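To clarify how the MICMAC driving and dependence powers are obtained, the sketch below computes them from a final reachability matrix as row and column sums. The reachability matrix shown is a hypothetical, transitively closed example over the seven enablers, not the study's TISM output; it merely mirrors the reported pattern of cost as a strong driver.

```python
import numpy as np

# Hypothetical final reachability matrix: R[i, j] = 1 means enabler i
# influences (reaches) enabler j; diagonal entries are 1 by convention.
labels = ["Cost", "Usability", "Credibility", "Fairness",
          "Accountability", "Transparency", "Explainability"]
R = np.array([
    [1, 1, 1, 1, 1, 1, 1],   # Cost: influences all other enablers
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1],
])

driving = R.sum(axis=1)      # how many enablers each one influences
dependence = R.sum(axis=0)   # how many enablers influence it

for name, d, p in zip(labels, driving, dependence):
    print(f"{name:15s} driving={d} dependence={p}")

# Enablers are then plotted on a driving-vs-dependence grid and classified
# as autonomous, dependent, linkage, or independent (driver) variables.
```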
This study is the first to identify, prioritize, and analyze the relationships among the enablers of effective LLMs for medical education. Based on the results of this study, we developed a comprehensible prescriptive framework, named CUC-FATE (Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability), for evaluating the enablers of LLMs in medical education. The study findings are useful for health care professionals, health technology experts, medical technology regulators, and policy makers.