Joseph Anika, Joseph Kevin, Joseph Angelyn
Health Sciences Program, University of Ottawa, 75 Laurier Ave E, Ottawa, ON K1N 6N5, Canada.
Biomedical Science Program, University of Ottawa, 75 Laurier Ave E, Ottawa, ON K1N 6N5, Canada.
Transl Neurosci. 2024 Dec 24;15(1):20220361. doi: 10.1515/tnsci-2022-0361. eCollection 2024 Jan 1.
The limitations of artificial intelligence (AI) large language models in diagnosing diseases remain underexplored from the perspective of patient safety, and potential challenges, such as diagnostic errors and legal liability, need to be addressed. To demonstrate these limitations, we used ChatGPT-3.5, developed by OpenAI, as a tool for medical diagnosis based on text-based case reports of multiple sclerosis (MS), which was selected as a prototypic disease. We analyzed 98 peer-reviewed case reports, selected for free full-text availability and publication within the past decade (2014-2024). To avoid bias, any mention of an MS diagnosis was excluded from the text given to the model. ChatGPT-3.5 was used to interpret the clinical presentations and laboratory data from these reports. The model correctly diagnosed MS in 77 of the 98 cases, an accuracy of 78.6%. The remaining 21 cases were misdiagnosed, highlighting the model's limitations. Factors contributing to the errors include variability in data presentation and the inherent complexity of MS diagnosis, which requires imaging modalities in addition to clinical presentations and laboratory data. While these findings suggest that AI can support disease diagnosis and assist healthcare providers in decision-making, models that are not adequately trained on large datasets may produce significant inaccuracies. Integrating AI into clinical practice requires rigorous validation and robust regulatory frameworks to ensure responsible use.
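The reported accuracy follows directly from the case counts, and a short calculation makes the arithmetic explicit. The sketch below is illustrative only: the 95% Wilson score interval is an assumption added here to indicate the uncertainty of a 98-case sample and is not reported in the abstract, and the function name `wilson_interval` is hypothetical.

```python
import math

# Counts taken from the abstract.
correct = 77
total = 98
misdiagnosed = total - correct  # 21 cases

accuracy = correct / total
print(f"Accuracy: {accuracy * 100:.1f}%")


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Not part of the original study; z=1.96 corresponds to an
    approximate 95% confidence level.
    """
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(correct, total)
print(f"Approximate 95% interval: {low * 100:.1f}% to {high * 100:.1f}%")
```

Under these assumptions the interval spans roughly 69% to 86%, which suggests that the point estimate from 98 cases should be read with some caution.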