Hampers Nicholas, Thieme Rita, Hampers Louis
Department of Sociology, University of Colorado, Colorado Springs, CO.
University of Colorado School of Medicine, Aurora, CO.
Pediatr Emerg Care. 2025 Jun 1;41(6):481-485. doi: 10.1097/PEC.0000000000003369. Epub 2025 Mar 4.
We evaluated the accuracy of an artificial intelligence program (ChatGPT 4.0) as a medical translation modality in a simulated pediatric urgent care setting.
Two entirely separate instances of ChatGPT 4.0 were used. The first served as a simulated patient (SP). The SP generated complaints and symptoms while processing and generating text only in Spanish. A human provider (blinded to the diagnosis) conducted a clinical "visit" with the SP, typing questions and instructions in English only. The second instance of ChatGPT 4.0 served as the artificial medical interpreter (AMI). The AMI translated the provider's questions/instructions from English to Spanish and the SP's responses/concerns from Spanish to English in real time. Post-visit transcripts were then reviewed for errors by a certified human medical interpreter.
We conducted 10 simulated visits, with 3597 words translated by the AMI (1331 English and 2266 Spanish). There were 23 errors (raw accuracy rate of 99.4%), categorized as 9 omissions, 2 additions, 11 substitutions, and 1 editorialization. Three errors were judged to have potential clinical consequences, although these were minor ambiguities readily resolved by the provider during the visit. The AMI also made repeated errors of gender (masculine/feminine) and second-person formality ("usted"/"tú"); none of these were judged to have potential clinical consequences.
The AMI accurately and safely translated the written content of simulated urgent care visits. It may serve as the basis for an expedient, cost-effective medical interpreter modality. Further work should seek to couple this translation accuracy with speech recognition and generative technology in trials with actual patients.