Brain Injury Centre, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Shanghai Institute of Head Trauma, Shanghai, China.
J Med Internet Res. 2024 Sep 26;26:e58741. doi: 10.2196/58741.
Cerebral hemorrhage is a critical condition that requires rapid, precise diagnosis so that treatment, including emergency surgery, can begin in time. Computed tomography (CT) is essential for identifying cerebral hemorrhage, but its effectiveness is limited by the availability of experienced radiologists, especially in resource-constrained regions and during nights or holidays when staffing is short. Despite advances in artificial intelligence-driven diagnostic tools, most require technical expertise to use, which hinders their widespread adoption in radiological imaging. Advanced natural language processing (NLP) models such as GPT-4, which can annotate and analyze images without extensive algorithmic training, offer a potential solution.
This study investigates GPT-4's capability to identify and annotate cerebral hemorrhages in cranial CT scans. It represents a novel application of NLP models in radiological imaging.
In this retrospective analysis, we collected 208 CT scans covering 6 types of cerebral hemorrhage at Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, between January and September 2023. All CT images were pooled and numbered sequentially so that each image had a unique number. A random sequence from 1 to 208 was then generated, and the images were submitted to GPT-4 for analysis in that order. The outputs were subsequently examined using Photoshop and evaluated by experienced radiologists on a 4-point scale to assess identification completeness, accuracy, and success.
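The randomized submission order described above can be sketched as follows. This is a minimal illustration only: the abstract does not say which tool generated the random sequence, so the function name `randomized_order` and the fixed seed are assumptions introduced here for reproducibility, not details from the study.

```python
import random
from typing import List, Optional


def randomized_order(n_images: int, seed: Optional[int] = None) -> List[int]:
    """Return the image numbers 1..n_images in a random order.

    Hypothetical helper: the study reports only that a random sequence
    from 1 to 208 was generated, not how it was produced.
    """
    rng = random.Random(seed)              # seeded generator so the order can be reproduced
    order = list(range(1, n_images + 1))   # sequential numbering of the pooled images
    rng.shuffle(order)                     # in-place random permutation
    return order


# 208 images, as in the study; the seed value is an arbitrary assumption.
order = randomized_order(208, seed=2023)
```

Each image number appears exactly once in `order`, so every scan is submitted once and the hemorrhage types are interleaved rather than processed in blocks.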
The overall identification completeness percentage for the 6 types of cerebral hemorrhage was 72.6% (SD 18.6%). Specifically, GPT-4 achieved higher identification completeness for epidural and intraparenchymal hemorrhages (89.0%, SD 19.1% and 86.9%, SD 17.7%, respectively), whereas its identification completeness for chronic subdural hemorrhages was very low (37.3%, SD 37.5%). The misidentification percentages for complex hemorrhages (54.0%, SD 28.0%), epidural hemorrhages (50.2%, SD 22.7%), and subarachnoid hemorrhages (50.5%, SD 29.2%) were relatively high, whereas they were relatively low for acute subdural hemorrhages (32.6%, SD 26.3%), chronic subdural hemorrhages (40.3%, SD 27.2%), and intraparenchymal hemorrhages (26.2%, SD 23.8%). Identification completeness did not differ significantly between massive and minor bleeding (P=.06), but the misidentification percentage was significantly lower for massive bleeding than for minor bleeding (P=.04). Neither the identification completeness percentages nor the misidentification percentages differed significantly across hemorrhage locations (all P>.05). Finally, radiologists' ratings on the 4-point scale indicated relative acceptance of identification completeness (3.60, SD 0.54), accuracy (3.30, SD 0.65), and success (3.38, SD 0.64).
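Per-type figures of the form "mean (SD)" such as those above could be produced from per-image percentages along the following lines. This is a sketch under stated assumptions: the abstract does not specify whether the sample or population SD was used (the sample SD is assumed here), the helper `summarize_by_type` is hypothetical, and the input values are placeholders for illustration, not study data.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple


def summarize_by_type(
    records: Iterable[Tuple[str, float]]
) -> Dict[str, Tuple[float, float]]:
    """Group per-image percentages by hemorrhage type and return (mean, SD).

    Assumption: sample standard deviation (n - 1 denominator); a type
    with a single image gets an SD of 0.0.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for hemorrhage_type, percentage in records:
        grouped[hemorrhage_type].append(percentage)
    return {
        t: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for t, values in grouped.items()
    }


# Placeholder per-image completeness percentages (NOT data from the study).
records = [
    ("epidural", 100.0),
    ("epidural", 80.0),
    ("chronic subdural", 20.0),
    ("chronic subdural", 60.0),
]
summary = summarize_by_type(records)
for hemorrhage_type, (avg, sd) in summary.items():
    print(f"{hemorrhage_type}: {avg:.1f}% (SD {sd:.1f}%)")
```

The same grouping could be applied to misidentification percentages or to the radiologists' 4-point ratings by changing only the input records.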
GPT-4 shows both promising capabilities and certain limitations in radiological imaging, particularly in identifying cerebral hemorrhages on CT scans. These findings suggest new directions for the future development of NLP models in radiology.
ClinicalTrials.gov NCT06230419; https://clinicaltrials.gov/study/NCT06230419.