Dai Jiayi, Kim Mi-Young, Sutton Reed T, Mitchell J Ross, Goebel Randolph, Baumgart Daniel C
College of Health Sciences, University of Alberta, Edmonton, AB, Canada.
College of Natural and Applied Sciences, University of Alberta, Edmonton, AB, Canada.
NPJ Digit Med. 2025 May 30;8(1):324. doi: 10.1038/s41746-025-01729-5.
Imaging is crucial to assess disease extent, activity, and outcomes in inflammatory bowel disease (IBD). Artificial intelligence (AI) image interpretation first requires the automated exploitation of imaging studies at scale. Here we evaluate natural language processing (NLP) to classify Crohn's disease (CD) from computed tomography enterography (CTE) reports. From our population-representative IBD registry, CTE reports from a sample of CD patients (male: 44.6%; median age: 50 years, IQR 37-60) and controls (n = 981 each) were extracted and split into training (n = 1568), development (n = 196), and testing (n = 198) datasets, with reports of around 200 words each and balanced label numbers in each dataset. Predictive classification was evaluated with CNN, Bi-LSTM, BERT-110M, LLaMA-3.3-70B-Instruct, and DeepSeek-R1-Distill-LLaMA-70B. Our custom IBDBERT, fine-tuned on expert IBD knowledge (i.e., American College of Gastroenterology (ACG), American Gastroenterological Association (AGA), and European Crohn's and Colitis Organisation (ECCO) guidelines), outperformed rule- and rationale extraction-based classifiers in predictive performance (accuracy 88.6% with a pre-tuning learning rate of 0.00001, AUC 0.945). However, LLaMA, but not DeepSeek, achieved overall superior results (accuracy 91.2% vs. 88.9%, F1 0.907 vs. 0.874).
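The abstract gives only the subset sizes and states that labels were balanced. With 981 reports per class, that works out to 784/98/99 reports per class for training/development/testing (1568/196/198 in total). The minimal Python sketch below assumes a simple per-label random split, which is not necessarily the authors' procedure. It also computes accuracy and F1 for the CD (positive) class. The helper names `balanced_split` and `accuracy_and_f1` are hypothetical, and the reports used are dummy placeholder strings, not study data.

```python
import random
from collections import Counter


def balanced_split(reports, labels, sizes_per_class, seed=0):
    """Hypothetical helper: split reports so each subset has equal counts per label.

    sizes_per_class maps a subset name to the number of reports drawn per label.
    Assumes a plain random split within each label (not the authors' method).
    """
    rng = random.Random(seed)
    by_label = {}
    for report, label in zip(reports, labels):
        by_label.setdefault(label, []).append(report)

    splits = {name: [] for name in sizes_per_class}
    needed = sum(sizes_per_class.values())
    for label, items in by_label.items():
        if len(items) < needed:
            raise ValueError(f"label {label!r} has {len(items)} reports, need {needed}")
        rng.shuffle(items)  # shuffle within the label before slicing
        start = 0
        for name, n in sizes_per_class.items():
            splits[name].extend((r, label) for r in items[start:start + n])
            start += n
    for subset in splits.values():
        rng.shuffle(subset)  # mix labels within each subset
    return splits


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy over all reports and F1 for the positive (CD) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Dummy placeholders: 981 CD (label 1) and 981 control (label 0) reports.
    reports = [f"report {i}" for i in range(1962)]
    labels = [1] * 981 + [0] * 981
    splits = balanced_split(reports, labels, {"train": 784, "dev": 98, "test": 99})
    for name, subset in splits.items():
        print(name, len(subset), Counter(label for _, label in subset))
```

Splitting within each label, instead of over the pooled reports, guarantees exact class balance in every subset. A stratified split from a library such as scikit-learn would serve the same purpose.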