Ioanovici Andrei-Constantin, Feier Andrei-Marian, Mărușteri Marius-Ștefan, Trâmbițaș-Miron Alina-Dia, Dobru Daniela-Ecaterina
Department M2-Complementary Functional Sciences, Medical Informatics and Biostatistics, George Emil Palade University of Medicine, Pharmacy, Science, and Technology of Targu Mures, 540142 Targu Mures, Romania.
Department M4-Clinical Sciences, Orthopedics and Traumatology I, George Emil Palade University of Medicine, Pharmacy, Science, and Technology of Targu Mures, 540139 Targu Mures, Romania.
J Pers Med. 2025 Jul 30;15(8):334. doi: 10.3390/jpm15080334.
Background/Objectives: In routine practice, colonoscopy findings are saved as unstructured free text, which limits their secondary use. Accurate named-entity recognition (NER) is essential to unlock these descriptions for quality monitoring, personalized medicine, and research. We compared NER models trained on real, synthetic, and mixed data to determine whether privacy-preserving synthetic reports can improve clinical information extraction. Methods: Three Spark NLP biLSTM-CRF models were trained on (i) 100 manually annotated Romanian colonoscopy reports (ModelR), (ii) 100 prompt-generated synthetic reports (ModelS), and (iii) a 1:1 mix (ModelM). Performance was tested on 40 unseen reports (20 real, 20 synthetic) for seven entities. Micro-averaged precision, recall, and F1-score values were computed; McNemar tests with Bonferroni correction assessed pairwise differences. Results: ModelM outperformed the single-source models (precision 0.95, recall 0.93, F1 0.94) and was significantly superior to ModelR (F1 0.70) and ModelS (F1 0.64; p < 0.001 for both). ModelR maintained high accuracy on real text (F1 = 0.90), but its accuracy fell when tested on synthetic data (F1 = 0.47); the reverse was observed for ModelS (F1 = 0.99 synthetic, 0.33 real). The McNemar χ² statistics (64.6 for ModelM vs. ModelR; 147.0 for ModelM vs. ModelS) were significant at the Bonferroni-adjusted threshold (α = 0.0167), confirming that the observed performance gains were unlikely to be due to chance. Conclusions: Synthetic colonoscopy descriptions are a valuable complement to real annotations but not a substitute for them; likewise, AI assists human experts rather than replacing them. Training on a balanced mix of real and synthetic data can help to obtain robust, generalizable NER models able to structure free-text colonoscopy reports, supporting large-scale, privacy-preserving colorectal cancer surveillance and personalized follow-up.
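The evaluation described in the abstract (micro-averaged precision, recall, and F1, followed by pairwise McNemar tests at a Bonferroni-adjusted threshold) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the input layout (per-entity true-positive, false-positive, and false-negative counts), and the optional continuity correction are all assumptions, since the abstract does not state how the counts were tabulated or which McNemar variant was used.

```python
import math


def micro_prf(counts):
    """Micro-averaged precision, recall, and F1.

    counts maps each entity label to a (tp, fp, fn) tuple; this layout is
    an assumption made for the sketch. Counts are pooled across all
    labels before the ratios are taken, which is what "micro" means.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mcnemar(b, c, continuity=False):
    """McNemar chi-square test on the two discordant counts.

    b: items model A labeled correctly and model B labeled incorrectly.
    c: items model B labeled correctly and model A labeled incorrectly.
    Returns (chi2, p_value) with 1 degree of freedom. Whether to apply
    the continuity correction is an assumption left to the caller.
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    diff = max(diff, 0)
    chi2 = diff ** 2 / (b + c)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons
```

With three pairwise model comparisons, `bonferroni_alpha(0.05, 3)` gives approximately 0.0167, matching the threshold reported in the abstract. A comparison is declared significant when the p-value returned by `mcnemar` falls below that threshold.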