BERT-based natural language processing analysis of French CT reports: Application to the measurement of the positivity rate for pulmonary embolism.

Author information

Jupin-Delevaux Émilien, Djahnine Aissam, Talbot François, Richard Antoine, Gouttard Sylvain, Mansuy Adeline, Douek Philippe, Si-Mohamed Salim, Boussel Loïc

Affiliations

Radiology department, Hospices Civils de Lyon - HCL, Lyon, France.

CREATIS, Univ Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, UJM-Saint Etienne, CNRS, Inserm, CREATIS UMR 5220, U1294, Lyon, France.

Publication information

Res Diagn Interv Imaging. 2023 Mar 27;6:100027. doi: 10.1016/j.redii.2023.100027. eCollection 2023 Jun.

Abstract

RATIONALE AND OBJECTIVES

To develop a natural language processing (NLP) method based on Bidirectional Encoder Representations from Transformers (BERT), adapted to French CT reports, and to evaluate its performance in calculating the diagnostic yield of CT in patients with clinical suspicion of pulmonary embolism (PE).

MATERIALS AND METHODS

All reports of CT examinations performed at our institution in 2019 (99,510 reports; training and validation dataset) and 2018 (94,559 reports; testing dataset) were included after anonymization. Two BERT-based NLP sentence classifiers were trained on 27,700 manually labeled sentences from the training dataset. The first classified report sentences into three classes ("Non chest", "Healthy chest", and "Pathological chest"); the second classified sentences of the last class into eleven pathology subclasses, including "pulmonary embolism". The F1-score was reported on the validation dataset. The classifiers were then applied to the reports of CT examinations requested for pulmonary embolism in the testing dataset. Sensitivity, specificity, and accuracy for detecting the presence of pulmonary embolism were reported against human analysis of the reports.
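The two-stage cascade described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The classifiers are passed in as plain callables, and `toy_stage1` and `toy_stage2` are hypothetical keyword-based stand-ins for the fine-tuned BERT models. The naive sentence splitter, the placeholder label "other", and the rule that a report counts as positive when any sentence is labeled "pulmonary embolism" are all assumptions, since the abstract does not say how sentence labels are aggregated per report.

```python
import re
from typing import Callable, List

# Class names taken from the abstract.
NON_CHEST = "Non chest"
HEALTHY_CHEST = "Healthy chest"
PATHOLOGICAL_CHEST = "Pathological chest"
PE = "pulmonary embolism"

# A sentence classifier maps one sentence to one class label.
SentenceClassifier = Callable[[str], str]


def split_sentences(report: str) -> List[str]:
    """Naive split after '.', '!' or '?' (assumption; real reports need more care)."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def report_mentions_pe(
    report: str,
    stage1: SentenceClassifier,
    stage2: SentenceClassifier,
) -> bool:
    """Return True if any sentence passes stage 1 as pathological chest
    and is then labeled pulmonary embolism by stage 2 (assumed rule)."""
    for sentence in split_sentences(report):
        if stage1(sentence) != PATHOLOGICAL_CHEST:
            continue  # stage 2 only sees pathological chest sentences
        if stage2(sentence) == PE:
            return True
    return False


def toy_stage1(sentence: str) -> str:
    """Hypothetical stand-in for the 3-class model (keyword rules only)."""
    text = sentence.lower()
    if "embolie" in text and "pas d'embolie" not in text:
        return PATHOLOGICAL_CHEST
    if "pulmonaire" in text or "thorax" in text:
        return HEALTHY_CHEST
    return NON_CHEST


def toy_stage2(sentence: str) -> str:
    """Hypothetical stand-in for the 11-subclass model."""
    return PE if "embolie" in sentence.lower() else "other"
```

In practice each stand-in would be replaced by a call to a trained sentence classifier; keeping the cascade independent of the model makes the aggregation rule easy to test on its own.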

RESULTS

The F1-scores of the 3-Classes and 11-SubClasses classifiers were 0.984 and 0.985, respectively. In the testing dataset, 4,042 examinations were requested for pulmonary embolism, of which 641 (15.8%) were rated positive by radiologists. The sensitivity, specificity, and accuracy of the NLP network for identifying pulmonary embolism in these reports were 98.2%, 99.3%, and 99.1%, respectively.
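For readers who want to reproduce this kind of evaluation, the metrics above follow from a 2x2 confusion table of NLP output against the radiologists' reading. The sketch below uses the standard definitions. The abstract does not report the underlying true/false positive and negative counts, so the `toy` counts are hypothetical and chosen only to make the arithmetic easy to follow; only the 641 and 4,042 figures come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of NLP predictions versus the human reference reading."""
    tp: int  # NLP positive, reference positive
    fp: int  # NLP positive, reference negative
    tn: int  # NLP negative, reference negative
    fn: int  # NLP negative, reference positive


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of reference-positive reports flagged by the NLP model."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of reference-negative reports left unflagged."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all reports classified in agreement with the reference."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def positivity_rate(positives: int, total: int) -> float:
    """Diagnostic yield: positive examinations over all requested ones."""
    return positives / total


# Hypothetical counts for illustration only (not from the study).
toy = ConfusionCounts(tp=9, fp=1, tn=89, fn=1)

# Yield from the counts quoted in the abstract (641 positive of 4,042).
yield_testing = positivity_rate(641, 4042)
```

With the toy counts, `sensitivity(toy)` is 9/10, `specificity(toy)` is 89/90, and `accuracy(toy)` is 98/100.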

CONCLUSION

A BERT-based NLP sentence classifier enables the analysis of large databases of radiological reports to accurately determine the diagnostic yield of CT screening.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc63/11265488/e6e5044d1502/gr1.jpg
