IMADIS, 48 rue quivogne, 63002, Lyon, France.
University of Bordeaux, 33000, Bordeaux, France.
J Digit Imaging. 2022 Aug;35(4):993-1007. doi: 10.1007/s10278-022-00619-6. Epub 2022 Mar 22.
Although using standardized reports is encouraged, most emergency radiological reports in France remain in free-text format that can be mined with natural language processing for epidemiological purposes, activity monitoring or data collection. These reports are obtained under various on-call conditions by radiologists with various backgrounds. Our aim was to investigate what influences the radiologists' written expressions. To do so, this retrospective multicentric study included 30,227 emergency radiological reports of computed tomography scans and magnetic resonance imaging involving exactly one body region, only with pathological findings, interpreted from 2019-09-01 to 2020-02-28 by 165 radiologists. After text pre-processing, one-word tokenization and use of dictionaries for stop words, polarity, sentiment and uncertainty, 11 variables depicting the structure and content of words and sentences in the reports were extracted and summarized to 3 principal components capturing 93.7% of the dataset variance. In multivariate analysis, the first principal component summarized the length and lexical diversity of the reports and was significantly influenced by the weekday, time slot, workload, number of examinations previously interpreted by the radiologist during the on-call period, type of examination, emergency level and radiologists' gender (P value range: < 0.0001-0.0029). The second principal component summarized negative formulations, polarity and sentence length and was correlated with the number of examinations previously interpreted by the radiologist, type of examination, emergency level, imaging modality and radiologists' experience (P value range: < 0.0001-0.0032). The last principal component summarized questioning, uncertainty and polarity and was correlated with the type of examination and emergency level (all P values < 0.0001).
Thus, the length, structure and content of emergency radiological reports were significantly influenced by organizational, radiologist- and examination-related characteristics, highlighting the subjectivity and variability in the way radiologists express themselves during their clinical activity. These findings advocate for more homogeneous practices in radiological reporting and stress the need to consider these influential features when developing models based on natural language processing.
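The feature-extraction and dimension-reduction steps described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not list the 11 variables, the dictionaries or the software used, so the toy English word lists, the six features, the function names `report_features` and `pca`, and the example reports below are all assumptions chosen for illustration only.

```python
import re
import numpy as np

# Hypothetical toy dictionaries; the study's actual dictionaries are not given.
STOP_WORDS = {"the", "a", "of", "in", "is", "no", "and", "with", "to"}
NEGATIONS = {"no", "not", "without"}
UNCERTAINTY = {"possible", "probable", "suspected", "may"}


def report_features(text):
    """Return a few illustrative structure/content variables for one report."""
    sentences = [s for s in re.split(r"[.?!]+", text) if s.strip()]
    tokens = re.findall(r"[a-z]+", text.lower())  # one-word tokenization
    content = [t for t in tokens if t not in STOP_WORDS]
    return [
        len(tokens),                                # report length in words
        len(set(content)) / max(len(content), 1),   # lexical diversity
        len(tokens) / max(len(sentences), 1),       # mean sentence length
        sum(t in NEGATIONS for t in tokens),        # negative formulations
        sum(t in UNCERTAINTY for t in tokens),      # uncertainty terms
        text.count("?"),                            # questioning
    ]


def pca(X, n_components):
    """PCA on standardized columns via SVD; returns scores and variance ratios."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


# Invented example reports, used only to exercise the functions.
reports = [
    "No fracture. Possible appendicitis?",
    "Acute appendicitis with periappendiceal fat stranding. No abscess.",
    "Large right middle cerebral artery infarct. No hemorrhage. No mass effect.",
    "Suspected pulmonary embolism in the right lower lobe? Small pleural effusion.",
    "Sigmoid diverticulitis without perforation.",
]
X = [report_features(r) for r in reports]
scores, explained = pca(X, n_components=3)
print("Cumulative explained variance:", explained.sum())
```

Each row of `scores` places one report on the three retained components; in the study, such component scores were the outcomes modelled against organizational, radiologist- and examination-related covariates.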