Howell Lewis, Zarei Amir, Wah Tze Min, Chandler James H, Karthik Shishir, Court Zara, Ng Helen, McLaughlan James R
School of Computing, University of Leeds, Leeds, LS2 9JT, UK.
School of Electronic and Electrical Engineering, University of Leeds, Leeds, LS2 9JT, UK.
Eur Radiol. 2025 Feb 13. doi: 10.1007/s00330-025-11416-4.
Radiology reports contain valuable information for research and audits, but relevant details are often buried within free-text fields. This makes them challenging and time-consuming to extract for secondary analyses, including training artificial intelligence (AI) models.
This study presents a rule-based RAdiology Data EXtraction tool (RADEX) to enable biomedical researchers and healthcare professionals to automate information extraction from clinical documents. RADEX simplifies the translation of domain expertise into regular-expression models, enabling context-dependent searching without specialist expertise in Natural Language Processing. Its utility was demonstrated in the multi-label classification of fourteen clinical features in a large retrospective dataset (n = 16,246) of thyroid ultrasound reports from five hospitals in the United Kingdom (UK). A tuning subset (n = 200) was used to iteratively develop the search strategy, and a holdout test subset (n = 202) was used to evaluate the performance against reference-standard labels.
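To illustrate what "context-dependent" regular-expression searching can look like, here is a minimal sketch. It is not RADEX's implementation or search strategy. The feature names, patterns, negation cues, 40-character look-back window, and the helper functions `split_sentences`, `is_negated` and `extract_labels` are all hypothetical. Each feature is a regex over positive cue terms. A match counts only if no negation cue appears shortly before it in the same sentence, and each report gets one binary label per feature (multi-label output).

```python
import re

# Hypothetical feature rules (illustrative only, not RADEX's actual patterns).
FEATURE_PATTERNS = {
    "calcification": re.compile(r"\bcalcif\w*", re.IGNORECASE),
    "cystic": re.compile(r"\bcyst(?:ic|s)?\b", re.IGNORECASE),
    "lymphadenopathy": re.compile(r"\blymphadenopathy\b", re.IGNORECASE),
}

# Hypothetical context rule: a match is treated as negated if a negation cue
# occurs within WINDOW characters before it in the same sentence.
NEGATION_CUE = re.compile(r"\b(?:no|not|without|absent|negative for)\b", re.IGNORECASE)
WINDOW = 40


def split_sentences(text: str) -> list[str]:
    # Deliberately naive split on sentence-final punctuation
    # (it would also split decimals such as "1.2 cm").
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def is_negated(sentence: str, start: int) -> bool:
    # Look back a fixed number of characters for a negation cue.
    context = sentence[max(0, start - WINDOW):start]
    return NEGATION_CUE.search(context) is not None


def extract_labels(report: str) -> dict[str, int]:
    # One binary label per feature: 1 if any non-negated mention is found.
    labels = {name: 0 for name in FEATURE_PATTERNS}
    for sentence in split_sentences(report):
        for name, pattern in FEATURE_PATTERNS.items():
            for match in pattern.finditer(sentence):
                if not is_negated(sentence, match.start()):
                    labels[name] = 1
                    break
    return labels


report = "Solid nodule with coarse calcification. No cystic change. No lymphadenopathy."
print(extract_labels(report))
# {'calcification': 1, 'cystic': 0, 'lymphadenopathy': 0}
```

A fixed look-back window is crude. For example, "No cysts, but calcification present" would wrongly negate calcification. This is why such rules are refined iteratively on a tuning subset before being evaluated on held-out reports.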
The label cardinality of the dataset was 3.06, and the label density was 0.34. Cohen's kappa was 0.94 for rater 1 and 0.95 for rater 2. For RADEX, micro-averaged sensitivity, specificity, and F1-score were 0.97, 0.96, and 0.94, respectively. The processing time was 12.3 milliseconds per report, enabling fast and reliable information extraction.
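The following sketch shows how the summary statistics above are commonly computed from a binary report-by-feature label matrix. It assumes the usual multi-label definitions and is not presented as the study's exact computation. Label cardinality is the mean number of positive labels per report, and label density is cardinality divided by the number of labels. For micro-averaging, true/false positives and negatives are pooled across all labels before computing sensitivity, specificity and F1. The function names and toy data are hypothetical.

```python
def safe_div(num: float, den: float) -> float:
    # Return NaN rather than raising when a metric is undefined.
    return num / den if den else float("nan")


def label_stats(y: list[list[int]]) -> tuple[float, float]:
    """Label cardinality and density of a binary (reports x features) matrix."""
    n_reports, n_labels = len(y), len(y[0])
    cardinality = sum(sum(row) for row in y) / n_reports
    density = cardinality / n_labels
    return cardinality, density


def micro_metrics(y_true: list[list[int]], y_pred: list[list[int]]) -> dict[str, float]:
    """Micro-averaged metrics: pool TP/FP/TN/FN over every (report, feature) cell."""
    tp = fp = tn = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


# Toy example: 4 reports x 3 hypothetical features.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 0]]

cardinality, density = label_stats(y_true)
print(round(cardinality, 3), round(density, 3))  # 1.25 0.417
print({k: round(v, 3) for k, v in micro_metrics(y_true, y_pred).items()})
# {'sensitivity': 0.8, 'specificity': 0.857, 'f1': 0.8}
```

Because micro-averaging pools all cells, frequent features dominate the result. Per-feature (macro) figures can reveal weaker performance on rare findings.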
RADEX is a versatile tool for bespoke research and audit applications, where access to labelled data or computing infrastructure is limited, or explainability and reproducibility are priorities. This offers a time-saving and freely available option to accelerate structured data collection, enabling new insights and improved patient care.
Question: Radiology reports contain vital information that is buried in unstructured free-text fields. Can we extract this information effectively for research and audit applications?
Findings: A rule-based RAdiology Data EXtraction tool (RADEX) is described and used to classify fourteen key findings from thyroid ultrasound reports, with micro-averaged sensitivity and specificity > 0.95.
Clinical relevance: RADEX offers clinicians and researchers a time-saving tool to accelerate structured data collection. This practical approach prioritises transparency, repeatability, and usability, enabling new insights and improved patient care.