Division of Endocrine Surgery at University of Wisconsin School of Medicine and Public Health, Department of Surgery, Madison, Wisconsin.
J Surg Res. 2020 Dec;256:557-563. doi: 10.1016/j.jss.2020.07.015. Epub 2020 Aug 13.
Critical thyroid nodule features are contained in unstructured ultrasound (US) reports. The Thyroid Imaging, Reporting, and Data System (TI-RADS) uses five key features to risk-stratify nodules and recommend appropriate intervention. This study analyzes the quality of US reporting and the potential benefit of natural language processing (NLP) systems in efficiently capturing TI-RADS features from text reports.
This retrospective study used free-text thyroid US reports from an academic center (A) and community hospital (B). Physicians created "gold standard" annotations by manually extracting TI-RADS features and clinical recommendations from reports to determine how often they were included. Similar annotations were created using an automated NLP system and compared with the gold standard.
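The comparison described above can be illustrated with a minimal sketch. It is not the NLP system used in the study, whose design the abstract does not describe. The keyword patterns, the feature names (taken here from the ACR TI-RADS categories), the example report texts, and the hand-written gold annotations are all hypothetical. The sketch flags which features a nodule description mentions, then scores agreement with the manual annotations feature by feature.

```python
import re

# Hypothetical keyword patterns, one per TI-RADS feature (illustrative only;
# a real system would need far broader vocabulary and negation handling).
FEATURE_PATTERNS = {
    "composition": r"\b(?:solid|cystic|spongiform)\b",
    "echogenicity": r"\b(?:hyperechoic|isoechoic|hypoechoic|anechoic)\b",
    "shape": r"\b(?:taller[- ]than[- ]wide|wider[- ]than[- ]tall)\b",
    "margins": r"\b(?:margins?|lobulated|irregular|ill-defined)\b",
    "echogenic_foci": r"\b(?:(?:micro|macro)?calcifications?|echogenic foci|comet[- ]tail)\b",
}


def extract_features(text):
    """Return the set of feature names mentioned in one nodule description."""
    lowered = text.lower()
    return {
        name
        for name, pattern in FEATURE_PATTERNS.items()
        if re.search(pattern, lowered)
    }


def per_feature_accuracy(predicted, gold):
    """For each feature, the fraction of nodules where the automated
    present/absent call agrees with the manual annotation."""
    total = len(gold)
    return {
        name: sum((name in p) == (name in g) for p, g in zip(predicted, gold)) / total
        for name in FEATURE_PATTERNS
    }


# Hypothetical nodule descriptions and physician annotations.
reports = [
    "1.2 cm solid hypoechoic nodule, wider-than-tall, with microcalcifications.",
    "1.5 cm nodule, darker than surrounding parenchyma, smooth margins.",
]
gold = [
    {"composition", "echogenicity", "shape", "echogenic_foci"},
    {"echogenicity", "margins"},  # annotator reads "darker than" as echogenicity
]

predicted = [extract_features(text) for text in reports]
accuracy = per_feature_accuracy(predicted, gold)
```

In this toy case the second description expresses echogenicity in words the patterns do not cover, so the automated call disagrees with the annotator for that feature. This is the kind of mismatch that free-text phrasing can produce.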
Two hundred eighty-two reports contained 409 nodules at least 1 cm in maximum diameter. The gold standard identified three nodules (0.7%) that contained enough information to calculate a complete TI-RADS score. Shape was described most often (92.7% of nodules), whereas margins were described least often (11%). A median of two TI-RADS features was reported per nodule. The NLP system was significantly less accurate than the gold standard in capturing echogenicity (27.5%) and margins (58.9%). One hundred eight nodule reports (26.4%) included clinical management recommendations, which were included more often at site A than at site B (33.9% versus 17%, P < 0.05).
These results suggest a gap between current US reporting styles and those needed to implement TI-RADS and achieve NLP accuracy. Synoptic reporting should prompt more complete thyroid US reporting, improved recommendations for intervention, and better NLP performance.