Department of Biomedical Informatics and Computational Biology, University of Minnesota, Minneapolis, Minnesota, United States of America.
Diagnostic Imaging, Philips Global, Maple Grove, Minnesota, United States of America.
PLoS One. 2020 Jul 1;15(7):e0214775. doi: 10.1371/journal.pone.0214775. eCollection 2020.
The manual extraction of valuable data from electronic medical records is cumbersome, error-prone, and inconsistent. Automating extraction in conjunction with standardized terminology would substantially improve the quality and consistency of data used for research and clinical purposes. Here, we set out to develop and validate a framework to extract pertinent clinical conditions for traumatic brain injury (TBI) from computed tomography (CT) reports.
We developed tbiExtractor, which extends pyConTextNLP (a regular-expression algorithm that uses negation detection and contextual features), to create a framework for extracting TBI common data elements from radiology reports. The algorithm takes radiology reports as input and outputs a structured summary of 27 clinical findings with their respective annotations. The algorithm was developed and validated using two physician annotators as the gold standard.
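To make the idea of regular-expression extraction with negation detection concrete, the following is a minimal ConText-style sketch, not the pyConTextNLP API and not tbiExtractor's actual lexicon or rules. The four target findings, the cue lists, the label names (`present`, `absent`, `possible`, `not_mentioned`), and the function `annotate` are all hypothetical. Each finding is matched by a regex, and the text preceding it in the same sentence is checked for negation or uncertainty cues, with termination words such as "but" cutting off a cue's scope.

```python
import re

# Hypothetical target lexicon: four illustrative findings, not tbiExtractor's 27.
TARGETS = {
    "epidural_hemorrhage": r"epidural\s+(?:hematoma|hemorrhage)",
    "subdural_hemorrhage": r"subdural\s+(?:hematoma|hemorrhage)",
    "midline_shift": r"midline\s+shift",
    "skull_fracture": r"(?:skull|calvarial)\s+fracture",
}

# Assumed modifier cues: a tiny stand-in for a ConText-style lexicon.
NEGATION_CUES = r"\b(?:no|without|negative\s+for|absence\s+of)\b"
UNCERTAINTY_CUES = r"\b(?:possible|probable|questionable|suspicious\s+for)\b"
# Termination cues end a modifier's scope, e.g. "no fracture but a hematoma".
TERMINATION_CUES = r"\b(?:but|however|although)\b"


def annotate(report: str) -> dict[str, str]:
    """Assign one (hypothetical) label per target finding for a single report."""
    summary = {name: "not_mentioned" for name in TARGETS}
    # Treat each sentence as the maximum scope of a modifier cue.
    for sentence in re.split(r"(?<=[.;])\s+", report.lower()):
        for name, pattern in TARGETS.items():
            match = re.search(pattern, sentence)
            if match is None:
                continue
            # Look only at text before the finding, cut at the last termination cue.
            preceding = re.split(TERMINATION_CUES, sentence[: match.start()])[-1]
            if re.search(NEGATION_CUES, preceding):
                label = "absent"
            elif re.search(UNCERTAINTY_CUES, preceding):
                label = "possible"
            else:
                label = "present"
            # Only overwrite an unset or negated label, so an affirmed or
            # uncertain mention anywhere in the report wins over a negation.
            if summary[name] in ("not_mentioned", "absent"):
                summary[name] = label
    return summary


report = (
    "No acute intracranial hemorrhage. Small left subdural hematoma "
    "without midline shift. Possible nondisplaced skull fracture."
)
print(annotate(report))
# {'epidural_hemorrhage': 'not_mentioned', 'subdural_hemorrhage': 'present',
#  'midline_shift': 'absent', 'skull_fracture': 'possible'}
```

Restricting cues to the text before a finding keeps the sketch short; real ConText-style systems also handle cues that follow the target (for example "fracture cannot be excluded").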
tbiExtractor displayed high sensitivity (0.92-0.94) and specificity (0.99) compared with the gold standard. The algorithm also demonstrated high equivalence (94.6%) with the annotators. Most clinical findings (85%) had minimal errors (F1 score ≥ 0.80). Compared with the annotators, tbiExtractor extracted information in significantly less time (0.3 sec vs 1.7 min per report).
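Sensitivity, specificity, and F1 score are standard ratios of confusion-matrix counts. The sketch below shows how they could be computed for a single finding, under two assumptions that are not taken from the study: each annotation is collapsed to a binary present/absent label, and "equivalence" is read as simple percent agreement. The function names and the example labels are hypothetical and purely illustrative.

```python
def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a denominator is zero.
    return numerator / denominator if denominator else float("nan")


def binary_metrics(gold: list[bool], predicted: list[bool]) -> dict[str, float]:
    """Compare predicted labels with a gold standard (True = finding present)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g and p for g, p in pairs)
    tn = sum(not g and not p for g, p in pairs)
    fp = sum(not g and p for g, p in pairs)
    fn = sum(g and not p for g, p in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true-positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "precision": _ratio(tp, tp + fp),
        # Harmonic mean of precision and recall, written in count form.
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "agreement": _ratio(tp + tn, len(pairs)),  # assumed meaning of "equivalence"
    }


# Made-up labels for 10 reports: 4 gold positives, 6 gold negatives.
gold = [True] * 4 + [False] * 6
predicted = [True, True, True, False, False, False, False, False, False, True]
print({k: round(v, 3) for k, v in binary_metrics(gold, predicted).items()})
# {'sensitivity': 0.75, 'specificity': 0.833, 'precision': 0.75, 'f1': 0.75, 'agreement': 0.8}
```

With 3 true positives, 1 false negative, 5 true negatives, and 1 false positive, F1 = 2·3 / (2·3 + 1 + 1) = 0.75, which would fall below the 0.80 threshold used above to define minimal error.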
tbiExtractor is a validated algorithm for extraction of TBI common data elements from radiology reports. This automation reduces the time spent to extract structured data and improves the consistency of data extracted. Lastly, tbiExtractor can be used to stratify subjects into groups based on visible damage by partitioning the annotations of the pertinent clinical conditions on a radiology report.
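Stratifying subjects by partitioning annotations can be sketched as a simple rule over each report's structured summary. In this illustration, which is an assumption rather than the study's grouping scheme, a subject falls into a "visible damage" group if any finding carries one of a chosen set of labels. The label names, subject IDs, and the function `stratify` are hypothetical.

```python
# Hypothetical labels treated as evidence of visible damage by default.
DAMAGE_LABELS = frozenset({"present", "possible"})


def stratify(
    summaries: dict[str, dict[str, str]],
    damage_labels: frozenset[str] = DAMAGE_LABELS,
) -> dict[str, list[str]]:
    """Split subject IDs by whether any finding carries a damage label."""
    groups: dict[str, list[str]] = {"visible_damage": [], "no_visible_damage": []}
    for subject_id, summary in summaries.items():
        has_damage = any(label in damage_labels for label in summary.values())
        key = "visible_damage" if has_damage else "no_visible_damage"
        groups[key].append(subject_id)
    return groups


summaries = {
    "subject_01": {"subdural_hemorrhage": "present", "midline_shift": "absent"},
    "subject_02": {"subdural_hemorrhage": "absent", "midline_shift": "absent"},
    "subject_03": {"skull_fracture": "possible"},
}
print(stratify(summaries))
# {'visible_damage': ['subject_01', 'subject_03'], 'no_visible_damage': ['subject_02']}
```

Whether uncertain findings count as damage is a study-design choice: passing `frozenset({"present"})` as `damage_labels` gives a stricter split that moves `subject_03` into the no-visible-damage group.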