Intelligent Systems Program, University of Pittsburgh, Pittsburgh, Pennsylvania, United States.
Department of Surgery and Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States.
Appl Clin Inform. 2019 Aug;10(4):655-669. doi: 10.1055/s-0039-1695791. Epub 2019 Sep 4.
Despite advances in natural language processing (NLP), extracting information from clinical text is expensive. Interactive tools that are capable of easing the construction, review, and revision of NLP models can reduce this cost and improve the utility of clinical reports for clinical and secondary use.
We present the design and implementation of an interactive NLP tool for identifying incidental findings in radiology reports, along with a user study evaluating the performance and usability of the tool.
Expert reviewers provided gold standard annotations for 130 patient encounters (694 reports) at sentence, section, and report levels. We performed a user study with 15 physicians to evaluate the accuracy and usability of our tool. Participants reviewed encounters split into intervention (with predictions) and control conditions (no predictions). We measured changes in model performance, the time spent, and the number of user actions needed. The System Usability Scale (SUS) and an open-ended questionnaire were used to assess usability.
Starting from bootstrapped models trained on 6 patient encounters, over an hour-long study session we observed an average increase in F1 score on a held-out test data set from 0.31 to 0.75 for reports, from 0.32 to 0.68 for sections, and from 0.22 to 0.60 for sentences. We found that the tool significantly reduced the time spent reviewing encounters (134.30 vs. 148.44 seconds in the intervention and control conditions, respectively), while maintaining overall label quality as measured against the gold standard. The tool was well received by the study participants, with a very good overall SUS score of 78.67.
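The improvements above are reported as F1 scores, the harmonic mean of precision and recall over binary labels (here, whether a report, section, or sentence contains an incidental finding). As a quick illustrative sketch, not code from the study itself, F1 can be computed from gold and predicted labels as follows:

```python
def f1_score(gold, pred):
    """F1 for binary labels; gold and pred are equal-length lists of 0/1."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 3 true positives, 1 false positive, 2 false negatives
gold = [1, 1, 1, 1, 1, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 1, 0, 0]
print(round(f1_score(gold, pred), 3))  # 0.667
```

A rising F1 on a held-out set, as in the results above, indicates the interactively refined models are improving in both precision and recall rather than trading one for the other.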
The user study demonstrated successful use of the tool by physicians for identifying incidental findings. These results support the viability of adopting interactive NLP tools in clinical care settings for a wider range of clinical applications.