Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA.
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA.
J Am Med Inform Assoc. 2018 Jan 1;25(1):81-87. doi: 10.1093/jamia/ocx070.
The gap between domain expertise and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1 scores for the "appendiceal-orifice" variable between 0.78 and 0.91 (with improvements ranging from 13.26% to 29.90%). F1 for "biopsy" ranged between 0.88 and 0.94 (-1.52% to 11.74% improvements). The average System Usability Scale score was 70.56. Subjective feedback also suggests possible design improvements.