Miller Timothy, Dligach Dmitriy, Bethard Steven, Lin Chen, Savova Guergana
Boston Children's Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
Loyola University Chicago, Chicago, IL, United States.
J Biomed Inform. 2017 May;69:251-258. doi: 10.1016/j.jbi.2017.04.015. Epub 2017 Apr 21.
This work investigates clinical coreference resolution with a model that explicitly tracks entities. It measures that model's performance both on traditional in-domain train/test splits and in cross-domain experiments that test how well the learned models generalize.
We compare two methods: a baseline mention-pair coreference system, which operates over pairs of mentions and applies best-first conflict resolution, and a mention-synchronous system, which incrementally builds coreference chains. We develop new features that incorporate distributional semantics, discourse features, and entity attributes. We use two new coreference datasets with similar annotation guidelines: the THYME colon cancer dataset and the DeepPhe breast cancer dataset.
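The difference between the two decoding strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the scorer `toy_score`, the `threshold` value, and the choice to score a mention against a chain by taking the maximum over its members are all hypothetical stand-ins for a trained classifier and entity-level features.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def toy_score(antecedent: str, mention: str) -> float:
    """Hypothetical pairwise scorer: 1.0 if the final tokens match, else 0.0."""
    return 1.0 if antecedent.lower().split()[-1] == mention.lower().split()[-1] else 0.0


def mention_pair_best_first(
    mentions: List[str], score: Scorer = toy_score, threshold: float = 0.5
) -> List[List[int]]:
    """Link each mention to its single best-scoring earlier mention, if any."""
    chain_of = list(range(len(mentions)))  # each mention starts in its own chain
    for i in range(len(mentions)):
        best_j, best_s = None, threshold
        for j in range(i):  # candidate antecedents precede the mention
            s = score(mentions[j], mentions[i])
            if s > best_s:
                best_j, best_s = j, s
        if best_j is not None:
            chain_of[i] = chain_of[best_j]  # join the antecedent's chain
    chains: dict = {}
    for idx, cid in enumerate(chain_of):
        chains.setdefault(cid, []).append(idx)
    return list(chains.values())


def mention_synchronous(
    mentions: List[str], score: Scorer = toy_score, threshold: float = 0.5
) -> List[List[int]]:
    """Process mentions left to right, attaching each to a partial chain or starting a new one."""
    chains: List[List[int]] = []
    for i, mention in enumerate(mentions):
        best_chain, best_s = None, threshold
        for chain in chains:
            # Assumption: a chain's score is the max pairwise score over its members.
            s = max(score(mentions[j], mention) for j in chain)
            if s > best_s:
                best_chain, best_s = chain, s
        if best_chain is None:
            chains.append([i])  # no chain is compatible: start a new entity
        else:
            best_chain.append(i)
    return chains


mentions = ["the tumor", "a colon mass", "the mass", "this tumor"]
pair_chains = mention_pair_best_first(mentions)
sync_chains = mention_synchronous(mentions)
```

With this toy scorer both strategies produce the same chains. The structural difference is that the mention-synchronous loop makes each decision against whole partial chains, which is where entity-level features could be used.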
The mention-synchronous system performs similarly to the baseline on in-domain data but much better on new data. In feature generalizability experiments, part-of-speech tag features prove superior to other word representations. Our methods improve generalization, but a performance gap remains when testing in new domains.
Generalizability of clinical NLP systems is important and under-studied, so future work should attempt to perform cross-domain and cross-institution evaluations and explicitly develop features and training regimens that favor generalizability. A performance-optimized version of the mention-synchronous system will be included in the open source Apache cTAKES software.