Laparra Egoitz, Bethard Steven, Miller Timothy A
School of Information, University of Arizona, Tucson, Arizona, USA.
Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA.
JAMIA Open. 2020 Apr 13;3(2):146-150. doi: 10.1093/jamiaopen/ooaa010. eCollection 2020 Jul.
Building clinical natural language processing (NLP) systems that work on widely varying data is an absolute necessity because of the expense of obtaining new training data. While domain adaptation research can have a positive impact on this problem, the most widely studied paradigms do not take into account the realities of clinical data sharing. To address this issue, we lay out a taxonomy of domain adaptation, parameterizing by what data is shareable. We show that the most realistic settings for clinical use cases are seriously under-studied. To support research in these important directions, we make a series of recommendations, not just for domain adaptation but for clinical NLP in general, that ensure that data, shared tasks, and released models are broadly useful, and that initiate research directions where the clinical NLP community can lead the broader NLP and machine learning fields.