Torii Manabu, Yang Elly W, Doan Son
Medical Informatics, Kaiser Permanente Southern California, San Diego, CA.
AMIA Annu Symp Proc. 2018 Dec 5;2018:1028-1035. eCollection 2018.
Concept detection is an integral step in natural language processing (NLP) applications in the clinical domain. Clinical concepts are detailed (e.g., "pain in left/right upper/lower arm/leg") and expressed in diverse phrase types (e.g., noun, verb, adjective, or prepositional phrase). There are rich terminological resources in the clinical domain that include many concept synonyms. Even with these resources, concept detection remains challenging because concept phrases can occur in discontinuous and/or permuted forms. To overcome this challenge, we investigated an approach that exploits syntactic information. Syntactic patterns of concept phrases were mined from continuous, non-permuted forms of synonyms, and these patterns were used to detect discontinuous and/or permuted concept phrases. Experiments on 790 de-identified clinical notes showed that the proposed approach can potentially boost the recall of concept detection. Challenges and limitations were also observed. In this paper, we report and discuss our preliminary analysis and findings.
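The difficulty posed by discontinuous and permuted occurrences can be illustrated with a small sketch. It is not the authors' method: it swaps in a simple token-window heuristic in place of mined syntactic patterns, and the function names, the `max_span` parameter, and the stopword list are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_match(synonym, sentence):
    """True if the synonym tokens occur contiguously and in order."""
    syn, sent = tokenize(synonym), tokenize(sentence)
    n = len(syn)
    return any(sent[i:i + n] == syn for i in range(len(sent) - n + 1))

def relaxed_match(synonym, sentence, max_span=8, stopwords=("in", "of", "the")):
    """True if every content token of the synonym occurs, in any order,
    inside some window of at most max_span sentence tokens.

    Hypothetical heuristic: it ignores syntax entirely."""
    needed = set(tokenize(synonym)) - set(stopwords)
    if not needed:
        return False
    sent = tokenize(sentence)
    for start in range(len(sent)):
        window = set(sent[start:start + max_span])
        if needed <= window:
            return True
    return False

synonym = "pain in left upper arm"
permuted = "Left upper arm: reports pain on movement."
unrelated = "Pain in the right leg; left upper arm is normal."

print(exact_match(synonym, permuted))      # False: the phrase is permuted
print(relaxed_match(synonym, permuted))    # True: all content tokens are nearby
print(relaxed_match(synonym, unrelated))   # True: a false positive
```

The last call shows why a purely positional heuristic is not enough: the tokens co-occur, but "pain" belongs to a different body part. Constraining matches with syntactic structure, as the abstract proposes, is one way to limit such over-matching.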