Duan Junwen, Liu Shuyue, Liao Xincheng, Gong Feng, Yue Hailin, Wang Jianxin
IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1143-1153. doi: 10.1109/TCBB.2024.3376591. Epub 2024 Oct 9.
Chinese electronic medical records (EMRs) present significant challenges for named entity recognition (NER) because of their specialized nature, distinctive linguistic features, and diverse expressions. Traditionally, NER is treated as a sequence labeling task in which each token is assigned a label. Recent research has reframed NER within the machine reading comprehension (MRC) framework, extracting entities in a question-answer format and achieving state-of-the-art performance. However, these MRC-based methods have a significant limitation: they extract entities of each type independently, ignoring the relations among entity types. To address this, we introduce the Fusion Label Relations with MRC (FLR-MRC) model, which enhances the MRC model by implicitly capturing dependencies among entity types. FLR-MRC models the interrelations between labels using graph attention networks and integrates these label representations with the textual data to identify entities. On the benchmark CMeEE and CCKS2017-CNER datasets, FLR-MRC achieves F1-scores of 0.6652 and 0.9101, respectively, outperforming existing clinical NER methods.
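To make the label-relation idea more concrete, here is a minimal sketch of a single-head graph attention (GAT) layer over entity-type labels, written in plain Python and following the standard GAT update (LeakyReLU attention scores over neighbours, softmax normalization, weighted aggregation, ELU). This is not the FLR-MRC implementation. The label set, the label graph, the dimensions, the random weights, and the dot-product scoring of a token against the updated label vectors are all hypothetical choices made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(H, adj, W, a):
    """Hypothetical single-head GAT layer over label nodes.

    H:   one feature vector per label (N x F)
    adj: adj[i] lists the neighbours of label i, including i itself
    W:   shared linear projection (F_out x F)
    a:   attention vector of length 2 * F_out
    Returns the updated label vectors (N x F_out) and, per label,
    a dict mapping each neighbour index to its attention weight.
    """
    Wh = [matvec(W, h) for h in H]
    out, attn = [], []
    for i, nbrs in enumerate(adj):
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); list "+" concatenates
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, Wh[i] + Wh[j])))
                  for j in nbrs]
        alpha = softmax(scores)
        # h_i' = ELU(sum_j alpha_ij * W h_j)
        agg = [sum(alpha[k] * Wh[j][d] for k, j in enumerate(nbrs))
               for d in range(len(Wh[i]))]
        out.append([x if x > 0 else math.exp(x) - 1 for x in agg])
        attn.append(dict(zip(nbrs, alpha)))
    return out, attn


def label_scores(token_vec, label_feats):
    """Hypothetical fusion: dot product of one token vector with each label."""
    return [sum(x * y for x, y in zip(token_vec, h)) for h in label_feats]


random.seed(0)
labels = ["disease", "symptom", "drug", "procedure"]  # hypothetical label set
# Hypothetical label graph with self-loops: disease-symptom, disease-drug, drug-procedure
adj = [[0, 1, 2], [1, 0], [2, 0, 3], [3, 2]]
F, F_out = 4, 3
H = [[random.gauss(0, 1) for _ in range(F)] for _ in labels]
W = [[random.gauss(0, 0.5) for _ in range(F)] for _ in range(F_out)]
a = [random.gauss(0, 0.5) for _ in range(2 * F_out)]

H_new, attn = gat_layer(H, adj, W, a)
token_vec = [random.gauss(0, 1) for _ in range(F_out)]  # stand-in for an encoder output
scores = label_scores(token_vec, H_new)
for name, s in zip(labels, scores):
    print(f"{name}: {s:.3f}")
```

Each label's updated vector mixes in its neighbours' projected features in proportion to the attention weights, which is one way dependencies among entity types can be encoded before they are combined with token representations. How FLR-MRC actually fuses label and text representations may differ from the dot product used here.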