Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, Hubei, China.
School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, Hubei, China.
PLoS One. 2020 Jul 15;15(7):e0235796. doi: 10.1371/journal.pone.0235796. eCollection 2020.
Chinese information extraction is traditionally performed as a pipeline of word segmentation, entity recognition, relation extraction and event detection. This pipelined approach has two limitations: 1) errors made by upstream tasks propagate to downstream tasks; 2) the mutual benefits of cross-task dependencies are difficult to exploit when each task is handled by a separate model. To address these two challenges, we propose a novel transition-based model that jointly performs entity recognition, relation extraction and event detection as a single task. In addition, we incorporate subword-level information into the character sequence using a hybrid lattice structure, removing the reliance on external word tokenizers. Results on standard ACE benchmarks show the benefits of the proposed joint model and lattice network, which achieve the best results reported in the literature.
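To make the lattice idea concrete, the following is a minimal sketch of how subword matches could be laid over a character sequence. It is not the authors' implementation: the abstract does not specify how the hybrid lattice is built, so the function `build_lattice`, the `max_len` cutoff and the toy lexicon are all hypothetical, and simple lexicon lookup is assumed as the matching rule.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int, str]  # (start, end_exclusive, token)


def build_lattice(chars: str, lexicon: Set[str], max_len: int = 4) -> List[Edge]:
    """Return lattice edges over a character sequence.

    Every single character is an edge, so no external tokenizer is needed.
    Any multi-character span found in the (hypothetical) lexicon is added
    as an extra edge that runs in parallel with its characters.
    """
    edges: List[Edge] = []
    n = len(chars)
    for i in range(n):
        # Character-level edge: always present.
        edges.append((i, i + 1, chars[i]))
        # Subword-level edges: spans of length 2..max_len found in the lexicon.
        for j in range(i + 2, min(n, i + max_len) + 1):
            piece = chars[i:j]
            if piece in lexicon:
                edges.append((i, j, piece))
    return edges


# Toy lexicon (assumption, for illustration only).
toy_lexicon = {"武汉", "大学", "武汉大学"}
edges = build_lattice("武汉大学", toy_lexicon)
for edge in edges:
    print(edge)
```

In this sketch the overlapping candidates "武汉", "大学" and "武汉大学" all remain in the lattice alongside the four characters, so a downstream model can weigh them instead of committing to a single segmentation up front.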