IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):823-835. doi: 10.1109/TCBB.2020.2979959. Epub 2021 Jun 3.
Conditions play an essential role in biomedical statements. However, existing biomedical knowledge graphs (BioKGs) focus only on factual knowledge, organized as a flat relational network of biomedical concepts. These BioKGs ignore the conditions under which the facts are valid, and so lose essential context for knowledge exploration and inference. We consider both facts and their conditions in biomedical statements and propose a three-layered, information-lossless representation of BioKGs. The first layer contains biomedical concept nodes and attribute nodes. The second layer represents both fact tuples and condition tuples as relation-phrase nodes, each connected to its subject and object in the first layer. The third layer contains statement nodes, each connected to a set of fact tuples and/or condition tuples in the second layer. We transform the BioKG construction problem into a sequence labeling problem based on a newly designed tag schema. We design a Multi-Input Multi-Output sequence labeling model (MIMO) that learns from multiple input signals and generates an appropriate number of output sequences for tuple extraction. Experiments on a newly constructed dataset show that MIMO outperforms existing methods. A further case study demonstrates that the constructed BioKGs provide a good understanding of biomedical statements.
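The three-layer structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`Node`, `TupleNode`, `StatementNode`), the `role` field, and the example sentence with its placeholder entities ("drug X", "gene Y", "compound Z") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass(frozen=True)
class Node:
    """Layer 1: a biomedical concept node or an attribute node."""
    text: str
    kind: Literal["concept", "attribute"] = "concept"


@dataclass
class TupleNode:
    """Layer 2: a relation-phrase node linked to a subject and an object in layer 1.

    `role` marks whether the tuple states a fact or a condition
    (a hypothetical field name, not taken from the paper).
    """
    relation: str
    subject: Node
    obj: Node
    role: Literal["fact", "condition"]


@dataclass
class StatementNode:
    """Layer 3: a statement linked to its fact and/or condition tuples in layer 2."""
    text: str
    tuples: List[TupleNode] = field(default_factory=list)

    def facts(self) -> List[TupleNode]:
        return [t for t in self.tuples if t.role == "fact"]

    def conditions(self) -> List[TupleNode]:
        return [t for t in self.tuples if t.role == "condition"]


def build_example() -> StatementNode:
    """Build the three layers for one invented placeholder sentence."""
    # Layer 1: concept nodes (placeholder names, not real entities).
    drug, gene = Node("drug X"), Node("gene Y")
    cells, compound = Node("cells"), Node("compound Z")

    # Layer 2: one fact tuple and the condition under which it holds.
    fact = TupleNode("reduces expression of", drug, gene, role="fact")
    cond = TupleNode("treated with", cells, compound, role="condition")

    # Layer 3: the statement node ties the fact to its condition.
    sentence = ("Drug X reduces expression of gene Y "
                "in cells treated with compound Z.")
    return StatementNode(sentence, [fact, cond])


if __name__ == "__main__":
    stmt = build_example()
    for t in stmt.tuples:
        print(f"{t.role}: ({t.subject.text}, {t.relation}, {t.obj.text})")
```

A flat relational network would store only the fact tuple. Keeping the condition tuple attached to the same statement node is what preserves the context in which the fact holds.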