School of Information Science and Engineering, Hebei University of Science and Technology, Hebei 050018, China.
FedUni Information Engineering Institute, Hebei University of Science and Technology, China.
J Biomed Inform. 2024 Apr;152:104624. doi: 10.1016/j.jbi.2024.104624. Epub 2024 Mar 11.
Extracting relational triples from unstructured medical texts about Parkinson's disease is critical for constructing a medical knowledge graph. However, the entities in Parkinson's disease triples are often complex and overlapping, which reduces extraction accuracy, especially when annotated corpora are scarce. This study therefore first builds a corpus on Parkinson's disease and then proposes a tagging-based, three-stage relational triple extraction model named ParTRE. To enrich the contextual representation of sentences, the model employs BiLSTM modules to capture fine-grained semantic information. A conditional normalization layer is also used so that entity pairs can be extracted accurately from two complementary directions. To address imbalanced relationship categories, an adaptive loss function strategy based on focal loss is derived, which assigns different weights to relationship categories and reduces the loss contributed by easy-to-classify samples. Model performance is evaluated on the Parkinson's corpus and on public datasets. The results indicate that the proposed model achieves an overall F1-score of 93.3% on the Parkinson's corpus and performance comparable to state-of-the-art methods on public datasets. The model also achieves satisfactory results in handling overlapping entities and imbalanced relationship categories. Given its demonstrated availability and validity, the proposed method can be integrated with medical knowledge graphs and thereby benefit medical intelligence.
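The abstract does not give the exact form of the adaptive loss, so the following is only a minimal sketch of the general idea it describes: a class-weighted focal loss, in which a per-category weight counters class imbalance and the factor (1 - p_t)^gamma shrinks the loss of easy-to-classify samples. The function names, the inverse-frequency weighting scheme, and the default `gamma=2.0` are assumptions made for illustration, not details taken from ParTRE.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(counts):
    """Hypothetical weighting scheme: rarer relationship categories get larger weights.

    With this normalisation a perfectly balanced dataset yields weight 1.0 per class.
    """
    total = sum(counts)
    n_classes = len(counts)
    return [total / (n_classes * c) for c in counts]


def weighted_focal_loss(logits, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for one sample.

    loss = -w[target] * (1 - p_t)^gamma * log(p_t),
    where p_t is the predicted probability of the true class.
    gamma = 0 with unit weights reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Assumed toy label counts for three relationship categories (not from the paper).
    weights = inverse_frequency_weights([900, 90, 10])
    easy = weighted_focal_loss([6.0, 0.0, 0.0], 0, weights)  # confident and correct
    hard = weighted_focal_loss([0.0, 0.0, 6.0], 0, weights)  # confident and wrong
    print(weights, easy, hard)
```

In this sketch a confidently correct prediction contributes almost nothing to the loss, while misclassified samples and samples from rare categories dominate it, which is the behaviour the abstract attributes to the adaptive strategy.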