Biomedical literature mining: graph kernel-based learning for gene-gene interaction extraction.

Affiliation

Department of Statistics, Tamkang University, Tamsui District, New Taipei City, 251301, Taiwan.

Publication information

Eur J Med Res. 2024 Aug 2;29(1):404. doi: 10.1186/s40001-024-01983-5.

Abstract

Supervised machine learning is often used for biomedical relation extraction. Its disadvantage is that manually building an annotated dataset takes a great deal of time and money. Distant supervision combines a knowledge base with a corpus so that the training corpus can be annotated automatically. Many biomedical databases provide knowledge bases for research, while annotated corpora remain limited, so this approach is practical in biomedicine. The clinical significance of each patient's genetic makeup can be interpreted using the healthcare provider's genetic database. Unfortunately, few previous biomedical relation extraction studies have focused on gene-gene interactions. The main purpose of this study is to develop methods for extracting gene-gene interactions that can help explain the heritability of complex human diseases. This study used gene-gene interaction information from the KEGG PATHWAY database together with PubMed abstracts to generate the training sample set, and adopted a graph kernel method to extract gene-gene interactions. The best evaluation result was an F1-score of 0.79. Our distant supervision method automatically finds sentences in the corpus without manual labeling for extracting gene-gene interactions, which can effectively reduce the time cost of manually annotating data. Moreover, the graph kernel-based relation extraction method can be successfully applied to extract gene-gene interactions. The results of this study are therefore expected to help advance precision medicine.
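
As a rough illustration of how distant supervision can annotate a corpus automatically, the sketch below labels every pair of co-occurring gene mentions in a sentence. A pair is positive if it appears in a knowledge base and negative otherwise. The gene lexicon, the knowledge-base pairs and the helper names (`find_genes`, `distant_label`) are hypothetical stand-ins. They are not taken from the paper or from KEGG. Treating every pair absent from the knowledge base as a negative is a simplifying assumption.

```python
import re
from itertools import combinations

# Hypothetical knowledge-base pairs (unordered), standing in for
# gene-gene interactions that would be drawn from a database such as KEGG PATHWAY.
KB_PAIRS = {frozenset({"EGFR", "GRB2"}), frozenset({"GRB2", "SOS1"})}

# Hypothetical gene lexicon used to spot gene mentions in sentences.
GENE_LEXICON = {"EGFR", "GRB2", "SOS1", "TP53"}


def find_genes(sentence):
    """Return lexicon genes mentioned in the sentence, in order of first appearance."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    seen = []
    for tok in tokens:
        if tok in GENE_LEXICON and tok not in seen:
            seen.append(tok)
    return seen


def distant_label(sentences, kb_pairs=KB_PAIRS):
    """Label each co-occurring gene pair: 1 if the pair is in the KB, else 0 (assumed negative)."""
    examples = []
    for sent in sentences:
        genes = find_genes(sent)
        for g1, g2 in combinations(genes, 2):
            label = 1 if frozenset({g1, g2}) in kb_pairs else 0
            examples.append((sent, g1, g2, label))
    return examples


abstracts = [
    "EGFR binds GRB2 after ligand stimulation.",
    "TP53 and GRB2 were measured separately.",
]
for example in distant_label(abstracts):
    print(example)
```

In this toy run, the first sentence yields the positive pair (EGFR, GRB2) and the second yields the negative pair (TP53, GRB2). No sentence needs manual labeling. A real pipeline would also need gene-name normalization and would have to cope with the noise this closed-world labeling introduces.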
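
Graph kernels compare two sentences through the structure of their graphs (for example, dependency parses) rather than through hand-built feature vectors. The sketch below implements one simple, valid graph kernel. It counts walks of up to `max_len` edges by their label sequence and takes the inner product of those counts. This kernel is an illustrative assumption and may differ from the graph kernel the authors actually used. The node labels, the `GENE1`/`GENE2` placeholders for the candidate gene pair and the function names are all hypothetical.

```python
import math
from collections import Counter


def walk_features(labels, edges, max_len):
    """Count walks of 0..max_len edges in an undirected labeled graph, keyed by label sequence."""
    adj = {i: [] for i in range(len(labels))}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    # walks[node] maps each label sequence of walks ending at `node` to its count
    walks = {i: Counter({(labels[i],): 1}) for i in adj}
    feats = Counter()
    for counter in walks.values():
        feats.update(counter)
    for _ in range(max_len):
        extended = {i: Counter() for i in adj}
        for i, seqs in walks.items():
            for j in adj[i]:  # extend every walk ending at i by one edge to neighbor j
                for seq, count in seqs.items():
                    extended[j][seq + (labels[j],)] += count
        walks = extended
        for counter in walks.values():
            feats.update(counter)
    return feats


def walk_kernel(g1, g2, max_len=2):
    """Inner product of the walk-count feature vectors of two (labels, edges) graphs."""
    f1 = walk_features(*g1, max_len)
    f2 = walk_features(*g2, max_len)
    return sum(count * f2[seq] for seq, count in f1.items())


def normalized_walk_kernel(g1, g2, max_len=2):
    """Cosine-normalized kernel value in [0, 1]."""
    k11 = walk_kernel(g1, g1, max_len)
    k22 = walk_kernel(g2, g2, max_len)
    if k11 == 0 or k22 == 0:
        return 0.0
    return walk_kernel(g1, g2, max_len) / math.sqrt(k11 * k22)


# Hypothetical dependency graphs: the verb is the head of both gene placeholders.
g_binds = (["GENE1", "binds", "GENE2"], [(1, 0), (1, 2)])
g_activates = (["GENE1", "activates", "GENE2"], [(1, 0), (1, 2)])
print(walk_kernel(g_binds, g_activates))             # 2: only the GENE1 and GENE2 nodes match
print(normalized_walk_kernel(g_binds, g_activates))  # 2/13, about 0.154
```

Because the kernel is an inner product of walk-count vectors, it is positive semidefinite. A Gram matrix built from it could therefore be passed to a kernel classifier, for example scikit-learn's `SVC(kernel="precomputed")`, to decide whether a candidate gene pair interacts.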

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/723d/11297645/7575366b31eb/40001_2024_1983_Figa_HTML.jpg