Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine.

Author affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada.

Publication information

Database (Oxford). 2019 Jan 1;2019:bay147. doi: 10.1093/database/bay147.


DOI: 10.1093/database/bay147
PMID: 30689846
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6348314/

Abstract

The Precision Medicine Initiative is a multicenter effort that aims to formulate personalized treatments by leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs: it reports the genetic and molecular interactions that provide the scaffold for cellular regulatory systems, and it details the influence of genetic variants on these interactions. The BioCreative VI Precision Medicine Track set out to extract this particular type of information and was organized into two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams contributed 14 different text-mining systems for the relation extraction task. When the text-mining system predictions were compared with human annotations, for the triage task the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results, we find the Precision Medicine Track to have been successful in engaging the text-mining research community. The track also produced a manually annotated corpus of 5509 PubMed documents, developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results on automatically identifying PubMed articles that describe PPIs affected by mutations, as well as on extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and on incorporating the practical benefits of text-mining tools into real-world curation of precision medicine information.
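
The evaluation measures named in the abstract (precision, recall, F-score and average precision for triage; pair matching that takes homologous genes into account for relation extraction) can be illustrated with a minimal Python sketch. This is not the track's official scorer: the function names, the toy document and gene identifiers, and the `homolog_group` mapping are all hypothetical, and the exact average precision definition used by the organizers is assumed here to be the usual ranked-retrieval one. Note also that the "best" values quoted in the abstract presumably come from different submissions, since combining the best precision and best recall with the formula below would not yield the reported best F-score.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def precision_recall_f1(predicted: Set, gold: Set) -> Tuple[float, float, float]:
    """Set-based precision, recall and balanced F-score."""
    tp = len(predicted & gold)  # true positives: items in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(ranked_ids: List[str], gold: Set[str]) -> float:
    """Mean of precision@k over the ranks k where a relevant document appears.

    Assumed definition (standard ranked retrieval), normalized by the number
    of relevant documents in the gold standard.
    """
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in gold:
            hits += 1
            total += hits / k
    return total / len(gold) if gold else 0.0


def normalize_pairs(
    pairs: Iterable[Tuple[str, str]], homolog_group: Dict[str, str]
) -> Set[FrozenSet[str]]:
    """Make pairs unordered and map each gene ID to its homology group.

    `homolog_group` is a hypothetical gene-ID -> group-ID lookup; genes that
    are missing from it are kept under their own ID.
    """
    return {
        frozenset((homolog_group.get(a, a), homolog_group.get(b, b)))
        for a, b in pairs
    }


# Toy triage example (made-up document IDs, not track data).
gold_docs = {"d1", "d3"}
ranked = ["d1", "d2", "d3", "d4"]      # system ranking, best first
predicted_docs = {"d1", "d2", "d3"}    # documents labeled relevant
print(precision_recall_f1(predicted_docs, gold_docs))
print(average_precision(ranked, gold_docs))

# Toy relation example: the prediction names a homolog of the gold gene.
groups = {"G1": "H1", "G1h": "H1"}     # hypothetical homology groups
gold_pairs = {("G1", "G2")}
pred_pairs = {("G2", "G1h")}
exact = precision_recall_f1(normalize_pairs(pred_pairs, {}), normalize_pairs(gold_pairs, {}))
relaxed = precision_recall_f1(normalize_pairs(pred_pairs, groups), normalize_pairs(gold_pairs, groups))
print(exact, relaxed)
```

In the toy relation example the predicted pair fails under exact gene-ID matching but counts as correct once both sides are mapped to homology groups, which is the kind of relaxation the abstract's "taking homologous genes into account" suggests.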

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd07/6348314/f37e7f40018a/bay147f1.jpg
