Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine.

Affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada.

Publication information

Database (Oxford). 2019 Jan 1;2019:bay147. doi: 10.1093/database/bay147.

DOI:10.1093/database/bay147

PMID:30689846

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6348314/

Abstract

The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments leveraging on individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The goal of BioCreative VI Precision Medicine Track was to extract this particular type of information and was organized in two tasks: (i) document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations and (ii) relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods. 
Given the level of participation and the individual team results we find the precision medicine track to be successful in engaging the text-mining research community. In the meantime, the track produced a manually annotated corpus of 5509 PubMed documents developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPI affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
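The evaluation measures named in the abstract (precision, recall, F-score and average precision) can be illustrated with a minimal sketch. It assumes binary gold labels (1 = the article describes a PPI affected by a mutation) and uses the standard textbook definitions; the function names and the toy labels are hypothetical, and this is not the official BioCreative VI evaluation script. Note that the "best" figures in the abstract appear to come from different runs: the harmonic mean of 62.89% precision and 98.0% recall would be about 76.6%, which is higher than the reported best F-score of 69.06%.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for binary labels (1 = relevant document)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_precision(gold_ranked: Sequence[int]) -> float:
    """Average precision for a ranked list of gold labels (most confident first).

    Assumes the ranking covers every document, so the number of hits
    equals the number of relevant documents.
    """
    hits = 0
    running = 0.0
    for rank, label in enumerate(gold_ranked, start=1):
        if label == 1:
            hits += 1
            running += hits / rank  # precision at this relevant rank
    return running / hits if hits else 0.0


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = [1, 0, 1, 1, 0, 0]
    pred = [1, 1, 1, 0, 0, 0]
    print(precision_recall_f1(gold, pred))
    print(average_precision(gold))
```

Average precision rewards systems that rank relevant articles near the top of the list, which matters for triage because curators typically read a ranked queue rather than an unordered set.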

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd07/6348314/f37e7f40018a/bay147f1.jpg

Similar articles

Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine.

Database (Oxford). 2019 Jan 1;2019:bay147. doi: 10.1093/database/bay147.

BioCreative VI Precision Medicine Track system performance is constrained by entity recognition and variations in corpus characteristics.

Database (Oxford). 2018 Jan 1;2018:bay122. doi: 10.1093/database/bay122.

Overview of the BioCreative III Workshop.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S1. doi: 10.1186/1471-2105-12-S8-S1.

The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S3. doi: 10.1186/1471-2105-12-S8-S3.

Overview of the gene ontology task at BioCreative IV.

Database (Oxford). 2014 Aug 25;2014. doi: 10.1093/database/bau086. Print 2014.

Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge.

Genome Biol. 2008;9 Suppl 2(Suppl 2):S1. doi: 10.1186/gb-2008-9-s2-s1. Epub 2008 Sep 1.

Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task.

Database (Oxford). 2016 Mar 19;2016. doi: 10.1093/database/baw032. Print 2016.

BioCreative III interactive task: an overview.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-12-S8-S4.

Overview of the protein-protein interaction annotation extraction task of BioCreative II.

Genome Biol. 2008;9 Suppl 2(Suppl 2):S4. doi: 10.1186/gb-2008-9-s2-s4. Epub 2008 Sep 1.

The BioC-BioGRID corpus: full text articles annotated for curation of protein-protein and genetic interactions.

Database (Oxford). 2017 Jan 10;2017. doi: 10.1093/database/baw147. Print 2017.

Cited by

Do LLMs Surpass Encoders for Biomedical NER?

Proc (IEEE Int Conf Healthc Inform). 2025 Jun;2025:352-358. doi: 10.1109/ICHI64645.2025.00048. Epub 2025 Jul 22.

Role of Artificial Intelligence and Personalized Medicine in Enhancing HIV Management and Treatment Outcomes.

Life (Basel). 2025 May 6;15(5):745. doi: 10.3390/life15050745.

Benchmarking large language models for biomedical natural language processing applications and recommendations.

Nat Commun. 2025 Apr 6;16(1):3280. doi: 10.1038/s41467-025-56989-2.

A review of large language models and autonomous agents in chemistry.

Chem Sci. 2024 Dec 9;16(6):2514-2572. doi: 10.1039/d4sc03921a. eCollection 2025 Feb 5.

WWAD: the most comprehensive small molecule World Wide Approved Drug database of therapeutics.

Front Pharmacol. 2024 Sep 18;15:1473279. doi: 10.3389/fphar.2024.1473279. eCollection 2024.

The biomedical relationship corpus of the BioRED track at the BioCreative VIII challenge and workshop.

Database (Oxford). 2024 Aug 9;2024. doi: 10.1093/database/baae071.

The overview of the BioRED (Biomedical Relation Extraction Dataset) track at BioCreative VIII.

Database (Oxford). 2024 Aug 8;2024. doi: 10.1093/database/baae069.

Surveying biomedical relation extraction: a critical examination of current datasets and the proposal of a new resource.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae132.

A New Fuzzy-Based Classification Method for Use in Smart/Precision Medicine.

Bioengineering (Basel). 2023 Jul 15;10(7):838. doi: 10.3390/bioengineering10070838.

BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets.

ArXiv. 2023 Jun 19:arXiv:2306.11189v1.

References

Document triage for identifying protein-protein interactions affected by mutations: a neural network ensemble approach.

Database (Oxford). 2018 Jan 1;2018:bay097. doi: 10.1093/database/bay097.

Exploiting graph kernels for high performance biomedical relation extraction.

J Biomed Semantics. 2018 Jan 30;9(1):7. doi: 10.1186/s13326-017-0168-3.

On expert curation and scalability: UniProtKB/Swiss-Prot as a case study.

Bioinformatics. 2017 Nov 1;33(21):3454-3460. doi: 10.1093/bioinformatics/btx439.

tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine.

Bioinformatics. 2018 Jan 1;34(1):80-87. doi: 10.1093/bioinformatics/btx541.

nala: text mining natural language mutation mentions.

Bioinformatics. 2017 Jun 15;33(12):1852-1858. doi: 10.1093/bioinformatics/btx083.

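The BioC-BioGRID corpus: full text articles annotated for curation of protein-protein and genetic interactions.

Database (Oxford). 2017 Jan 10;2017. doi: 10.1093/database/baw147. Print 2017.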

The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature.

BioData Min. 2016 Dec 19;9:41. doi: 10.1186/s13040-016-0118-0. eCollection 2016.

The BioGRID interaction database: 2017 update.

Nucleic Acids Res. 2017 Jan 4;45(D1):D369-D379. doi: 10.1093/nar/gkw1102. Epub 2016 Dec 14.

Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

PLoS Comput Biol. 2016 Nov 30;12(11):e1005017. doi: 10.1371/journal.pcbi.1005017. eCollection 2016 Nov.

Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health.

Adv Exp Med Biol. 2016;939:139-166. doi: 10.1007/978-981-10-1503-8_7.
