Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

Authors

Singhal Ayush, Simmons Michael, Lu Zhiyong

Affiliation

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America.

Publication

PLoS Comput Biol. 2016 Nov 30;12(11):e1005017. doi: 10.1371/journal.pcbi.1005017. eCollection 2016 Nov.


DOI:10.1371/journal.pcbi.1005017
PMID:27902695
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5130168/
Abstract

The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting disease-gene-variant triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of these triplets. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors.
Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships.
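
The two evaluation steps described above (an F1-measure on a benchmark, and a split of extracted triplets into those that do and do not overlap with a curated database) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Triplet` type, the function names, and the toy data are all hypothetical, and the curated set merely stands in for entries such as those in UniProt.

```python
from typing import NamedTuple, Set, Tuple


class Triplet(NamedTuple):
    """A disease-gene-variant triplet (hypothetical representation)."""
    disease: str
    gene: str
    variant: str


def split_by_overlap(
    extracted: Set[Triplet], curated: Set[Triplet]
) -> Tuple[Set[Triplet], Set[Triplet]]:
    """Partition extracted triplets into those found in the curated set
    and those absent from it."""
    overlapping = extracted & curated
    non_overlapping = extracted - curated
    return overlapping, non_overlapping


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1-measure: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only; not taken from the paper's results.
extracted = {
    Triplet("cystic fibrosis", "CFTR", "p.G551D"),
    Triplet("hemochromatosis", "HFE", "p.C282Y"),
    Triplet("breast cancer", "BRCA1", "p.C61G"),
}
curated = {
    Triplet("cystic fibrosis", "CFTR", "p.G551D"),
    Triplet("hemochromatosis", "HFE", "p.C282Y"),
    Triplet("hemochromatosis", "HFE", "p.H63D"),
}

overlapping, non_overlapping = split_by_overlap(extracted, curated)
print(len(overlapping), len(non_overlapping))

# F1 against a gold-standard benchmark, given counts of true positives,
# false positives, and false negatives.
print(round(f1_score(tp=2, fp=1, fn=1), 3))
```

A non-overlapping triplet is not necessarily an error, since a curated database may simply lack the entry. That is why the abstract reports a separately assessed accuracy for a sample of the non-overlapping triplets rather than treating them as false positives.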




