Literature mining of genetic variants for curation: quantifying the importance of supplementary material.

Author information

Jimeno Yepes Antonio, Verspoor Karin

Affiliations

National ICT Australia, Victoria Research Laboratory, Melbourne, Australia and Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia.

Publication information

Database (Oxford). 2014 Feb 10;2014:bau003. doi: 10.1093/database/bau003. Print 2014.

DOI:10.1093/database/bau003

PMID:24520105

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3920087/

Abstract

A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously considered by text mining tools. Although it is anecdotally known that supplementary material contains 'all of the information', and some researchers have speculated about the role of supplementary material (Schenck et al. Extraction of genetic mutations associated with cancer from public literature. J Health Med Inform 2012;S2:2.), our analysis substantiates the significant extent to which this material is critical. Our results highlight the need for literature mining tools to consider not only the narrative content of a publication but also the full set of material related to a publication.
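The recall measure the abstract relies on can be illustrated with a short Python sketch. For a single article, it computes the fraction of database-curated variants that a text-mining tool also extracts, first from the article body alone and then with the supplementary material added. The variant strings, the `normalize` rule, and the function names are hypothetical and chosen only for illustration; this is not the authors' evaluation code, and a real comparison would need proper variant normalization rather than plain string matching.

```python
# Toy sketch (assumption): per-article recall of curated variants recovered by a
# text-mining tool, with and without supplementary material. All data are hypothetical.


def normalize(variant: str) -> str:
    """Hypothetical, deliberately crude normalization: trim whitespace, upper-case."""
    return variant.strip().upper()


def recall(curated: set[str], extracted: set[str]) -> float:
    """Fraction of curated variants that also appear among the extracted ones."""
    gold = {normalize(v) for v in curated}
    if not gold:
        return 0.0  # no curated variants: define recall as 0 to avoid division by zero
    found = gold & {normalize(v) for v in extracted}
    return len(found) / len(gold)


# Hypothetical curated record for one article (illustrative HGVS-style strings).
curated = {"c.35G>A", "c.38G>A", "c.1799T>A", "c.181C>A"}

# Hypothetical tool output: from the article body only, and from its supplement.
from_main_text = {"c.1799T>A"}
from_supplement = {"c.35G>A", "c.38G>A", "c.181C>A"}

print(recall(curated, from_main_text))                    # 0.25
print(recall(curated, from_main_text | from_supplement))  # 1.0
```

In this toy case, three of the four curated variants appear only in the supplement, so processing the body alone recovers a quarter of them. This mirrors, in miniature, the kind of gap the abstract attributes to supplementary material.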

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a5d1/3920087/e91cfa9a240d/bau003f1p.jpg

Similar articles

Literature mining of genetic variants for curation: quantifying the importance of supplementary material.

Database (Oxford). 2014 Feb 10;2014:bau003. doi: 10.1093/database/bau003. Print 2014.

Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.

BMC Bioinformatics. 2015 Jun 6;16:185. doi: 10.1186/s12859-015-0609-x.

Mutation extraction tools can be combined for robust recognition of genetic variants in the literature.

F1000Res. 2014 Jan 21;3:18. doi: 10.12688/f1000research.3-18.v2. eCollection 2014.

Analyzing the Information Content of Text-Based Files in Supplementary Materials of Biomedical Literature.

Stud Health Technol Inform. 2022 May 25;294:876-877. doi: 10.3233/SHTI220614.

miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.

J Biomed Semantics. 2016 Apr 29;7(1):9. doi: 10.1186/s13326-015-0044-y.

Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

PLoS Comput Biol. 2016 Nov 30;12(11):e1005017. doi: 10.1371/journal.pcbi.1005017. eCollection 2016 Nov.

tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine.

Bioinformatics. 2018 Jan 1;34(1):80-87. doi: 10.1093/bioinformatics/btx541.

Annotating the biomedical literature for the human variome.

Database (Oxford). 2013 Apr 12;2013:bat019. doi: 10.1093/database/bat019. Print 2013.

Text mining in livestock animal science: introducing the potential of text mining to animal sciences.

J Anim Sci. 2012 Oct;90(10):3666-76. doi: 10.2527/jas.2011-4841. Epub 2012 Jun 4.

Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research.

BMC Bioinformatics. 2015 Feb 21;16:55. doi: 10.1186/s12859-015-0472-9.

Cited by

Unlocking the potential of PubMed Central supplementary data files.

Bioinform Adv. 2025 Jun 27;5(1):vbaf155. doi: 10.1093/bioadv/vbaf155. eCollection 2025.

Automatic detection and extraction of key resources from tables in biomedical papers.

BioData Min. 2025 Mar 20;18(1):23. doi: 10.1186/s13040-025-00438-9.

Tracking genetic variants in the biomedical literature using LitVar 2.0.

Nat Genet. 2023 Jun;55(6):901-903. doi: 10.1038/s41588-023-01414-x.

Assessing the use of supplementary materials to improve genomic variant discovery.

Database (Oxford). 2023 Mar 31;2023. doi: 10.1093/database/baad017.

Variant curation and interpretation in hereditary cancer genes: An institutional experience in Latin America.

Mol Genet Genomic Med. 2023 May;11(5):e2141. doi: 10.1002/mgg3.2141. Epub 2023 Mar 10.

Variomes: a high recall search engine to support the curation of genomic variants.

Bioinformatics. 2022 Apr 28;38(9):2595-2601. doi: 10.1093/bioinformatics/btac146.

MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature.

Sci Rep. 2020 Jul 23;10(1):12365. doi: 10.1038/s41598-020-68649-0.

Global Text Mining and Development of Pharmacogenomic Knowledge Resource for Precision Medicine.

Front Pharmacol. 2019 Aug 7;10:839. doi: 10.3389/fphar.2019.00839. eCollection 2019.

PubTator central: automated concept annotation for biomedical full text articles.

Nucleic Acids Res. 2019 Jul 2;47(W1):W587-W593. doi: 10.1093/nar/gkz389.

The value of universally available raw NMR data for transparency, reproducibility, and integrity in natural product research.

Nat Prod Rep. 2019 Jan 1;36(1):35-107. doi: 10.1039/c7np00064b. Epub 2018 Jul 13.

References

Annotating the biomedical literature for the human variome.

Database (Oxford). 2013 Apr 12;2013:bat019. doi: 10.1093/database/bat019. Print 2013.

tmVar: a text mining approach for extracting sequence variants in biomedical literature.

Bioinformatics. 2013 Jun 1;29(11):1433-9. doi: 10.1093/bioinformatics/btt156. Epub 2013 Apr 5.

Detection of protein catalytic sites in the biomedical literature.

Pac Symp Biocomput. 2013:433-44.

Literature mining of protein-residue associations with graph rules learned through distant supervision.

J Biomed Semantics. 2012 Oct 5;3 Suppl 3(Suppl 3):S2. doi: 10.1186/2041-1480-3-S3-S2.

Mining the pharmacogenomics literature--a survey of the state of the art.

Brief Bioinform. 2012 Jul;13(4):460-94. doi: 10.1093/bib/bbs018.

Automated extraction and semantic analysis of mutation impacts from the biomedical literature.

BMC Genomics. 2012 Jun 18;13 Suppl 4(Suppl 4):S10. doi: 10.1186/1471-2164-13-S4-S10.

A mutation-centric approach to identifying pharmacogenomic relations in text.

J Biomed Inform. 2012 Oct;45(5):835-41. doi: 10.1016/j.jbi.2012.05.003. Epub 2012 Jun 7.

A SNPshot of PubMed to associate genetic variants with drugs, diseases, and adverse reactions.

J Biomed Inform. 2012 Oct;45(5):842-50. doi: 10.1016/j.jbi.2012.04.006. Epub 2012 Apr 30.

Text mining for the biocuration workflow.

Database (Oxford). 2012 Apr 18;2012:bas020. doi: 10.1093/database/bas020. Print 2012.

The value of data.

Nat Genet. 2011 Mar 29;43(4):281-3. doi: 10.1038/ng0411-281.
