tmVar 3.0: an improved variant concept recognition and normalization tool.

Author information

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.

Publication information

Bioinformatics. 2022 Sep 15;38(18):4449-4451. doi: 10.1093/bioinformatics/btac537.

Abstract

MOTIVATION

Previous studies have shown that automated text-mining tools are becoming increasingly important for successfully unlocking variant information in scientific literature at large scale. Despite multiple attempts in the past, existing tools are still of limited recognition scope and precision.

RESULTS

We propose tmVar 3.0: an improved variant recognition and normalization system. Compared to its predecessors, tmVar 3.0 recognizes a wider spectrum of variant-related entities (e.g. allele and copy number variants), and groups together different variant mentions belonging to the same genomic sequence position in an article for improved accuracy. Moreover, tmVar 3.0 provides advanced variant normalization options such as allele-specific identifiers from the ClinGen Allele Registry. tmVar 3.0 exhibits state-of-the-art performance with over 90% in F-measure for variant recognition and normalization, when evaluated on three independent benchmarking datasets. tmVar 3.0 as well as annotations for the entire PubMed and PMC datasets are freely available for download.
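As a rough illustration of what grouping variant mentions by sequence position can look like, the sketch below clusters mentions that share a (gene, position) key. It is a toy example and not tmVar 3.0's implementation: the `Mention` class, the `group_by_position` function, the sample mentions, and the use of a bare position number as the grouping key are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A hypothetical, already-parsed variant mention (not a tmVar data type)."""
    text: str      # surface form as it appears in the article
    gene: str      # gene symbol the mention was linked to
    position: int  # simplified sequence position used as the grouping key


def group_by_position(mentions):
    """Group mention texts that share the same (gene, position) key."""
    groups = defaultdict(list)
    for m in mentions:
        groups[(m.gene, m.position)].append(m.text)
    return dict(groups)


# Illustrative input: two spellings of one BRAF variant and one KRAS variant.
mentions = [
    Mention("V600E", "BRAF", 600),
    Mention("p.Val600Glu", "BRAF", 600),
    Mention("G12D", "KRAS", 12),
]
groups = group_by_position(mentions)
for key, texts in groups.items():
    print(key, texts)
```

In this simplified view, different spellings of one variant end up under a single key, so a downstream step could attach one normalized identifier to the whole group rather than to each mention separately.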

AVAILABILITY AND IMPLEMENTATION

https://github.com/ncbi/tmVar3.
