SR4GN: a species recognition software tool for gene normalization.

Authors

Wei Chih-Hsuan, Kao Hung-Yu, Lu Zhiyong

Affiliation

National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, United States of America.

Publication information

PLoS One. 2012;7(6):e38460. doi: 10.1371/journal.pone.0038460. Epub 2012 Jun 5.

DOI:10.1371/journal.pone.0038460

PMID:22679507

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3367953/

Abstract

As suggested in recent studies, species recognition and disambiguation is one of the most critical and challenging steps in many downstream text-mining applications such as the gene normalization task and protein-protein interaction extraction. We report SR4GN: an open source tool for species recognition and disambiguation in biomedical text. In addition to the species detection function in existing tools, SR4GN is optimized for the Gene Normalization task. As such it is developed to link detected species with corresponding gene mentions in a document. SR4GN achieves 85.42% in accuracy and compares favorably to the other state-of-the-art techniques in benchmark experiments. Finally, SR4GN is implemented as a standalone software tool, thus making it convenient and robust for use in many text-mining applications. SR4GN can be downloaded at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/downloads/SR4GN.
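The abstract states that SR4GN links detected species to the corresponding gene mentions but does not describe how. The following is a minimal sketch of one simple way such linking could work, not SR4GN's actual method. The `Mention` type, the `assign_species` and `accuracy` functions, and the heuristic itself (nearest species in the same sentence, otherwise the most frequent species in the document) are all assumptions introduced for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical container for a detected gene or species mention."""
    text: str
    sentence: int  # index of the sentence containing the mention
    offset: int    # character offset within the document


def assign_species(genes, species):
    """Map each gene mention to a species name (None if no species was detected).

    Illustrative heuristic only: prefer the nearest species mention in the
    same sentence, otherwise fall back to the most frequent species overall.
    """
    counts = Counter(s.text for s in species)
    fallback = counts.most_common(1)[0][0] if counts else None

    assignments = {}
    for gene in genes:
        same_sentence = [s for s in species if s.sentence == gene.sentence]
        if same_sentence:
            nearest = min(same_sentence, key=lambda s: abs(s.offset - gene.offset))
            assignments[gene] = nearest.text
        else:
            assignments[gene] = fallback
    return assignments


def accuracy(predicted, gold):
    """Fraction of gold-standard gene mentions whose predicted species matches."""
    if not gold:
        return 0.0
    correct = sum(1 for gene, sp in gold.items() if predicted.get(gene) == sp)
    return correct / len(gold)
```

Accuracy here is simply the share of gene mentions assigned the correct species, which is the kind of figure the abstract reports (85.42%). The evaluation data and protocol behind that number are described in the paper itself, not in this sketch.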

Figure 1 (pone.0038460.g001): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81ca/3367953/0cb3a6549a85/pone.0038460.g001.jpg
