Named entity recognition for bacterial Type IV secretion systems.

Affiliation

School of Computer Science, University of Manchester, Manchester, United Kingdom.

Publication information

PLoS One. 2011 Mar 29;6(3):e14780. doi: 10.1371/journal.pone.0014780.

DOI:10.1371/journal.pone.0014780
PMID:21468321
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3066171/
Abstract

Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes, cellular components, molecular functions, and microorganism species should improve the precision and recall of literature searches allowing researchers to keep up with the exponentially growing literature, through resources such as the Pathosystems Resource Integration Center (PATRIC, patricbrc.org). We developed named entity recognition (NER) tools for four entities related to Type IV secretion systems: 1) bacteria names, 2) biological processes, 3) molecular functions, and 4) cellular components. These four entities are important to pathogenesis and virulence research but have received less attention than other entities, e.g., genes and proteins. Based on an annotated corpus, large domain terminological resources, and machine learning techniques, we developed recognizers for these entities. High accuracy rates (>80%) are achieved for bacteria, biological processes, and molecular function. Contrastive experiments highlighted the effectiveness of alternate recognition strategies; results of term extraction on contrasting document sets demonstrated the utility of these classes for identifying T4SS-related documents.
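As a rough illustration of the dictionary-driven side of such a recognizer, the sketch below tags entity mentions by matching a small term list and scores the result with exact-match precision and recall. It is not the authors' system: the abstract describes recognizers that combine an annotated corpus, large terminological resources, and machine learning, whereas this is plain lexicon lookup. The lexicon entries, the label names, and the functions `build_pattern`, `tag_entities`, and `precision_recall` are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical mini-lexicon (assumption); real terminological resources are far larger.
LEXICON = {
    "bacteria": ["Legionella pneumophila", "Agrobacterium tumefaciens", "Helicobacter pylori"],
    "biological_process": ["conjugation", "protein secretion"],
    "cellular_component": ["inner membrane", "pilus"],
}


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any term as a whole word."""
    # Longest terms first so multi-word names win over shorter overlapping ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERNS = {label: build_pattern(terms) for label, terms in LEXICON.items()}


def tag_entities(text):
    """Return (start, end, label, surface) tuples for every lexicon hit, sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), label, match.group(0)))
    return sorted(hits)


def precision_recall(predicted, gold):
    """Exact-match precision and recall over sets of (start, end, label) spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = "Legionella pneumophila uses the Dot/Icm system for protein secretion."
    for hit in tag_entities(sentence):
        print(hit)
```

Pure lookup of this kind misses name variants and abbreviations that are absent from the lexicon, which is one reason a trained model over an annotated corpus is typically layered on top of it.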


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a1c/3066171/5e26ad02e902/pone.0014780.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a1c/3066171/076bb4f4ea8f/pone.0014780.g001.jpg
