RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA.

Publication information

Nucleic Acids Res. 2021 Jan 8;49(D1):D1020-D1028. doi: 10.1093/nar/gkaa1105.

DOI: 10.1093/nar/gkaa1105
PMID: 33270901
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7779008/

Abstract

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.
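
The abstract notes that the PFMs are available to any user through the Protein Family Models Entrez database. As a minimal sketch of what programmatic access might look like, the helper below builds a search URL for the NCBI E-utilities ESearch endpoint. The database identifier `protfam`, the function name `build_pfm_search_url`, and the example search term are assumptions made for illustration and are not taken from the abstract; the function only constructs the URL and performs no network request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pfm_search_url(term: str, retmax: int = 20, db: str = "protfam") -> str:
    """Build an ESearch URL for a free-text query.

    Assumption: "protfam" is the Entrez identifier of the Protein Family
    Models database; pass a different `db` if that turns out to be wrong.
    """
    params = {
        "db": db,           # Entrez database to search (assumed name)
        "term": term,       # free-text query, e.g. a protein name
        "retmode": "json",  # ask for a JSON response
        "retmax": retmax,   # maximum number of record IDs to return
    }
    # urlencode percent-encodes values and turns spaces into "+".
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

Calling `build_pfm_search_url("DNA gyrase", retmax=5)` yields a URL that can be opened in a browser or fetched with any HTTP client. NCBI's usage guidelines (request rate limits, optional API key) apply to any real request.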
