The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy.
Affiliation
CNRS, UMR 7144, Adaptation et Diversité en Milieu Marin, 29682 Roscoff, France.
Publication information
Nucleic Acids Res. 2013 Jan;41(Database issue):D597-604. doi: 10.1093/nar/gks1160. Epub 2012 Nov 27.
The interrogation of genetic markers in environmental meta-barcoding studies is currently seriously hindered by the lack of taxonomically curated reference data sets for the targeted genes. The Protist Ribosomal Reference database (PR(2), http://ssu-rrna.org/) provides unique access to eukaryotic small sub-unit (SSU) ribosomal RNA and DNA sequences with curated taxonomy. The database consists mainly of nuclear-encoded protistan sequences. However, metazoans, land plants, macrosporic fungi and eukaryotic organelles (mitochondrion, plastid and others) are also included because they are useful for the analysis of high-throughput sequencing data sets. Introns and putative chimeric sequences have also been carefully checked. The taxonomic assignment of each sequence consists of eight unique taxonomic fields. In total, 136 866 sequences are nuclear encoded, 45 708 (36 501 mitochondrial and 9657 chloroplastic) are from organelles, the remaining being putative chimeric sequences. The website allows users to download sequences from the entire database or from subsets of it (including representative sequences after clustering at a given level of similarity). Several web tools also allow searches by sequence similarity. The presence of both rRNA and rDNA sequences, the handling of introns (crucial for eukaryotic sequences), a normalized ranked taxonomy of eight terms and updates following new GenBank releases were made possible by a long-term collaboration between experts in taxonomy and computer scientists.
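To make the eight-field taxonomic assignment concrete, the following is a minimal sketch of how such a lineage might be parsed and validated. The abstract only states that each sequence carries eight ranked taxonomic fields; the rank names, the `|` separator, the function name `parse_taxonomy` and the example lineage string are all assumptions made for illustration and are not taken from the database itself.

```python
from typing import Dict, List

# Rank names are an assumption for illustration; the abstract only says that
# each sequence is assigned eight ranked taxonomic fields.
RANKS: List[str] = [
    "kingdom", "supergroup", "division", "class",
    "order", "family", "genus", "species",
]


def parse_taxonomy(lineage: str, sep: str = "|") -> Dict[str, str]:
    """Split a separator-delimited lineage into exactly eight ranked fields."""
    fields = [field.strip() for field in lineage.split(sep)]
    if len(fields) != len(RANKS):
        raise ValueError(f"expected {len(RANKS)} fields, got {len(fields)}")
    return dict(zip(RANKS, fields))


# Hypothetical lineage string, written for this example only.
example = (
    "Eukaryota|Archaeplastida|Chlorophyta|Mamiellophyceae|"
    "Mamiellales|Mamiellaceae|Micromonas|Micromonas_pusilla"
)
record = parse_taxonomy(example)
print(record["genus"])
```

Rejecting lineages that do not have exactly eight fields reflects the normalized, fixed-depth taxonomy the abstract describes; a real pipeline would need to follow the field layout of the files actually downloaded.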