A genome-wide survey of short coding sequences in streptococci.

Author information

Ibrahim Mariam, Nicolas Pierre, Bessières Philippe, Bolotin Alexander, Monnet Véronique, Gardan Rozenn

Affiliations

Unité de Biochimie Bactérienne, UR477, INRA, 78350 Jouy-en-Josas, France.

Unité Mathématique Informatique et Génome, UR1077, INRA, 78350 Jouy-en-Josas, France.

Publication information

Microbiology (Reading). 2007 Nov;153(Pt 11):3631-3644. doi: 10.1099/mic.0.2007/006205-0.

Abstract

Identification of short genes that encode peptides of fewer than 60 aa is challenging, both experimentally and in silico. As a consequence, the universe of these short coding sequences (CDSs) remains largely unknown, although some are acknowledged to play important roles in cell-cell communication, particularly in Gram-positive bacteria. This paper reports a thorough search for short CDSs across streptococcal genomes. Our bioinformatic approach relied on a combination of advanced intrinsic and extrinsic methods. In the first step, intrinsic sequence information (nucleotide composition and presence of RBSs) served to identify new short putative CDSs (spCDSs) and to eliminate the differences between annotation policies. In the second step, pseudogene fragments and false predictions were filtered out. The last step consisted of screening the remaining spCDSs for lines of extrinsic evidence involving sequence and gene-context comparisons. A total of 789 spCDSs across 20 complete genomes (19 Streptococcus and one Enterococcus) received the support of at least one line of extrinsic evidence, which corresponds to an average of 20 short CDSs per million base pairs. Most of these had no known function, and a significant fraction (31%) are not even annotated as hypothetical genes in GenBank records. As an illustration of the value of this list, we describe a new family of CDSs, encoding very short hydrophobic peptides (20-23 aa) situated just upstream of some of the positive transcriptional regulators of the Rgg family. The expression of seven other short CDSs from Streptococcus thermophilus CNRZ1066 that encode peptides ranging in length from 41 to 56 aa was confirmed by real-time quantitative RT-PCR and revealed a variety of expression patterns. Finally, one peptide from this list, encoded by a gene that is not annotated in GenBank, was identified in a cell-envelope-enriched fraction of S. thermophilus CNRZ1066.
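To make the first, intrinsic step more concrete, here is a minimal sketch of how short candidate CDSs might be flagged from sequence alone. It is not the authors' method: the abstract does not describe their composition model or RBS detection, so the function name `find_short_orfs`, the start-codon set, the 10-residue lower bound, the 20-nt upstream window, and the simple Shine-Dalgarno-like regular expression are all illustrative assumptions. It scans the forward strand only and ignores nucleotide composition entirely.

```python
import re

# Assumptions for illustration only (not taken from the paper):
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
# A crude Shine-Dalgarno-like core motif standing in for a real RBS model.
RBS_MOTIF = re.compile(r"AGGAG|GGAGG|AGGA|GGAG|GAGG")


def find_short_orfs(seq, min_aa=10, max_aa=59, upstream_nt=20):
    """Return (start, end, n_aa, has_rbs) tuples for forward-strand ORFs.

    n_aa counts codons from the start codon up to, but excluding, the stop.
    max_aa=59 mirrors the "fewer than 60 aa" cutoff in the abstract;
    min_aa and upstream_nt are hypothetical parameters.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 5):
        if seq[start:start + 3] not in START_CODONS:
            continue
        # Walk downstream codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_aa = (pos - start) // 3
                if min_aa <= n_aa <= max_aa:
                    upstream = seq[max(0, start - upstream_nt):start]
                    has_rbs = bool(RBS_MOTIF.search(upstream))
                    hits.append((start, pos + 3, n_aa, has_rbs))
                break
    return hits


if __name__ == "__main__":
    toy = "TTAGGAGGTTTTTT" + "ATG" + "GCT" * 12 + "TAA" + "CC"
    for hit in find_short_orfs(toy):
        print(hit)
```

A scan like this on its own would over-predict heavily, which is why the abstract describes a second step that filters out pseudogene fragments and false predictions, and a third that requires extrinsic evidence from sequence and gene-context comparisons.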
