Conserved codon composition of ribosomal protein coding genes in Escherichia coli, Mycobacterium tuberculosis and Saccharomyces cerevisiae: lessons from supervised machine learning in functional genomics.

Authors

Lin Kui, Kuang Yuyu, Joseph Jeremiah S, Kolatkar Prasanna R

Affiliation

IMCB-BIC, Institute of Molecular and Cell Biology, 30 Medical Drive, 117609 Singapore.

Publication

Nucleic Acids Res. 2002 Jun 1;30(11):2599-607. doi: 10.1093/nar/30.11.2599.

Abstract

Genomics projects have resulted in a flood of sequence data. Functional annotation currently relies almost exclusively on inter-species sequence comparison, and it is limited when few data are available from related species or when sequences are widely divergent and have no known homologs. Here, we demonstrate that codon composition, a fusion of codon usage bias and amino acid composition signals, can accurately discriminate cytoplasmic ribosomal protein genes from all other genes of known function in Saccharomyces cerevisiae, Escherichia coli and Mycobacterium tuberculosis in the absence of sequence homology information, using an implementation of support vector machines, SVM(light). Analysis of these codon composition signals is instructive in determining the features that confer individuality on ribosomal protein genes. Each of the sets of positively charged, negatively charged and small hydrophobic residues, as well as codon bias, contributes to their distinctive codon composition profile. The SVMs sensitively detect, combine and augment the representation of all these signals to perform an accurate classification. Of special note is an obvious outlier, the yeast gene RPL22B, which is highly homologous to RPL22A but employs very different codon usage, perhaps indicating a non-ribosomal function. Finally, we propose that codon composition be used in combination with other attributes in gene/protein classification by supervised machine learning algorithms.
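As an illustration of the kind of feature the abstract describes, the sketch below turns a coding sequence into a 64-dimensional vector of relative codon frequencies. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not give the exact feature definition, so the function name `codon_composition`, the codon ordering, the inclusion of stop codons, and the skipping of ambiguous codons are all choices made here for illustration.

```python
from itertools import product

# All 64 codons in a fixed order. The ordering is an assumption; any
# consistent ordering works as long as every gene uses the same one.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_composition(cds):
    """Return 64 relative codon frequencies for an in-frame coding sequence.

    Hypothetical helper: the frequencies reflect both codon usage bias and
    amino acid composition, because they are normalised over the whole gene
    rather than within each amino acid's synonymous codons.
    """
    seq = cds.upper().replace("U", "T")
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases such as N
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Toy sequence (not a real gene): ATG AAA AAA GCT TAA
vec = codon_composition("ATGAAAAAAGCTTAA")
```

A vector like `vec`, computed for every gene and paired with a ribosomal/non-ribosomal label, is the sort of input a support vector machine could be trained on; the kernel and parameters used with SVM(light) are not stated in the abstract and are left out here.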
