Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Road, Baltimore, MD 21228, USA.
DNA Res. 2010 Jun;17(3):185-96. doi: 10.1093/dnares/dsq012. Epub 2010 May 7.
The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed and based on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and of biases in the small data sets used for validation in E. coli. Furthermore, we show how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set and that this improvement is significantly larger when analysing genomes with high mutational bias.
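The abstract does not give the RCA formula, so the following is only a minimal sketch of the principle it describes: a reference-set-based index that accounts for base composition. It assumes that each codon's weight is the ratio of its observed frequency in the reference set to the frequency expected from the positional base frequencies of that same set, and that a gene's score is the geometric mean of the weights of its codons. The function names (`codons`, `rca_weights`, `rca`) and the `pseudocount` parameter are hypothetical choices made for this illustration, not taken from the paper.

```python
from collections import Counter
from itertools import product
from math import exp, log

BASES = "ACGT"


def codons(seq):
    """Split a coding sequence into complete codons (trailing bases are dropped)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def rca_weights(reference_seqs, pseudocount=1.0):
    """Per-codon weights: observed codon frequency / frequency expected from
    positional base composition, both estimated on the reference set.

    The pseudocount is an assumption of this sketch; it keeps weights non-zero
    for codons that never occur in a small reference set.
    """
    codon_counts = Counter()
    pos_counts = [Counter(), Counter(), Counter()]
    for seq in reference_seqs:
        for codon in codons(seq):
            if any(base not in BASES for base in codon):
                continue  # skip codons containing ambiguous bases
            codon_counts[codon] += 1
            for pos, base in enumerate(codon):
                pos_counts[pos][base] += 1

    total = sum(codon_counts.values())
    weights = {}
    for triplet in product(BASES, repeat=3):
        codon = "".join(triplet)
        observed = (codon_counts[codon] + pseudocount) / (total + 64 * pseudocount)
        expected = 1.0
        for pos, base in enumerate(codon):
            expected *= (pos_counts[pos][base] + pseudocount) / (total + 4 * pseudocount)
        weights[codon] = observed / expected
    return weights


def rca(seq, weights):
    """Geometric mean of the codon weights over one gene."""
    scored = [c for c in codons(seq) if c in weights]
    if not scored:
        raise ValueError("sequence contains no scorable codons")
    return exp(sum(log(weights[c]) for c in scored) / len(scored))


# Toy reference set in which GCT is used far more often than GCA.
reference = ["GCTAAA" * 10 + "GCAAAT"]
w = rca_weights(reference)
print(rca("GCTGCTGCT", w) > rca("GCAGCAGCA", w))
```

In this toy reference set, T and A are equally common at the third codon position, so the expected frequencies of GCT and GCA are identical and the difference in their weights reflects codon preference alone. This is the sense in which a ratio of this kind can separate codon adaptation from background base composition.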