Meinicke Peter, Brodag Thomas, Fricke Wolfgang Florian, Waack Stephan
Abteilung Bioinformatik, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany.
Algorithms Mol Biol. 2006 Jun 29;1(1):10. doi: 10.1186/1748-7188-1-10.
Two important and still unsolved problems in bacterial genome research are the identification of horizontally transferred genes and the prediction of gene expression levels. Both problems can be addressed by multivariate analysis of codon usage data. In particular, dimensionality reduction methods for visualizing multivariate data have been shown to be effective tools for codon usage analysis. Here we propose a multidimensional scaling approach that uses a novel similarity measure for codon usage tables. This probabilistic similarity measure is based on P-values derived from the well-known chi-square test for comparing two distributions. Experimental results on four microbial genomes indicate that the new method is well suited to the analysis of horizontal gene transfer and translational selection. Compared with the widely used correspondence analysis, our method did not suffer from outlier sensitivity and, in most cases, showed better clustering of putative alien genes.
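The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea under assumptions that are ours, not the paper's. Each gene is represented by a vector of raw codon counts. Two genes are compared with a single chi-square test of homogeneity on their 2 x k contingency table. The resulting P-value p is turned into a dissimilarity 1 - p, and all genes are embedded with classical (Torgerson) multidimensional scaling. The published method may differ, for example by testing synonymous codons separately per amino acid, or by using another P-value transform or MDS variant. The function names (`chi2_sf`, `homogeneity_pvalue`, `classical_mds`, `codon_usage_map`) and the toy counts are hypothetical.

```python
import math

import numpy as np


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with df degrees of freedom.

    Computed as the regularized upper incomplete gamma Q(df/2, x/2): a power series
    when x/2 < df/2 + 1, otherwise a Lentz continued fraction.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    eps, tiny = 1e-15, 1e-300
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); the tail is 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= z / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for Q(a, z) (modified Lentz method).
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(log_prefactor)


def homogeneity_pvalue(counts_a, counts_b) -> float:
    """P-value of a chi-square homogeneity test: do two codon count vectors share one distribution?"""
    table = np.array([counts_a, counts_b], dtype=float)
    table = table[:, table.sum(axis=0) > 0]  # drop codons unseen in both genes
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    df = table.shape[1] - 1
    if df < 1 or (row == 0).any():
        return 1.0  # nothing to test
    expected = row * col / table.sum()
    stat = float(((table - expected) ** 2 / expected).sum())
    return chi2_sf(stat, df)


def classical_mds(dissim: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate dissim."""
    n = dissim.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dissim ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]
    # Negative eigenvalues (non-Euclidean parts of the dissimilarity) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def codon_usage_map(tables, dims: int = 2) -> np.ndarray:
    """Embed genes by MDS on the dissimilarity 1 - p (our assumed P-value transform)."""
    n = len(tables)
    dissim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dissim[i, j] = dissim[j, i] = 1.0 - homogeneity_pvalue(tables[i], tables[j])
    return classical_mds(dissim, dims)


# Toy counts over four codons (real tables would cover all sense codons).
genes = [
    [30, 10, 25, 15],
    [30, 10, 25, 15],  # identical usage to gene 0
    [28, 12, 24, 16],  # slightly different usage
    [5, 40, 5, 30],    # strongly deviating usage, e.g. a putative alien gene
]
coords = codon_usage_map(genes)
print(coords.shape)  # (4, 2)
```

Identical usage tables give p = 1 and therefore zero dissimilarity, so genes with indistinguishable codon usage collapse onto one point. A gene whose usage deviates strongly lands far from the rest. The value 1 - p is bounded but saturates near 1 and is not guaranteed to be a metric, which is why the sketch clips negative eigenvalues. The hand-written `chi2_sf` only avoids a SciPy dependency; `scipy.stats.chi2.sf(stat, df)` computes the same tail probability.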