Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA.
BMC Genomics. 2012 Jun 15;13:245. doi: 10.1186/1471-2164-13-245.
Understanding the history of life requires that we understand the transfer of genetic material across phylogenetic boundaries. Detecting genes that were acquired by means other than vertical descent is a basic step in that process. Detection by discordant phylogenies is computationally expensive and not always definitive. Many studies have therefore used easily computed compositional features as an alternative. However, different compositional methods produce different predictions, and the effectiveness of any one method is not well established.
The ability of octamer frequency comparisons to detect genes artificially seeded in cyanobacterial genomes was markedly increased by using as a training set those genes that are highly conserved over all bacteria. Using a subset of octamer frequencies in such tests also increased effectiveness, but this depended on the specific target genome and the source of the contaminating genes. The presence of high-frequency octamers and the GC content of the contaminating genes were important considerations. A method combining the best practices from these tests, the Core Gene Similarity (CGS) method, was devised; it performed better than simple octamer frequency analysis, codon bias, or GC contrasts in detecting seeded genes or naturally occurring transposons. A comparison of predictions with phylogenetic trees suggests that the method is effective only for horizontal transfer events that occurred recently in evolutionary time.
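The general idea of an octamer frequency comparison against a training set of conserved genes can be illustrated with a minimal sketch. This is not the published CGS procedure, whose details are not given in this abstract. The function names `kmer_counts` and `similarity_score`, the pseudocount smoothing, and the mean log-frequency score are all assumptions made for illustration.

```python
from collections import Counter
from math import log

K = 8  # octamer length, as named in the abstract
BASES = set("ACGT")


def kmer_counts(seqs, k=K):
    """Count overlapping k-mers across training sequences (hypothetical helper)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= BASES:  # skip windows with ambiguous bases
                counts[kmer] += 1
    return counts


def similarity_score(gene, train_counts, k=K, pseudocount=1.0):
    """Mean log-frequency of a gene's k-mers under the training set.

    Assumed scoring rule: lower scores mean the gene's octamers are rarer
    in the training genes, which would flag it as a candidate foreign gene.
    Returns None if the gene has no usable k-mer.
    """
    total = sum(train_counts.values())
    denom = total + pseudocount * 4 ** k  # additive smoothing over all k-mers
    gene = gene.upper()
    logs = []
    for i in range(len(gene) - k + 1):
        kmer = gene[i:i + k]
        if set(kmer) <= BASES:
            logs.append(log((train_counts.get(kmer, 0) + pseudocount) / denom))
    if not logs:
        return None
    return sum(logs) / len(logs)
```

In use, one would build `train_counts` from the highly conserved genes of the target genome, score every gene in that genome, and inspect the lowest-scoring genes. Any cutoff separating native from foreign genes would have to be calibrated, for example with artificially seeded genes as described above.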
The CGS method may be an improvement over existing surrogate methods to detect genes of foreign origin.