Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax, Nova Scotia B3H 1W5, Canada.
BMC Evol Biol. 2010 Nov 20;10:360. doi: 10.1186/1471-2148-10-360.
Microbial genomes exhibit complex sets of genetic affinities due to lateral genetic transfer. Assessing the relative contributions of parent-to-offspring inheritance and gene sharing is a vital step in understanding the evolutionary origins and modern-day function of an organism, but recovering and showing these relationships is a challenging problem.
We have developed a new approach that uses linear programming to find between-genome relationships, by treating tables of genetic affinities (here, represented by transformed BLAST e-values) as an optimization problem. Validation trials on simulated data demonstrate the effectiveness of the approach in recovering and representing vertical and lateral relationships among genomes. Application of the technique to a set comprising Aquifex aeolicus and 75 other thermophiles showed an important role for large genomes as 'hubs' in the gene sharing network, and suggested that genes are preferentially shared between organisms with similar optimal growth temperatures. We were also able to discover distinct and common genetic contributors to each sequenced representative of genus Pseudomonas.
The linear programming approach we have developed can serve as an effective inference tool in its own right, and can be an efficient first step in a more intensive phylogenomic analysis.
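The abstract does not spell out the linear program itself, so the following is only a minimal sketch of how a table of genetic affinities might be posed as an optimization problem; it should not be read as the authors' formulation. Everything in it is an assumption made for illustration: the toy integer scores stand in for transformed BLAST e-values, the genome labels and the names `simplex_max` and `donor_weights` are hypothetical, and the packing-style objective (explain as much of the target genome's affinity as possible without exceeding a per-gene cap) is one plausible choice among many. A small exact simplex routine is included only to keep the example free of dependencies; a real analysis would more likely call an established LP solver.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming every b[i] >= 0.

    Dense tableau simplex with Bland's rule and exact rational arithmetic.
    Suitable for toy inputs only. Returns (optimal value, list of x values).
    """
    m, n = len(A), len(c)
    # Tableau rows are [A | I | b]; the slack variables form the starting basis.
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(int(i == k)) for k in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    # Objective row: reduced costs (-c), then zeros; last entry is the value.
    z = [-Fraction(v) for v in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]
    while True:
        # Bland's rule: lowest-index column with a negative reduced cost enters.
        enter = next((j for j in range(n + m) if z[j] < 0), None)
        if enter is None:
            break
        # Ratio test picks the leaving row; ties go to the lowest basis index.
        rows = [i for i in range(m) if T[i][enter] > 0]
        if not rows:
            raise ValueError("LP is unbounded")
        leave = min(rows, key=lambda i: (T[i][-1] / T[i][enter], basis[i]))
        pivot = T[leave][enter]
        T[leave] = [v / pivot for v in T[leave]]
        for i in range(m):
            if i != leave and T[i][enter] != 0:
                f = T[i][enter]
                T[i] = [a - f * p for a, p in zip(T[i], T[leave])]
        f = z[enter]
        z = [a - f * p for a, p in zip(z, T[leave])]
        basis[leave] = enter
    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = T[i][-1]
    return z[-1], x


def donor_weights(affinity, cap):
    """Assumed formulation: find non-negative weights for candidate genomes
    so that, for every gene of the target genome, the weighted affinities
    stay at or below cap[gene], while the total explained affinity is maximal.
    """
    donors = sorted(affinity)
    # One constraint row per target gene, one column per candidate genome.
    A = [[affinity[d][g] for d in donors] for g in range(len(cap))]
    # Objective coefficient of a genome = its summed affinity over all genes.
    c = [sum(affinity[d]) for d in donors]
    explained, w = simplex_max(c, A, cap)
    return explained, dict(zip(donors, w))


# Invented scores for four genes of a hypothetical target genome.
toy = {
    "genome_A": [9, 9, 1, 1],  # strong affinity to genes 1 and 2
    "genome_B": [1, 1, 9, 9],  # strong affinity to genes 3 and 4
    "genome_C": [3, 3, 3, 2],  # weak affinity throughout
}
explained, weights = donor_weights(toy, cap=[10, 10, 10, 10])
print(float(explained), {d: float(w) for d, w in weights.items()})
```

On this toy input the solver assigns weight 1 to `genome_A` and `genome_B` and weight 0 to `genome_C`, so the target is described as a mixture of two contributors rather than as a relative of the uniformly similar genome. Repeating such a solve for every genome in a collection would give weighted edges that could be drawn as a network, which is the kind of representation the abstract describes.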