Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.
Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA.
Mol Biol Evol. 2021 May 19;38(6):2639-2659. doi: 10.1093/molbev/msab043.
Horizontal gene transfer (HGT) is central to prokaryotic evolution. However, little is known about the "scale" of individual HGT events. In this work, we introduce the first computational framework to help answer the following fundamental question: How often does more than one gene get horizontally transferred in a single HGT event? Our method, called HoMer, uses phylogenetic reconciliation to infer single-gene HGT events across a given set of species/strains, employs several techniques to account for inference error and uncertainty, combines that information with gene order information from extant genomes, and uses statistical analysis to identify candidate horizontal multigene transfers (HMGTs) in both extant and ancestral species/strains. HoMer is highly scalable and can be easily used to infer HMGTs across hundreds of genomes. We apply HoMer to a genome-scale data set of over 22,000 gene families from 103 Aeromonas genomes and identify a large number of plausible HMGTs of various scales at both small and large phylogenetic distances. Analysis of these HMGTs reveals interesting relationships between gene function, phylogenetic distance, and frequency of multigene transfer. Among other insights, we find that 1) the observed relative frequency of HMGT increases as divergence between genomes increases, 2) HMGTs often have conserved gene functions, and 3) rare genes are frequently acquired through HMGT. We also analyze in detail HMGTs involving the zonula occludens toxin and type III secretion systems. By enabling the systematic inference of HMGTs on a large scale, HoMer will facilitate a more accurate and more complete understanding of HGT and microbial evolution.
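The abstract describes combining single-gene HGT events inferred by reconciliation with gene order from extant genomes to find candidate multigene transfers. The sketch below illustrates only that core idea. It is not HoMer's algorithm. It assumes a hypothetical `Transfer` record (gene family, donor, recipient) already produced by some reconciliation step. It groups transfers by donor/recipient pair and reports runs of at least `min_size` genes that sit next to each other in the recipient genome. The function name, record format, and strict-adjacency rule are all illustrative assumptions. The sketch omits the error and uncertainty handling, the ancestral-genome inference, and the statistical analysis the abstract mentions.

```python
from collections import defaultdict
from typing import NamedTuple


class Transfer(NamedTuple):
    """One inferred single-gene HGT event (hypothetical record format)."""
    family: str      # gene family identifier
    donor: str       # donor species/strain
    recipient: str   # recipient species/strain (extant genome in this sketch)


def candidate_multigene_transfers(transfers, gene_orders, min_size=2):
    """Group single-gene transfers that share a donor and a recipient and
    whose genes are consecutive in the recipient's gene order.

    transfers:   iterable of Transfer
    gene_orders: dict mapping recipient genome -> ordered list of gene families
    Returns a list of (donor, recipient, [families]) tuples, one per block.
    Simplification (assumption): genes must be strictly adjacent; no gaps
    from later gene gain or loss are tolerated.
    """
    # Collect the transferred families for each (donor, recipient) pair.
    by_pair = defaultdict(set)
    for t in transfers:
        by_pair[(t.donor, t.recipient)].add(t.family)

    candidates = []
    for donor, recipient in sorted(by_pair):
        families = by_pair[(donor, recipient)]
        order = gene_orders.get(recipient, [])
        run = []
        # Walk the recipient genome, extending a run while consecutive genes
        # were all transferred from this donor; the None sentinel flushes
        # the final run.
        for family in order + [None]:
            if family in families:
                run.append(family)
            else:
                if len(run) >= min_size:
                    candidates.append((donor, recipient, run))
                run = []
    return candidates


# Usage: famA and famB are adjacent transfers from X into Y; famD is isolated.
transfers = [
    Transfer("famA", "X", "Y"),
    Transfer("famB", "X", "Y"),
    Transfer("famD", "X", "Y"),
    Transfer("famC", "Z", "Y"),
]
gene_orders = {"Y": ["famA", "famB", "famC", "famD", "famE"]}
print(candidate_multigene_transfers(transfers, gene_orders))
# → [('X', 'Y', ['famA', 'famB'])]
```

Strict adjacency keeps the example short. Real genomes rearrange and lose genes after a transfer, so a practical method would have to tolerate gaps and test whether a block is larger than chance would predict.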