Chan Cheong Xin, Beiko Robert G, Ragan Mark A
Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia.
Faculty of Computer Science, Dalhousie University, Halifax, NS, B3H 4R2, Canada.
Methods Mol Biol. 2017;1525:421-432. doi: 10.1007/978-1-4939-6622-6_16.
Lateral genetic transfer (LGT) is the process by which genetic material moves between organisms (and viruses) in the biosphere. Among the many approaches developed for the inference of LGT events from DNA sequence data, methods based on the comparison of phylogenetic trees remain the gold standard for many types of problem. Identifying LGT events from sequenced genomes typically involves a series of steps in which homologous sequences are identified and aligned, phylogenetic trees are inferred, and their topologies are compared to identify unexpected or conflicting relationships. These types of approach have been used to elucidate the nature and extent of LGT and its physiological and ecological consequences throughout the Tree of Life. Advances in DNA sequencing technology have led to enormous increases in the number of sequenced genomes, including ultra-deep sampling of specific taxonomic groups and single cell-based sequencing of unculturable "microbial dark matter." Environmental shotgun sequencing enables the study of LGT among organisms that share the same habitat. This abundance of genomic data offers new opportunities for scientific discovery, but poses two key problems. As ever more genomes are generated, the assembly and annotation of each individual genome receives less scrutiny; and with so many genomes available it is tempting to include them all in a single analysis, but thousands of genomes and millions of genes can overwhelm key algorithms in the analysis pipeline. Identifying LGT events of interest therefore depends on choosing the right dataset, and on algorithms that appropriately balance speed and accuracy given the size and composition of the chosen set of genomes.