Department of Biomedical Sciences and Veterinary Public Health, Section of Virology, Swedish University of Agricultural Sciences, Uppsala, Sweden.
Virulence. 2013 Jan 1;4(1):97-106. doi: 10.4161/viru.23161.
Because the cost of DNA sequencing has fallen sharply, the number of sequences submitted to public databases has increased dramatically in recent years. Efficient analysis of these data sets may lead to a much better understanding of the nature of pathogens such as bacteria, viruses, and parasites. However, this growth has raised questions about whether currently available algorithms are adequate for studying pathogen evolution and constructing phylogenetic trees. While more advanced algorithms and corresponding programs are being developed, it is crucial to optimize the available ones to meet current needs. The protocol presented in this study is optimized using a number of strategies currently being proposed for handling large-scale DNA sequence data sets, and offers an efficient and accurate method for computing phylogenetic trees with limited computer resources. The protocol may take up to 36 h to construct and annotate a final tree of about 20,000 sequences.