Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain. Electronic address: https://twitter.com/@SantusLuisa.
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.
Curr Opin Struct Biol. 2023 Jun;80:102577. doi: 10.1016/j.sbi.2023.102577. Epub 2023 Apr 1.
Large-scale genomics requires highly scalable and accurate multiple sequence alignment methods. Results collected over the last decade suggest that accuracy is lost when scaling beyond a few thousand sequences. This issue has been actively addressed with a number of innovative algorithmic solutions that combine low-level hardware optimization with novel higher-level heuristics. This review provides an extensive critical overview of these recent methods. Using established reference datasets, we conclude that although significant progress has been achieved, a unified framework able to consistently and efficiently produce high-accuracy large-scale multiple alignments is still lacking.