National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
University of Chinese Academy of Sciences, Beijing 100049, China.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad174.
Haplotype networks are graphs that represent evolutionary relationships among a set of taxa, and they offer an intuitive way to analyze genealogical relationships among closely related genomes. Here we propose a novel algorithm, McAN, that considers mutation spectrum history (mutations in an ancestral haplotype should be contained in its descendant haplotypes), node size (the sample count for a given node) and sampling time when constructing a haplotype network. We show that McAN is two orders of magnitude faster than state-of-the-art algorithms without losing accuracy, making it suitable for analyzing large numbers of sequences. Based on this algorithm, we developed an online web server and an offline tool for haplotype network construction, community lineage determination, and interactive network visualization. We demonstrate that McAN is well suited to analyzing and visualizing massive genomic data and helps improve the understanding of genome evolution.
Availability: The source code is written in C/C++ and is available at https://github.com/Theory-Lun/McAN and https://ngdc.cncb.ac.cn/biocode/tools/BT007301 under the MIT license. The web server is available at https://ngdc.cncb.ac.cn/bit/hapnet/. The SARS-CoV-2 dataset is available at https://ngdc.cncb.ac.cn/ncov/.
Contact: songshh@big.ac.cn (Song S), zhaowm@big.ac.cn (Zhao W), baoym@big.ac.cn (Bao Y), zhangzhang@big.ac.cn (Zhang Z), ybxue@big.ac.cn (Xue Y).
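The three criteria named in the abstract (mutation containment, node size, sampling time) can be illustrated with a minimal Python sketch. This is not the McAN implementation, which is written in C/C++; the `Haplotype` class, the `choose_parents` function, the example mutations and the tie-breaking order (fewest additional mutations, then larger node, then earlier sampling date) are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Haplotype:
    """Hypothetical record for one network node (not McAN's data model)."""
    name: str
    mutations: FrozenSet[str]  # mutations relative to a reference genome
    size: int                  # number of samples carrying this haplotype
    first_seen: date           # earliest sampling date among those samples


def choose_parents(haplotypes: List[Haplotype]) -> Dict[str, Optional[str]]:
    """Pick one candidate ancestor per haplotype.

    A candidate ancestor must carry a proper subset of the child's
    mutations (the containment rule stated in the abstract). Ties are
    broken by an assumed order: fewest extra mutations, then larger
    node size, then earlier sampling date.
    """
    parents: Dict[str, Optional[str]] = {}
    for child in haplotypes:
        candidates = [h for h in haplotypes if h.mutations < child.mutations]
        if not candidates:
            parents[child.name] = None  # no ancestor found: treated as a root
            continue
        best = min(
            candidates,
            key=lambda h: (
                len(child.mutations - h.mutations),  # mutational distance
                -h.size,                             # prefer larger nodes
                h.first_seen,                        # prefer earlier sampling
            ),
        )
        parents[child.name] = best.name
    return parents


# Toy data, invented for this example.
haps = [
    Haplotype("root", frozenset(), 10, date(2020, 1, 1)),
    Haplotype("A", frozenset({"C241T"}), 5, date(2020, 2, 1)),
    Haplotype("B", frozenset({"C241T", "A23403G"}), 8, date(2020, 3, 1)),
    Haplotype("C", frozenset({"G28881A"}), 2, date(2020, 3, 5)),
    Haplotype("D", frozenset({"C241T", "A23403G", "G28881A"}), 1, date(2020, 4, 1)),
]
parents = choose_parents(haps)
```

On this toy input, `parents` links A and C to the root, B to A, and D to B, because B is the only candidate one mutation away from D. The quadratic scan over all pairs is chosen for readability and says nothing about how McAN achieves its reported speed.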