ACMGA: a reference-free multiple-genome alignment pipeline for plant species.

Author information

College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, 266071, China.

National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong, 261325, China.

Publication information

BMC Genomics. 2024 May 25;25(1):515. doi: 10.1186/s12864-024-10430-y.

Abstract

BACKGROUND

Short-read whole-genome sequencing (WGS) has been widely applied to investigate genomic variation in natural populations of many plant species. With rapid advances in long-read sequencing and genome assembly technologies, high-quality genome sequences are now available for groups of varieties in many plant species. These genome sequences are expected to help researchers comprehensively investigate genomic variants of any type that are missed by short-read WGS. However, multiple genome alignment (MGA) tools designed by the human genome research community might be unsuitable for plant genomes.

RESULTS

To fill this gap, we developed the AnchorWave-Cactus Multiple Genome Alignment (ACMGA) pipeline, which improved the alignment of repeat elements and could identify long (> 50 bp) deletions or insertions (INDELs). We conducted MGA using ACMGA and Cactus for 8 Arabidopsis (Arabidopsis thaliana) and 26 maize (Zea mays) de novo assembled genome sequences and compared the results with previously published short-read variant calls. MGA identified more single nucleotide variants (SNVs) and long INDELs than the previously published WGS variant calls. Additionally, ACMGA detected significantly more SNVs and long INDELs than Cactus, both in repetitive regions and across the whole genome. Compared with the results of Cactus, the results of ACMGA were more similar to the previously published variants called from short reads. Both MGA pipelines identified numerous multi-allelic variants that were missed by the WGS variant calling pipeline.

CONCLUSIONS

Aligning de novo assembled genome sequences could identify more SNVs and INDELs than mapping short reads. ACMGA combines the advantages of AnchorWave and Cactus and offers a practical solution for plant MGA by integrating global alignment, a 2-piece affine gap cost strategy, and the progressive MGA algorithm.
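As a rough illustration of the 2-piece affine gap cost idea mentioned above, the sketch below scores a gap as the cheaper of two affine functions: one with a low opening cost and a high extension cost (favouring short gaps), and one with a high opening cost and a low extension cost (favouring long gaps such as long INDELs). The function name and all four parameter values are assumptions chosen for illustration; they are not taken from the abstract and are not claimed to be the settings ACMGA or AnchorWave uses.

```python
def two_piece_affine_gap_cost(length, open1=4, extend1=2, open2=24, extend2=1):
    """Return the penalty for a gap of `length` bases under a 2-piece affine model.

    The cost is the minimum of two affine functions of the gap length.
    All default parameter values are hypothetical, for illustration only.
    """
    if length <= 0:
        return 0  # no gap, no penalty
    short_piece = open1 + extend1 * length  # cheaper for short gaps
    long_piece = open2 + extend2 * length   # cheaper for long gaps
    return min(short_piece, long_piece)


if __name__ == "__main__":
    # With these assumed values the two pieces cross at length 20;
    # beyond that, each extra base costs only `extend2`.
    for gap_length in (1, 20, 50):
        print(gap_length, two_piece_affine_gap_cost(gap_length))
```

Under a single affine cost with the first pair of values, a 50 bp gap would cost 104; under the two-piece model with these assumed values it costs 74, which is why this kind of scoring makes long gaps easier to open during alignment.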

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3b1/11127342/9baebc9375ab/12864_2024_10430_Fig1_HTML.jpg