Zhou Xin, Zhang Lu, Weng Ziming, Dill David L, Sidow Arend
Department of Computer Science, Stanford University, Stanford, CA, USA.
Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
Nat Commun. 2021 Feb 17;12(1):1077. doi: 10.1038/s41467-021-21395-x.
We introduce Aquila, a new approach to variant discovery in personal genomes, which is critical for uncovering the genetic contributions to health and disease. Aquila uses a reference sequence and linked-read data to generate a high-quality diploid genome assembly, from which it then comprehensively detects and phases personal genetic variation. The contigs of the assemblies from our libraries cover >95% of the human reference genome, with over 98% of that in a diploid state. Thus, the assemblies support detection and accurate genotyping of the most prevalent types of human genetic variation, including single nucleotide polymorphisms (SNPs), small insertions and deletions (small indels), and structural variants (SVs), in all but the most difficult regions. All heterozygous variants are phased in blocks that can approach arm-level length. The final output of Aquila is a diploid and phased personal genome sequence, and a phased Variant Call Format (VCF) file that also contains homozygous and a few unphased heterozygous variants. Aquila represents a cost-effective approach that can be applied to cohorts for variation discovery or association studies, or to single individuals with rare phenotypes that could be caused by SVs or compound heterozygosity.
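As an illustration of what a phased VCF makes possible, the following minimal Python sketch classifies diploid genotype (GT) strings and checks whether two phased heterozygous variants sit on opposite haplotypes, the pattern behind compound heterozygosity. It relies only on the general VCF convention that `|` separates phased alleles and `/` separates unphased ones. The function names are hypothetical and are not part of Aquila, and the sketch assumes both genotypes come from the same phase block, which the caller must verify.

```python
from typing import Optional


def classify_genotype(gt: str) -> str:
    """Classify a diploid VCF GT string such as '0|1', '1/1', or '0/1'."""
    phased = "|" in gt
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return "missing_or_non_diploid"
    if alleles[0] == alleles[1]:
        return "homozygous"
    return "phased_het" if phased else "unphased_het"


def alt_haplotype(gt: str) -> Optional[int]:
    """Return 0 or 1 for the haplotype carrying the non-reference allele.

    Returns None unless the genotype is a phased heterozygote with exactly
    one reference ('0') allele; genotypes like '1|2' are deliberately skipped
    in this simplified sketch.
    """
    if classify_genotype(gt) != "phased_het":
        return None
    left, right = gt.split("|")
    if left == "0":
        return 1
    if right == "0":
        return 0
    return None


def in_trans(gt_a: str, gt_b: str) -> bool:
    """True if two phased hets carry their alternate alleles on opposite haplotypes.

    Assumption: both genotypes belong to the same phase block; phase is not
    comparable across blocks.
    """
    hap_a = alt_haplotype(gt_a)
    hap_b = alt_haplotype(gt_b)
    if hap_a is None or hap_b is None:
        return False
    return hap_a != hap_b
```

For example, `in_trans("0|1", "1|0")` reports the two variants as lying on opposite haplotypes, whereas an unphased genotype such as `0/1` cannot be assessed this way, which is why phasing matters for compound heterozygosity.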