Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Nat Commun. 2021 May 24;12(1):3032. doi: 10.1038/s41467-021-23289-4.
Cellular genetic heterogeneity is common in many biological conditions, including cancer, microbiomes, and co-infection with multiple pathogens. Detecting and phasing minor variants play an instrumental role in deciphering cellular genetic heterogeneity, but both remain difficult because of technological limitations. Recently, long-read sequencing technologies, including those from Pacific Biosciences and Oxford Nanopore, have provided an opportunity to tackle these challenges. However, their high error rates make it difficult to take full advantage of these technologies. To fill this gap, we introduce iGDA, an open-source tool that can accurately detect and phase minor single-nucleotide variants (SNVs) with frequencies as low as 0.2% from raw long-read sequencing data. We also demonstrate that iGDA can accurately reconstruct haplotypes of closely related strains of the same species (divergence ≥0.011%) from long-read metagenomic data.