Detecting and phasing minor single-nucleotide variants from long-read sequencing data.

Affiliations

Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication information

Nat Commun. 2021 May 24;12(1):3032. doi: 10.1038/s41467-021-23289-4.

DOI:10.1038/s41467-021-23289-4

PMID:34031367

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8144375/

Abstract

Cellular genetic heterogeneity is common in many biological conditions including cancer, microbiome, and co-infection of multiple pathogens. Detecting and phasing minor variants play an instrumental role in deciphering cellular genetic heterogeneity, but they are still difficult tasks because of technological limitations. Recently, long-read sequencing technologies, including those by Pacific Biosciences and Oxford Nanopore, provide an opportunity to tackle these challenges. However, high error rates make it difficult to take full advantage of these technologies. To fill this gap, we introduce iGDA, an open-source tool that can accurately detect and phase minor single-nucleotide variants (SNVs), whose frequencies are as low as 0.2%, from raw long-read sequencing data. We also demonstrate that iGDA can accurately reconstruct haplotypes in closely related strains of the same species (divergence ≥0.011%) from long-read metagenomic data.
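
The abstract's point that high error rates obscure minor variants can be illustrated with simple arithmetic at a single locus. The sketch below is not iGDA's method; it only compares the number of reads expected to carry a true variant at the 0.2% frequency quoted in the abstract with the number of reads expected to show the same base by sequencing error. The read depth (5000) and the per-base error rate toward one specific alternative base (1%) are illustrative assumptions, not values taken from the paper, and `expected_counts` is a hypothetical helper name.

```python
from math import sqrt


def expected_counts(depth, variant_freq, error_rate):
    """Return (expected true-variant reads, expected error reads,
    standard deviation of error reads) at a single locus."""
    # Reads that genuinely carry the minor variant.
    true_reads = depth * variant_freq
    # Reads miscalled as the same alternative base, modelled as binomial.
    error_reads = depth * error_rate
    error_sd = sqrt(depth * error_rate * (1 - error_rate))
    return true_reads, error_reads, error_sd


# 0.002 is the minimum variant frequency quoted in the abstract.
# depth and error_rate are assumed values chosen only for illustration.
true_reads, error_reads, error_sd = expected_counts(
    depth=5000, variant_freq=0.002, error_rate=0.01
)
print(f"expected true-variant reads: {true_reads:.0f}")
print(f"expected error reads: {error_reads:.0f} (sd {error_sd:.1f})")
```

Under these assumptions the true signal is smaller than the error background and only slightly larger than its random fluctuation, which is why a per-locus frequency threshold alone cannot separate such variants from noise.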

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef4a/8144375/19b0de7e0d8f/41467_2021_23289_Fig1_HTML.jpg
