DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations.

Authors

Andrews T Daniel, Jeelall Yogesh, Talaulikar Dipti, Goodnow Christopher C, Field Matthew A

Affiliations

Department of Immunology, John Curtin School of Medical Research, Australian National University, Canberra ACT, Australia; National Computational Infrastructure, Canberra ACT, Australia.

Department of Immunology, John Curtin School of Medical Research, Australian National University, Canberra ACT, Australia; School of Medicine and Pharmacology, University of Western Australia, Harry Perkins Institute, Perth, Australia.

Publication information

PeerJ. 2016 May 24;4:e2074. doi: 10.7717/peerj.2074. eCollection 2016.

DOI:10.7717/peerj.2074
PMID:27257550
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4888318/

Abstract

Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA, such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must first be computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires its own additional variant detection analysis steps, and thus represents a significant computational challenge. While sequencing technologies for producing these data are approaching maturity, the lack of available computational tools for analysing such heterogeneous sequence data represents an obstacle to the widespread adoption of this technology.

Results. Using synthetic data we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find DeepSNVMiner obtains significantly lower false positive and false negative rates compared to popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as the variant concentration levels decrease. In a dilution series with genomic DNA from two cell lines, we find DeepSNVMiner identifies a known somatic variant when present at concentrations of only 1 in 1,000 molecules in the input material, the lowest concentration amongst all variant callers tested.

Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence variants that likely arose during DNA amplification. The workflow remains flexible such that it may be customised to variants of the data production protocol used, and supports reproducible analysis through detailed logging and reporting of results. DeepSNVMiner is available for academic non-commercial research purposes at https://github.com/mattmattmattmatt/DeepSNVMiner.
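
The read-group idea described in the abstract can be illustrated with a small Python sketch. This is not DeepSNVMiner's code and makes no claim about how the tool is implemented. It assumes, purely for illustration, that each read starts with a fixed-length tag that matches exactly, that the rest of the read is already aligned to position 0 of a short reference, and that the thresholds min_reads and min_fraction are reasonable; all function and variable names are hypothetical. The point it shows is that a variant is only accepted when nearly every read within one tag group carries it, so an isolated amplification or sequencing error in a single read is discarded.

```python
from collections import Counter, defaultdict


def group_reads_by_tag(reads, tag_len):
    """Split each read into (tag, body) and collect bodies per tag.

    Assumption: the tag is the first tag_len bases and matches exactly.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[read[:tag_len]].append(read[tag_len:])
    return groups


def consensus_variants(bodies, reference, min_reads=3, min_fraction=0.9):
    """Return {position: base} for positions where the group consensus
    differs from the reference. Groups with too few reads are skipped."""
    if len(bodies) < min_reads:
        return {}
    variants = {}
    for pos, ref_base in enumerate(reference):
        bases = Counter(body[pos] for body in bodies if pos < len(body))
        if not bases:
            continue
        base, count = bases.most_common(1)[0]
        if base != ref_base and count / sum(bases.values()) >= min_fraction:
            variants[pos] = base
    return variants


def call_rare_variants(reads, reference, tag_len):
    """Count how many read groups (input molecules) support each variant."""
    groups = group_reads_by_tag(reads, tag_len)
    support = Counter()
    for bodies in groups.values():
        for pos, base in consensus_variants(bodies, reference).items():
            support[(pos, base)] += 1
    return dict(support), len(groups)


# Toy data: three tagged molecules, three reads each, reference ACGTACGT.
example_reads = [
    "AAAAACGTACGT", "AAAAACGTACGT", "AAAAACGTACGT",  # wild-type molecule
    "CCCCACGAACGT", "CCCCACGAACGT", "CCCCACGAACGT",  # true variant T->A at position 3
    "GGGGACGTACGT", "GGGGACGTATGT", "GGGGACGTACGT",  # one read with an isolated error
]

support, n_groups = call_rare_variants(example_reads, "ACGTACGT", tag_len=4)
print(support, n_groups)
```

In this toy input the variant at position 3 is supported by one of three read groups, while the single-read error in the third group never reaches consensus. A real workflow would also need alignment, tolerance for errors within the tags themselves, and base-quality handling, none of which is attempted here.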

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c99f/4888318/f663d9688117/peerj-04-2074-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c99f/4888318/e84cbfab1aef/peerj-04-2074-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c99f/4888318/35f44a82678b/peerj-04-2074-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c99f/4888318/62d6df817fcd/peerj-04-2074-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c99f/4888318/cbdd914d4d75/peerj-04-2074-g005.jpg

Similar articles

1
DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations.
PeerJ. 2016 May 24;4:e2074. doi: 10.7717/peerj.2074. eCollection 2016.
2
Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data.
Sci Rep. 2023 Nov 22;13(1):20444. doi: 10.1038/s41598-023-47135-3.
3
Comparison among three variant callers and assessment of the accuracy of imputation from SNP array data to whole-genome sequence level in chicken.
BMC Genomics. 2015 Oct 21;16:824. doi: 10.1186/s12864-015-2059-2.
4
UNDR ROVER - a fast and accurate variant caller for targeted DNA sequencing.
BMC Bioinformatics. 2016 Apr 16;17:165. doi: 10.1186/s12859-016-1014-9.
5
SiNVICT: ultra-sensitive detection of single nucleotide variants and indels in circulating tumour DNA.
Bioinformatics. 2017 Jan 1;33(1):26-34. doi: 10.1093/bioinformatics/btw536. Epub 2016 Aug 16.
6
Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data.
BMC Bioinformatics. 2019 Dec 27;20(Suppl 23):606. doi: 10.1186/s12859-019-3280-9.
7
Calling known variants and identifying new variants while rapidly aligning sequence data.
J Dairy Sci. 2019 Apr;102(4):3216-3229. doi: 10.3168/jds.2018-15172. Epub 2019 Feb 14.
8
MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics.
PLoS One. 2016 Feb 3;11(2):e0147697. doi: 10.1371/journal.pone.0147697. eCollection 2016.
9
Alignment-free clustering of UMI tagged DNA molecules.
Bioinformatics. 2019 Jun 1;35(11):1829-1836. doi: 10.1093/bioinformatics/bty888.
10
BRCA-analyzer: Automatic workflow for processing NGS reads of BRCA1 and BRCA2 genes.
Comput Biol Chem. 2018 Dec;77:297-306. doi: 10.1016/j.compbiolchem.2018.10.012. Epub 2018 Oct 23.

Cited by

1
Increasing pathogenic germline variant diagnosis rates in precision medicine: current best practices and future opportunities.
Hum Genomics. 2025 Aug 22;19(1):97. doi: 10.1186/s40246-025-00811-z.
2
Identifying genetic errors of immunity due to mosaicism.
J Exp Med. 2025 May 5;222(5). doi: 10.1084/jem.20241045. Epub 2025 Apr 15.
3
Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection.
BMC Genomics. 2024 Sep 3;25(1):827. doi: 10.1186/s12864-024-10737-w.
4
Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data.
Sci Rep. 2023 Nov 22;13(1):20444. doi: 10.1038/s41598-023-47135-3.
5
Detection of minor variants in Mycobacterium tuberculosis whole genome sequencing data.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab541.
6
Detecting Causal Variants in Mendelian Disorders Using Whole-Genome Sequencing.
Methods Mol Biol. 2021;2243:1-25. doi: 10.1007/978-1-0716-1103-6_1.
7
UMI-Gen: A UMI-based read simulator for variant calling evaluation in paired-end sequencing NGS libraries.
Comput Struct Biotechnol J. 2020 Aug 27;18:2270-2280. doi: 10.1016/j.csbj.2020.08.011. eCollection 2020.
8
Detecting pathogenic variants in autoimmune diseases using high-throughput sequencing.
Immunol Cell Biol. 2021 Feb;99(2):146-156. doi: 10.1111/imcb.12372. Epub 2020 Jul 27.
9
Calling Variants in the Clinic: Informed Variant Calling Decisions Based on Biological, Clinical, and Laboratory Variables.
Comput Struct Biotechnol J. 2019 Apr 8;17:561-569. doi: 10.1016/j.csbj.2019.04.002. eCollection 2019.
10
smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers.
Bioinformatics. 2019 Apr 15;35(8):1299-1309. doi: 10.1093/bioinformatics/bty790.

References

1
Reliably Detecting Clinically Important Variants Requires Both Combined Variant Calls and Optimized Filtering Strategies.
PLoS One. 2015 Nov 23;10(11):e0143199. doi: 10.1371/journal.pone.0143199. eCollection 2015.
2
Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.
Nucleic Acids Res. 2015 Mar 31;43(6):e37. doi: 10.1093/nar/gku1341. Epub 2015 Jan 13.
3
High-throughput profiling of point mutations across the HIV-1 genome.
Retrovirology. 2014 Dec 19;11:124. doi: 10.1186/s12977-014-0124-6.
4
SAMBLASTER: fast duplicate marking and structural variant read extraction.
Bioinformatics. 2014 Sep 1;30(17):2503-5. doi: 10.1093/bioinformatics/btu314. Epub 2014 May 7.
5
The promise and challenge of high-throughput sequencing of the antibody repertoire.
Nat Biotechnol. 2014 Feb;32(2):158-68. doi: 10.1038/nbt.2782. Epub 2014 Jan 19.
6
Going with the flow: from circulating tumor cells to DNA.
Sci Transl Med. 2013 Oct 16;5(207):207ps14. doi: 10.1126/scitranslmed.3006305.
7
Characterizing and measuring bias in sequence data.
Genome Biol. 2013 May 29;14(5):R51. doi: 10.1186/gb-2013-14-5-r51.
8
Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation.
Genome Res. 2013 May;23(5):843-54. doi: 10.1101/gr.147686.112. Epub 2013 Feb 4.
9
LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets.
Nucleic Acids Res. 2012 Dec;40(22):11189-201. doi: 10.1093/nar/gks918. Epub 2012 Oct 12.
10
Detection of ultra-rare mutations by next-generation sequencing.
Proc Natl Acad Sci U S A. 2012 Sep 4;109(36):14508-13. doi: 10.1073/pnas.1208715109. Epub 2012 Aug 1.