SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors.

Affiliation

Department of Molecular Oncology Breast Cancer Research Program, British Columbia Cancer Research Centre, Vancouver, BC, Canada.

Publication information

Bioinformatics. 2010 Mar 15;26(6):730-6. doi: 10.1093/bioinformatics/btq040. Epub 2010 Feb 3.

DOI:10.1093/bioinformatics/btq040

PMID:20130035

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2832826/

Abstract

MOTIVATION

Next-generation sequencing (NGS) has enabled whole genome and transcriptome single nucleotide variant (SNV) discovery in cancer. NGS produces millions of short sequence reads that, once aligned to a reference genome sequence, can be interpreted for the presence of SNVs. Although tools exist for SNV discovery from NGS data, none are specifically suited to work with data from tumors, where altered ploidy and tumor cellularity impact the statistical expectations of SNV discovery.

RESULTS

We developed three implementations of a probabilistic Binomial mixture model, called SNVMix, designed to infer SNVs from NGS data from tumors to address this problem. The first models allelic counts as observations and infers SNVs and model parameters using an expectation maximization (EM) algorithm and is therefore capable of adjusting to deviation of allelic frequencies inherent in genomically unstable tumor genomes. The second models nucleotide and mapping qualities of the reads by probabilistically weighting the contribution of a read/nucleotide to the inference of a SNV based on the confidence we have in the base call and the read alignment. The third combines filtering out low-quality data in addition to probabilistic weighting of the qualities. We quantitatively evaluated these approaches on 16 ovarian cancer RNASeq datasets with matched genotyping arrays and a human breast cancer genome sequenced to >40x (haploid) coverage with ground truth data and show systematically that the SNVMix models outperform competing approaches.
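The first approach described above (allelic counts as observations, parameters fitted by EM) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes three mixture components standing for genotypes aa, ab and bb, uses plain maximum-likelihood updates with a small smoothing constant rather than any priors, and the function names, starting values and toy counts are all hypothetical.

```python
import math


def log_binom_pmf(a, n, mu):
    """Log of Binomial(a | n, mu): a reference reads out of n total reads."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(n - a + 1)
    return log_coeff + a * math.log(mu) + (n - a) * math.log(1.0 - mu)


def e_step(counts, pi, mu):
    """Posterior probability of each component at each position."""
    resp = []
    for a, n in counts:
        log_w = [math.log(p) + log_binom_pmf(a, n, m) for p, m in zip(pi, mu)]
        top = max(log_w)
        w = [math.exp(x - top) for x in log_w]  # subtract max for stability
        total = sum(w)
        resp.append([x / total for x in w])
    return resp


def fit_binomial_mixture(counts, mu_init=(0.99, 0.5, 0.01), n_iter=50,
                         alpha=1e-3, eps=1e-6):
    """EM for a binomial mixture over (reference_count, depth) pairs.

    mu[k] is the expected fraction of reference reads under component k and
    pi[k] is the mixing weight. alpha and eps are illustrative safeguards
    that keep pi and mu away from 0 and 1.
    """
    k = len(mu_init)
    mu = list(mu_init)
    pi = [1.0 / k] * k
    for _ in range(n_iter):
        resp = e_step(counts, pi, mu)
        for j in range(k):
            weight = sum(r[j] for r in resp)
            ref = sum(r[j] * a for r, (a, _) in zip(resp, counts))
            depth = sum(r[j] * n for r, (_, n) in zip(resp, counts))
            pi[j] = (weight + alpha) / (len(counts) + k * alpha)
            if depth > 0:
                mu[j] = min(max(ref / depth, eps), 1.0 - eps)
    # Final E-step so the returned posteriors match the returned parameters.
    return pi, mu, e_step(counts, pi, mu)


# Toy data: (reference reads, total depth) at eight positions.
toy_counts = [(30, 30), (28, 30), (15, 30), (14, 30),
              (0, 30), (1, 30), (29, 30), (30, 30)]
pi, mu, resp = fit_binomial_mixture(toy_counts)
# Probability of a variant at each position = P(ab) + P(bb).
p_snv = [r[1] + r[2] for r in resp]
```

Because `mu` is re-estimated from the data, the expected allele fraction of each component can drift away from the diploid values of 1, 0.5 and 0, which is the kind of adjustment to skewed allelic frequencies that the abstract attributes to the EM-fitted model.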

AVAILABILITY

Software and data are available at http://compbio.bccrc.ca

CONTACT

sshah@bccrc.ca

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
