

Misannotated Multi-Nucleotide Variants in Public Cancer Genomics Datasets Lead to Inaccurate Mutation Calls with Significant Implications.

Author information

Informatics and Predictive Sciences, Bristol Myers Squibb, Princeton, New Jersey.

Sentieon Inc., Mountain View, California.

Publication information

Cancer Res. 2021 Jan 15;81(2):282-288. doi: 10.1158/0008-5472.CAN-20-2151. Epub 2020 Oct 28.

Abstract

Although next-generation sequencing (NGS) is widely used in cancer to profile tumors and detect variants, most somatic variant callers used in these pipelines identify variants at the lowest possible granularity, single-nucleotide variants (SNV). As a result, multiple adjacent SNVs are called individually instead of as a single multi-nucleotide variant (MNV). With this approach, the amino acid change inferred from an individual SNV within a codon can differ from the amino acid change produced by the MNV that results from combining the SNVs, leading to incorrect conclusions about the downstream effects of the variants. Here, we analyzed 10,383 variant call files (VCF) from The Cancer Genome Atlas (TCGA) and found 12,141 incorrectly annotated MNVs. Analysis of seven commonly mutated genes from 178 studies in cBioPortal revealed that MNVs were consistently missed in 20 of these studies, whereas they were correctly annotated in 15 more recent studies. At the BRAF V600 locus, the most common example of an MNV, several public datasets reported separate V600E and V600M variants instead of a single merged V600K variant. VCFs from the TCGA Mutect2 caller were used to develop a solution that merges SNVs into MNVs. Our custom script used the phasing information in the SNV VCF to determine whether SNVs fell within the same codon and needed to be merged into an MNV before variant annotation. This study shows that institutions performing NGS for cancer genomics should incorporate MNV merging into their pipelines as a best practice. SIGNIFICANCE: Identification of incorrect mutation calls in TCGA, including clinically relevant BRAF V600 and KRAS G12, will influence research and potentially clinical decisions.
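The merging step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' script. It assumes variants are already expressed in coding-sequence coordinates, and the `Snv` class, its `phase_set` field (a stand-in for whatever phasing tag the VCF carries), and the `annotate` function are hypothetical names introduced here. SNVs that share a codon and a phase set are applied together before translation; the reference codon GTG for BRAF codon 600 is the only biological input.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


@dataclass(frozen=True)
class Snv:
    cds_pos: int    # 1-based position in the coding sequence
    ref: str        # reference base on the coding strand
    alt: str        # alternate base on the coding strand
    phase_set: str  # hypothetical label: equal values mean same haplotype


def annotate(snvs, ref_codons):
    """Return protein changes, merging phased SNVs that share a codon.

    ref_codons maps a codon number to its reference codon string.
    """
    groups = defaultdict(list)
    for snv in snvs:
        codon_number = (snv.cds_pos - 1) // 3 + 1
        groups[(codon_number, snv.phase_set)].append(snv)

    calls = []
    for codon_number, phase_set in sorted(groups):
        ref_codon = ref_codons[codon_number]
        alt_codon = list(ref_codon)
        for snv in groups[(codon_number, phase_set)]:
            offset = (snv.cds_pos - 1) % 3
            if ref_codon[offset] != snv.ref:
                raise ValueError(f"reference mismatch at CDS position {snv.cds_pos}")
            alt_codon[offset] = snv.alt
        ref_aa = CODON_TABLE[ref_codon]
        alt_aa = CODON_TABLE["".join(alt_codon)]
        calls.append(f"{ref_aa}{codon_number}{alt_aa}")
    return calls


braf_codons = {600: "GTG"}  # BRAF codon 600 encodes valine

# Two adjacent SNVs on the same haplotype are merged into one MNV.
phased = [Snv(1798, "G", "A", "hap1"), Snv(1799, "T", "A", "hap1")]
print(annotate(phased, braf_codons))

# The same SNVs treated as independent events are annotated separately.
unphased = [Snv(1798, "G", "A", "hap1"), Snv(1799, "T", "A", "hap2")]
print(annotate(unphased, braf_codons))
```

With shared phasing the two substitutions yield the single codon AAG and one V600K call; annotated independently they yield ATG and GAG, which is how the separate V600M and V600E calls arise. A production pipeline would additionally need to handle genomic coordinates, strand, and transcript selection, none of which are modeled here.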

