An adaptive method of defining negative mutation status for multi-sample comparison using next-generation sequencing.

Affiliations

Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA.

PET/CT Center, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology, Hefei, 230001, China.

Publication information

BMC Med Genomics. 2021 Dec 2;14(Suppl 2):32. doi: 10.1186/s12920-021-00880-8.

DOI:10.1186/s12920-021-00880-8

PMID:34856988

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8638096/

Abstract

BACKGROUND

Multi-sample comparison is commonly used in cancer genomics studies. With next-generation sequencing (NGS), a mutation's status in a given sample can be measured by the number of reads supporting the mutant or wildtype allele. When no mutant reads are detected, the result could represent either a true negative mutation status or a false negative caused by too few reads covering the site, that is, by insufficient "coverage". To minimize the chance of a false negative, the mutation status should be treated as "unknown" rather than "negative" when the coverage is too low. There is no established method for determining the coverage threshold between negative and unknown statuses. A common solution is to apply a universal minimum coverage (UMC). However, this method relies on an arbitrarily chosen threshold, and it does not take into account the mutations' relative abundances, which can vary dramatically by mutation type. The result can be misclassification between negative and unknown statuses.
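To make the limitation of a fixed threshold concrete, the following minimal sketch applies a UMC rule and computes the chance of seeing no mutant reads purely by sampling. It is an illustration under stated assumptions, not the authors' code: the function names, the threshold of 20 reads, the rule that any mutant read means "positive", and the simple binomial read-sampling model are all hypothetical.

```python
def umc_status(mutant_reads: int, total_reads: int, min_coverage: int = 20) -> str:
    """Classify one site in one sample with a universal minimum coverage rule.

    min_coverage=20 is an arbitrary illustrative value, not one from the paper.
    """
    if mutant_reads > 0:
        return "positive"  # simplification: any mutant read counts as positive
    return "negative" if total_reads >= min_coverage else "unknown"


def prob_zero_mutant_reads(mutant_fraction: float, total_reads: int) -> float:
    """Chance of sampling no mutant reads when the true mutant read fraction
    is `mutant_fraction`, assuming reads are drawn independently."""
    return (1.0 - mutant_fraction) ** total_reads


# At 20x coverage the UMC rule calls both sites "negative", yet the risk of a
# false negative differs enormously with the mutation's abundance.
print(umc_status(0, 20))                 # negative
print(prob_zero_mutant_reads(0.50, 20))  # below one in a million
print(prob_zero_mutant_reads(0.02, 20))  # roughly 0.67
```

Under these assumptions, the same coverage that comfortably rules out an abundant mutation says very little about a rare one, which is the gap a mutation-specific threshold is meant to close.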

METHODS

We propose an adaptive mutation-specific negative (MSN) method to improve the discrimination between negative and unknown mutation statuses. For a specific mutation, a non-positive sample is compared with every known positive sample to test the null hypothesis that the two samples contain the same frequency of mutant reads. The non-positive sample can be called "negative" only when this null hypothesis is rejected for every known positive sample; otherwise, its status is "unknown".
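The decision rule described above can be sketched as follows. The abstract does not name the statistical test, so this sketch assumes a one-sided Fisher's exact test on read counts and a significance level of 0.05; both choices, along with the function names and the input format, are hypothetical and may differ from the published implementation.

```python
from math import comb


def fisher_less_pvalue(s_mut: int, s_tot: int, p_mut: int, p_tot: int) -> float:
    """One-sided Fisher's exact test (hypergeometric lower tail).

    Returns the probability of seeing s_mut or fewer mutant reads in the
    non-positive sample if both samples shared one mutant read frequency.
    """
    total = s_tot + p_tot    # all reads pooled across the two samples
    mutants = s_mut + p_mut  # all mutant reads pooled
    tail = sum(
        comb(mutants, k) * comb(total - mutants, s_tot - k)
        for k in range(s_mut + 1)
    )
    return tail / comb(total, s_tot)


def msn_status(sample, positives, alpha: float = 0.05) -> str:
    """Call a non-positive sample 'negative' only if the equal-frequency null
    is rejected against every known positive sample; otherwise 'unknown'.

    sample and each entry of positives are (mutant_reads, total_reads) pairs.
    """
    if not positives:
        raise ValueError("at least one known positive sample is required")
    s_mut, s_tot = sample
    for p_mut, p_tot in positives:
        if fisher_less_pvalue(s_mut, s_tot, p_mut, p_tot) >= alpha:
            return "unknown"  # cannot rule out the same mutant frequency
    return "negative"


print(msn_status((0, 100), [(50, 100)]))            # negative
print(msn_status((0, 3), [(50, 100)]))              # unknown
print(msn_status((0, 100), [(50, 100), (2, 100)]))  # unknown
```

In the third call, the positive sample with only 2 mutant reads out of 100 keeps the status at "unknown" even at 100x coverage, illustrating how the required coverage adapts to the abundance of each mutation. With SciPy available, `scipy.stats.fisher_exact(table, alternative="less")` should give the same one-sided p-value.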

RESULTS

We first compared the performance of the MSN and UMC methods in a simulated dataset containing varying tumor cell fractions. Only the MSN method appropriately assigned negative statuses for samples with both high and low tumor cell fractions. When evaluated on a real dual-platform single-cell sequencing dataset, the MSN method not only provided more accurate assessments of negative statuses but also yielded three times more available data after excluding the "unknown" statuses, compared with the UMC method.

CONCLUSIONS

We developed a new adaptive method for distinguishing unknown from negative statuses in multi-sample comparison of NGS data. The method can provide more accurate negative statuses than the conventional UMC method and yields a markedly larger amount of available data by reducing unnecessary "unknown" calls.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee32/8638096/dce7f27e500d/12920_2021_880_Fig1_HTML.jpg
