An adaptive method of defining negative mutation status for multi-sample comparison using next-generation sequencing.

Author information

Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA.

PET/CT Center, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology, Hefei, 230001, China.

Publication information

BMC Med Genomics. 2021 Dec 2;14(Suppl 2):32. doi: 10.1186/s12920-021-00880-8.

Abstract

BACKGROUND

Multi-sample comparison is commonly used in cancer genomics studies. With next-generation sequencing (NGS), a mutation's status in a specific sample can be measured by the number of reads supporting the mutant or wildtype allele. When no mutant reads are detected, the result could represent either a true negative mutation status or a false negative caused by an insufficient number of reads covering the site (the "coverage"). To minimize the chance of false negatives, the mutation status should be considered "unknown" instead of "negative" when the coverage is too low. There is no established method for determining the coverage threshold between negative and unknown statuses. A common solution is to apply a universal minimum coverage (UMC). However, this method relies on an arbitrarily chosen threshold and does not take into account the mutations' relative abundances, which can vary dramatically by mutation type. The result can be misclassification between negative and unknown statuses.
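To illustrate why a single coverage threshold can misclassify, the sketch below assumes a simplified model that is not taken from the paper: each read independently carries the mutant allele with a fixed probability, so the chance of seeing zero mutant reads at a given coverage is (1 - fraction)^coverage. The function names and the threshold of 20 reads are hypothetical, chosen only for illustration.

```python
def prob_zero_mutant_reads(coverage: int, mutant_fraction: float) -> float:
    """Chance of observing no mutant reads if the mutation is truly present.

    Assumes independent reads, each mutant with probability mutant_fraction
    (an illustrative model, not the paper's).
    """
    return (1.0 - mutant_fraction) ** coverage


def umc_status(coverage: int, min_coverage: int = 20) -> str:
    """Universal minimum coverage rule with a hypothetical threshold."""
    return "negative" if coverage >= min_coverage else "unknown"


if __name__ == "__main__":
    # The same coverage passes the UMC rule in both cases, yet the risk of a
    # false negative differs greatly with the mutation's abundance.
    for fraction in (0.5, 0.05):
        risk = prob_zero_mutant_reads(20, fraction)
        print(fraction, umc_status(20), f"{risk:.3g}")
```

Under this model, 20 reads are ample for an abundant mutation but leave a sizable false-negative risk for a rare one, which is the weakness of a universal threshold that the abstract describes.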

METHODS

We propose an adaptive mutation-specific negative (MSN) method to improve the discrimination between negative and unknown mutation statuses. For a specific mutation, a non-positive sample is compared with every known positive sample to test the null hypothesis that the two samples contain the same frequency of mutant reads. The non-positive sample can be called "negative" only when this null hypothesis is rejected against every known positive sample; otherwise, its status is "unknown".
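The decision rule described above can be sketched as follows. The abstract does not name the statistical test, so this sketch assumes a one-sided Fisher's exact test on mutant and wildtype read counts and a significance level of 0.05; both choices, and all function and parameter names, are assumptions made for illustration rather than the authors' implementation.

```python
from math import comb


def fisher_less(mut_a: int, wt_a: int, mut_b: int, wt_b: int) -> float:
    """One-sided Fisher's exact p-value (assumed test, not from the paper).

    Probability, under equal mutant-read frequency in samples A and B, that
    sample A shows mut_a or fewer mutant reads given the observed margins.
    """
    n_a = mut_a + wt_a
    total_mut = mut_a + mut_b
    total_wt = wt_a + wt_b
    denom = comb(total_mut + total_wt, n_a)
    lowest = max(0, n_a - total_wt)
    numer = sum(
        comb(total_mut, x) * comb(total_wt, n_a - x)
        for x in range(lowest, mut_a + 1)
    )
    return numer / denom


def msn_status(sample, positives, alpha: float = 0.05) -> str:
    """Call a non-positive sample "negative" or "unknown".

    sample: (mutant_reads, wildtype_reads) for the non-positive sample.
    positives: list of (mutant_reads, wildtype_reads) for known positives.
    alpha: hypothetical significance level.
    """
    if not positives:
        raise ValueError("at least one known positive sample is required")
    mut, wt = sample
    for pos_mut, pos_wt in positives:
        # Failing to reject the null against any one positive sample is
        # enough to leave the status "unknown".
        if fisher_less(mut, wt, pos_mut, pos_wt) >= alpha:
            return "unknown"
    return "negative"


if __name__ == "__main__":
    print(msn_status((0, 200), [(50, 50)]))
    print(msn_status((0, 30), [(50, 50), (2, 98)]))
```

In this sketch the coverage needed for a "negative" call is not fixed in advance: it depends on how abundant the mutation is in the positive samples, so a low-abundance positive sample demands deeper coverage before a negative status is accepted.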

RESULTS

We first compared the performance of the MSN and UMC methods in a simulated dataset containing varying tumor cell fractions. Only the MSN method appropriately assigned negative statuses for samples with both high and low tumor cell fractions. When evaluated on a real dual-platform single-cell sequencing dataset, the MSN method not only provided more accurate assessments of negative statuses but also yielded three times more available data after excluding the "unknown" statuses, compared with the UMC method.

CONCLUSIONS

We developed a new adaptive method for distinguishing unknown from negative statuses in multi-sample comparison NGS data. The method can provide more accurate negative statuses than the conventional UMC method and yields substantially more available data by reducing unnecessary "unknown" calls.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee32/8638096/dce7f27e500d/12920_2021_880_Fig1_HTML.jpg
