SECEDO: SNV-based subclone detection using ultra-low coverage single-cell DNA sequencing.

Author information

Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zurich, Switzerland.

Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland.

Publication information

Bioinformatics. 2022 Sep 15;38(18):4293-4300. doi: 10.1093/bioinformatics/btac510.

Abstract

MOTIVATION

Several recently developed single-cell DNA sequencing technologies enable whole-genome sequencing of thousands of cells. However, the ultra-low coverage of the sequenced data (<0.05× per cell) mostly limits their use to the identification of copy number alterations in multi-megabase segments. Many tumors are not copy number-driven, and thus single-nucleotide variant (SNV)-based subclone detection may contribute to a more comprehensive view of intra-tumor heterogeneity. Due to the low coverage of the data, the identification of SNVs is only possible when superimposing the sequenced genomes of hundreds of genetically similar cells. Thus, we have developed a new approach that efficiently clusters tumor cells by applying Bayesian filtering to select relevant loci and by exploiting read overlap and phasing.
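To make the coverage argument concrete, the following minimal sketch shows how pooling cells raises the effective depth at a locus. It is not part of SECEDO: the function names are hypothetical, and it assumes that read depth at a locus is Poisson-distributed and that pooled cells are genetically identical, which is a simplification.

```python
import math


def pooled_coverage(per_cell_coverage, n_cells):
    """Expected depth at a locus when reads from n_cells cells are superimposed."""
    return per_cell_coverage * n_cells


def prob_at_least(mean_depth, k):
    """P(depth >= k) at a locus, assuming Poisson-distributed depth (an assumption)."""
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    single = 0.03  # per-cell coverage reported for the real datasets
    pooled = pooled_coverage(single, 500)  # 500 cells is an illustrative choice
    print("P(>=1 read), single cell:", prob_at_least(single, 1))
    print("pooled mean depth:", pooled)
    print("P(>=8 reads), pooled:", prob_at_least(pooled, 8))
```

Under these assumptions, a single cell at 0.03× covers any given locus only about 3% of the time, whereas a pool of a few hundred similar cells reaches a depth at which conventional variant calling becomes feasible.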

RESULTS

We developed Single Cell Data Tumor Clusterer (SECEDO, Latin for 'to separate'), a new method to cluster tumor cells based solely on SNVs inferred from ultra-low coverage single-cell DNA sequencing data. We applied SECEDO to a synthetic dataset simulating 7250 cells and eight tumor subclones from a single patient and were able to accurately reconstruct the clonal composition, detecting 92.11% of the somatic SNVs, with the smallest clusters representing only 6.9% of the total population. When applied to five real single-cell sequencing datasets from a breast cancer patient, each consisting of ≈2000 cells, SECEDO was able to recover the major clonal composition in each dataset at the original coverage of 0.03×, achieving an Adjusted Rand Index (ARI) score of ≈0.6. The current state-of-the-art SNV-based clustering method achieved an ARI score of ≈0, even after merging cells to create higher coverage data (factor 10 increase), and was only able to match SECEDO's performance when pooling data from all five datasets, in addition to artificially increasing the sequencing coverage by a factor of 7. Variant calling on the resulting clusters recovered more than twice as many SNVs as would have been detected if calling on all cells together. Further, the allelic ratio of the SNVs called on each subcluster was more than double the allelic ratio of the SNVs called without clustering, thus demonstrating that calling variants on subclones, in addition to both increasing the sensitivity of SNV detection and attaching SNVs to subclones, significantly increases the confidence of the called variants.
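For readers unfamiliar with the Adjusted Rand Index used above, here is a minimal sketch of the standard pair-counting formula. It is not taken from the SECEDO evaluation code, and the function name and example labels are illustrative. An ARI of 1 means two clusterings agree exactly (up to renaming of clusters), and a value near 0 means agreement no better than chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: trivially identical clusterings

    # Contingency counts: items per (cluster in A, cluster in B) pair.
    cell_counts = Counter(zip(labels_a, labels_b))
    row_counts = Counter(labels_a)
    col_counts = Counter(labels_b)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings have a single cluster
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2]
    predicted = ["b", "b", "a", "a", "c"]  # same partition, different names
    print(adjusted_rand_index(truth, predicted))
```

Because the index is corrected for chance, it can be negative when two clusterings agree less than random labelings would.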

AVAILABILITY AND IMPLEMENTATION

SECEDO is implemented in C++ and is publicly available at https://github.com/ratschlab/secedo. Instructions to download the data and the evaluation code to reproduce the findings in this paper are available at: https://github.com/ratschlab/secedo-evaluation. The code and data of the submitted version are archived at: https://doi.org/10.5281/zenodo.6516955.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1436/9477524/bcec17e7c9a2/btac510f1.jpg