Effective Clustering for Single Cell Sequencing Cancer Data.

Author information

Ciccolella Simone, Patterson Murray, Bonizzoni Paola, Della Vedova Gianluca

Publication information

IEEE J Biomed Health Inform. 2021 Nov;25(11):4068-4078. doi: 10.1109/JBHI.2021.3081380. Epub 2021 Nov 5.

Abstract

Single cell sequencing (SCS) technologies provide a level of resolution that makes them indispensable for inferring, from a sequenced tumor, evolutionary trees (phylogenies) that represent the accumulation of cancerous mutations. A drawback of SCS is its elevated false negative and missing value rates, which result in a large space of possible solutions; this in turn makes inference difficult, and sometimes infeasible, with current approaches and tools. One possible solution is to reduce the size of an SCS instance, usually represented as a matrix recording the presence, absence, or uncertainty of each mutation in each sequenced cell, and to infer the tree from this reduced instance. In this work, we present celluloid, a new clustering procedure for categorical vector or matrix data, here representing SCS instances. We show that celluloid clusters mutations with high precision, never pairing too many mutations that are unrelated in the ground truth, and that it also yields accurate results in terms of the phylogeny inferred downstream from the reduced instance it produces. We demonstrate the usefulness of a clustering step by applying the entire pipeline (clustering + inference method) to a real dataset: the runtime drops significantly, which considerably raises the upper bound on the size of SCS instances that can be solved in practice. Our approach, celluloid: clustering single cell sequencing data around centroids, is available at https://github.com/AlgoLab/celluloid/ under an MIT license, as well as on the Python Package Index (PyPI) at https://pypi.org/project/celluloid-clust/.
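The abstract describes the reduction only at a high level: mutations, i.e. the columns of a presence/absence/unknown matrix, are clustered around centroids, and the tree is then inferred from the smaller matrix. The sketch below illustrates that idea with plain k-modes, a standard centroid-based method for categorical data. It is not celluloid's implementation: the 0/1/2 encoding, the dissimilarity that skips unknown entries, the naive initialization, the tie-breaking rule, and the function names `dissimilarity`, `mode_vector`, `k_modes` and `reduce_instance` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

MISSING = 2  # hypothetical encoding: 0 = absent, 1 = present, 2 = unknown


def dissimilarity(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions where both entries are observed and they disagree."""
    return sum(1 for x, y in zip(a, b)
               if x != MISSING and y != MISSING and x != y)


def mode_vector(vectors: List[Sequence[int]]) -> List[int]:
    """Per-position majority of the observed values (MISSING if none observed)."""
    centroid = []
    for column in zip(*vectors):
        observed = [v for v in column if v != MISSING]
        if not observed:
            centroid.append(MISSING)
            continue
        ones = sum(observed)  # observed values are 0/1, so this counts presences
        # Ties go to 1 (present): an arbitrary choice in this sketch.
        centroid.append(1 if 2 * ones >= len(observed) else 0)
    return centroid


def k_modes(vectors: List[Sequence[int]], k: int,
            max_iter: int = 100) -> Tuple[List[int], List[List[int]]]:
    """Plain k-modes; assumes 1 <= k <= len(vectors).

    Naive initialization: the first k vectors become the starting centroids.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: nearest centroid, ties go to the lowest index.
        new_labels = [min(range(k), key=lambda c: dissimilarity(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        # Update step: each non-empty cluster's centroid becomes its mode vector.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mode_vector(members)
    return labels, centroids


def reduce_instance(matrix: List[List[int]],
                    k: int) -> Tuple[List[int], List[List[int]]]:
    """Cluster the mutations (columns) of a cells x mutations matrix.

    Returns each mutation's cluster label and a cells x k reduced matrix
    whose columns are the cluster centroids.
    """
    mutations = [list(col) for col in zip(*matrix)]  # one vector per mutation
    labels, centroids = k_modes(mutations, k)
    reduced = [list(row) for row in zip(*centroids)]  # back to cells x clusters
    return labels, reduced


if __name__ == "__main__":
    matrix = [  # rows = cells, columns = mutations
        [1, 1, 0, 0],
        [1, 2, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 2],
    ]
    labels, reduced = reduce_instance(matrix, k=2)
    print(labels)   # [0, 0, 1, 1]
    print(reduced)  # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

In this toy matrix, four mutations collapse into two columns, and the unknown entries are filled in by their cluster's centroid. In a full pipeline, a reduced matrix of this kind would be passed to a downstream phylogeny inference method in place of the original instance.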

Similar articles

Inferring cancer progression from Single-Cell Sequencing while allowing mutation losses.

Bioinformatics. 2021 Apr 20;37(3):326-333. doi: 10.1093/bioinformatics/btaa722.

AMC: accurate mutation clustering from single-cell DNA sequencing data.

Bioinformatics. 2022 Mar 4;38(6):1732-1734. doi: 10.1093/bioinformatics/btab857.

Inferring clonal evolution of tumors from single nucleotide somatic mutations.

BMC Bioinformatics. 2014 Feb 1;15:35. doi: 10.1186/1471-2105-15-35.

Summarizing the solution space in tumor phylogeny inference by multiple consensus trees.

Bioinformatics. 2019 Jul 15;35(14):i408-i416. doi: 10.1093/bioinformatics/btz312.

PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem.

Bioinformatics. 2020 Jul 1;36(Suppl_1):i169-i176. doi: 10.1093/bioinformatics/btaa464.

Does Relaxing the Infinite Sites Assumption Give Better Tumor Phylogenies? An ILP-Based Comparative Approach.

IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1410-1423. doi: 10.1109/TCBB.2018.2865729.

Alignment-free clustering of UMI tagged DNA molecules.

Bioinformatics. 2019 Jun 1;35(11):1829-1836. doi: 10.1093/bioinformatics/bty888.

Assessing the performance of methods for cell clustering from single-cell DNA sequencing data.

PLoS Comput Biol. 2023 Oct 12;19(10):e1010480. doi: 10.1371/journal.pcbi.1010480. eCollection 2023 Oct.

GeoWaVe: geometric median clustering with weighted voting for ensemble clustering of cytometry data.

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac751.

Cited by

Assessing the performance of methods for cell clustering from single-cell DNA sequencing data.

PLoS Comput Biol. 2023 Oct 12;19(10):e1010480. doi: 10.1371/journal.pcbi.1010480. eCollection 2023 Oct.

bmVAE: a variational autoencoder method for clustering single-cell mutation data.

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac790.

SCClone: Accurate Clustering of Tumor Single-Cell DNA Sequencing Data.

Front Genet. 2022 Jan 27;13:823941. doi: 10.3389/fgene.2022.823941. eCollection 2022.

From Alpha to Zeta: Identifying Variants and Subtypes of SARS-CoV-2 Via Clustering.

J Comput Biol. 2021 Nov;28(11):1113-1129. doi: 10.1089/cmb.2021.0302. Epub 2021 Oct 25.
