Scellpam: an R package/C++ library to perform parallel partitioning around medoids on scRNAseq data sets.

Affiliations

Department of Informatics, ETSE, University of Valencia, Avda. de la Universidad, s/n, 46100, Burjassot, Valencia, Spain.

Department of Statistics and Operations Research, University of Valencia, Avda. Vicente Andres Estelles, 46100, Burjassot, Valencia, Spain.

Publication information

BMC Bioinformatics. 2023 Sep 14;24(1):342. doi: 10.1186/s12859-023-05471-1.

DOI:10.1186/s12859-023-05471-1
PMID:37710192
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10503022/
Abstract

BACKGROUND

Partitioning around medoids (PAM) is one of the most widely used and successful clustering methods in many fields. One of its key advantages is that it requires only a distance or dissimilarity between individuals, and because the cluster centers are actual points in the data set, they can be taken as reliable representatives of their classes. However, its wider application is hampered by the large amount of memory needed to store the distance matrix (quadratic in the number of individuals), by the high computational cost of computing that matrix and, less importantly, by the cost of the clustering algorithm itself.
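To make the two properties mentioned above concrete (only a dissimilarity matrix is needed, and every cluster center is an actual data point), here is a minimal sketch of the textbook PAM procedure in pure Python. It is a naive, sequential illustration with a greedy BUILD phase and an exhaustive SWAP phase. It is not the FASTPAM1 variant and not the code of the software described in this abstract, and the function names `total_cost` and `pam` are chosen here for illustration only.

```python
from itertools import product


def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """Naive PAM on a precomputed n x n dissimilarity matrix (list of lists).

    Returns (medoid indices, cluster label per point, final cost).
    """
    n = len(dist)

    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)

    # SWAP: apply the best medoid/non-medoid exchange until none improves.
    cost = total_cost(dist, medoids)
    while True:
        best_cost, best_swap = cost, None
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < best_cost:
                best_cost, best_swap = trial_cost, trial
        if best_swap is None:
            break
        medoids, cost = best_swap, best_cost

    # Assign every point to its nearest medoid.
    labels = [min(range(k), key=lambda j: row[medoids[j]]) for row in dist]
    return medoids, labels, cost


# Toy data: six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels, cost = pam(dist, k=2)
```

On this toy input the medoids are the middle points of the two groups (indices 1 and 4). Note that the whole n x n matrix `dist` must be held in memory, which is the quadratic storage cost the paragraph above refers to.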

RESULTS

New software is presented that addresses these issues. The software, released under the GPL license and usable as either an R package or a C++ library, calculates in parallel the distance matrix for different distances/dissimilarities ([Formula: see text], [Formula: see text], Pearson, cosine and weighted Euclidean). It also implements a parallel, fast version of PAM (FASTPAM1) that can use any data type to reduce memory usage. Moreover, the parallel implementation uses all the cores available in modern computers, which greatly reduces the execution time. Besides its general application, the software is especially useful for processing data from single cell experiments. It has been tested on problems that include clustering single cell experiments with up to 289,000 cells and the expression of about 29,000 genes per cell.
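A rough back-of-the-envelope calculation shows why the choice of data type matters at this scale. The sketch below estimates the storage needed for a dissimilarity matrix over 289,000 cells. The helper `distance_matrix_bytes` is hypothetical, and the assumption that only the lower triangle (including the diagonal) of the symmetric matrix is stored is an illustration, not a statement about how the software lays out its data.

```python
def distance_matrix_bytes(n, bytes_per_value, symmetric=True):
    """Bytes needed to store an n x n dissimilarity matrix.

    symmetric=True assumes only the lower triangle plus the diagonal is kept,
    i.e. n * (n + 1) / 2 values instead of n * n.
    """
    values = n * (n + 1) // 2 if symmetric else n * n
    return values * bytes_per_value


n_cells = 289_000  # largest data set size mentioned in the abstract

full_double = distance_matrix_bytes(n_cells, 8, symmetric=False)
tri_double = distance_matrix_bytes(n_cells, 8)
tri_float = distance_matrix_bytes(n_cells, 4)

for label, size in [
    ("full matrix, 8-byte values", full_double),
    ("lower triangle, 8-byte values", tri_double),
    ("lower triangle, 4-byte values", tri_float),
]:
    print(f"{label}: {size / 1e9:.1f} GB")
```

Under these assumptions, a full matrix of 8-byte values needs about 668 GB, while a lower triangle of 4-byte values needs about 167 GB, a fourfold reduction that still exceeds the memory of a typical workstation.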

CONCLUSIONS

Execution times have been compared with those of other current packages. The method greatly outperforms the available R packages for distance matrix calculation and also improves on the packages that implement PAM itself. The software is available as an R package at https://CRAN.R-project.org/package=scellpam and as C++ libraries at https://github.com/JdMDE/jmatlib and https://github.com/JdMDE/ppamlib. The package is useful for single cell RNA-seq studies, but it is also applicable in other contexts where large data sets must be clustered.
