Inferring cellular and molecular processes in single-cell data with non-negative matrix factorization using Python, R and GenePattern Notebook implementations of CoGAPS.

Affiliations

Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.

Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.

Publication information

Nat Protoc. 2023 Dec;18(12):3690-3731. doi: 10.1038/s41596-023-00892-x. Epub 2023 Nov 21.

Abstract

Non-negative matrix factorization (NMF) is an unsupervised learning method well suited to high-throughput biology. However, inferring biological processes from an NMF result still requires additional post hoc statistics and annotation to interpret the learned features. Here, we introduce a suite of computational tools that implement NMF and provide methods for accurate and clear biological interpretation and analysis. A generalized discussion of NMF covering its benefits, limitations and open questions is followed by four procedures for the Bayesian NMF algorithm Coordinated Gene Activity across Pattern Subsets (CoGAPS). Each procedure demonstrates NMF analysis to quantify cell state transitions in a public domain single-cell RNA-sequencing dataset. The first demonstrates PyCoGAPS, our new Python implementation that enhances runtime for large datasets, and the second allows its deployment in Docker. The third procedure steps through the same single-cell NMF analysis using our R CoGAPS interface. The fourth introduces a beginner-friendly CoGAPS platform using GenePattern Notebook, aimed at users with a working conceptual knowledge of data analysis but without basic proficiency in the R or Python programming languages. We also constructed a user-facing website to serve as a central repository for information and instructional materials about CoGAPS and its application programming interfaces. Setting up the packages and conducting a test run is expected to take around 15 min, with an additional 30 min to conduct analyses on a precomputed result. The expected runtime on the user's desired dataset can vary from hours to days depending on factors such as dataset size or input parameters.
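
For readers unfamiliar with the factorization itself, the sketch below illustrates the general idea of NMF on a tiny synthetic genes-by-cells matrix. It is not CoGAPS: it swaps in the classic Lee-Seung multiplicative-update rule in place of the Bayesian algorithm named in the abstract, and it does not use the PyCoGAPS API. The function names (`nmf`, `frobenius_error`, `matmul`, `transpose`), the parameter names, and the toy matrix `V` are all hypothetical and chosen only for illustration.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2
               for v_row, r_row in zip(V, WH)
               for v, r in zip(v_row, r_row)) ** 0.5

def nmf(V, n_patterns, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative V (genes x cells) as W (genes x patterns)
    times H (patterns x cells) using multiplicative updates.
    Illustrative only; this is not the CoGAPS sampler."""
    rng = random.Random(seed)
    n_rows, n_cols = len(V), len(V[0])
    # Strictly positive random start so the updates are well defined.
    W = [[rng.random() + 0.1 for _ in range(n_patterns)] for _ in range(n_rows)]
    H = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(n_patterns)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(h_row, n_row, d_row)]
             for h_row, n_row, d_row in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(w_row, n_row, d_row)]
             for w_row, n_row, d_row in zip(W, num, den)]
    return W, H

# Synthetic toy data (assumption): 4 genes x 6 cells, built from two
# non-overlapping expression patterns.
V = [
    [1, 1, 1, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 3, 3, 3],
]

W, H = nmf(V, n_patterns=2)
print("reconstruction error:", frobenius_error(V, W, H))
```

In this sketch the columns of `W` play the role of gene weights per pattern and the rows of `H` the pattern activity per cell. Because the updates only multiply by non-negative ratios, both factors stay non-negative throughout, which is the property that makes the learned features readable as additive parts.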
