Analysis of a Single Cell RNA-seq Workflow by Random Matrix Theory Methods.

Affiliation

Department of Mathematics and Statistics, Georgetown University, Washington, DC 20057, USA.

Publication information

Bull Math Biol. 2024 Nov 25;87(1):4. doi: 10.1007/s11538-024-01376-z.

Abstract

Single cell RNA-seq (scRNAseq) workflows typically start with a count matrix and end with the clustering of sampled cells. While a range of methods have been developed to cluster scRNAseq datasets, no theoretical tools exist to explain why a particular cluster exists or why a hypothesized cluster is missing. Recently, several authors have shown that eigenvalues of scRNAseq count matrices can be approximated using random matrix models. In this work, we extend these previous works to the study of a scRNAseq workflow. We model scaled count matrices using random matrices with normally distributed entries. Using these random matrix models, we quantify the differential expression of a cluster and develop predictions for the workflow, and in particular for clustering, as a function of that differential expression. We also use results from random matrix theory (RMT) to develop predictive formulas for portions of the scRNAseq workflow. Using simulated and real datasets, we show that our predictions are accurate if certain conditions on differential expression hold, with our RMT-based predictions requiring particularly stringent conditions. We find that real datasets violate these conditions, leading to bias in our predictions. Even so, our predictions are better than a naive estimator, and we point out future work that can improve them. To our knowledge, our formulas represent the first predictive results for scRNAseq workflows.
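The random-matrix idea the abstract relies on can be illustrated with textbook results. When a scaled count matrix is modeled as Gaussian noise, its eigenvalues follow the Marchenko-Pastur law. A cluster adds an outlier eigenvalue only if its differential expression is strong enough (the BBP-type threshold). The sketch below shows this under stated assumptions and does not reproduce the authors' model or formulas. The simulator, the toy cluster, and the spike-strength formula are all hypothetical. In the toy cluster, a fraction `f` of cells is shifted up by `shift` in `n_de_genes` genes, and the spike strength is ell = f(1 - f) * shift^2 * n_de.

```python
import numpy as np


def mp_upper_edge(gamma: float) -> float:
    """Marchenko-Pastur upper edge for eigenvalues of X^T X / n
    when X is n x p with iid N(0, 1) entries and gamma = p / n."""
    return (1.0 + np.sqrt(gamma)) ** 2


def predicted_top_eigenvalue(ell: float, gamma: float) -> float:
    """Rank-one spiked-model (BBP-type) prediction for the largest eigenvalue.

    An outlier separates from the bulk only when ell > sqrt(gamma);
    otherwise the top eigenvalue stays at the bulk edge.
    """
    if ell <= np.sqrt(gamma):
        return mp_upper_edge(gamma)
    return (1.0 + ell) * (1.0 + gamma / ell)


def simulate_scaled_counts(n_cells, n_genes, cluster_frac, n_de_genes, shift, seed=0):
    """Hypothetical stand-in for a scaled count matrix: iid Gaussian noise plus
    one cluster whose cells are shifted by `shift` in the first `n_de_genes` genes."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_cells, n_genes))
    n_cluster = int(cluster_frac * n_cells)
    X[:n_cluster, :n_de_genes] += shift
    X -= X.mean(axis=0)  # center each gene before the eigen-decomposition
    return X


def top_eigenvalue(X):
    """Largest eigenvalue of the gene-by-gene matrix X^T X / n_cells."""
    n_cells = X.shape[0]
    return np.linalg.eigvalsh(X.T @ X / n_cells)[-1]  # eigvalsh sorts ascending


def signal_strength(cluster_frac, n_de_genes, shift):
    """Spike size ell = f (1 - f) * shift^2 * n_de for this toy model (assumption)."""
    return cluster_frac * (1.0 - cluster_frac) * shift**2 * n_de_genes


if __name__ == "__main__":
    n_cells, n_genes, f, n_de = 2000, 500, 0.1, 50
    gamma = n_genes / n_cells
    for shift in (0.1, 1.0):
        X = simulate_scaled_counts(n_cells, n_genes, f, n_de, shift)
        ell = signal_strength(f, n_de, shift)
        print(f"shift={shift}: ell={ell:.3f}, observed={top_eigenvalue(X):.3f}, "
              f"predicted={predicted_top_eigenvalue(ell, gamma):.3f}, "
              f"bulk edge={mp_upper_edge(gamma):.3f}")
```

With a weak shift, the top eigenvalue stays at the bulk edge, and this toy cluster would be invisible to PCA-based clustering. With a strong shift, an outlier appears near the spiked-model prediction. Real count data violate the Gaussian assumptions, and that gap is the kind of bias the abstract describes.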
