Mixed Distribution Models Based on Single-Cell RNA Sequencing Data.

Affiliations

School of Science, Jiangnan University, Wuxi, 214122, China.

School of Mathematics Statistics and Physics, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK.

Publication information

Interdiscip Sci. 2021 Sep;13(3):362-370. doi: 10.1007/s12539-021-00427-6. Epub 2021 Mar 22.

Abstract

Advances in single-cell RNA sequencing (scRNA-seq) have produced a large amount of valuable data. Analyzing these data offers a new perspective for studying intratumoral heterogeneity and identifying gene markers. In this paper, scRNA-seq data from colorectal cancer (CRC) are analyzed, and the gene expression difference (GED) data are found to follow a regular distributional shape. To study this regularity, a mixed stable-normal distribution (MSND) model and a mixed stable-exponential distribution (MSED) model are constructed to fit the GED data, and their estimated parameters are used to describe characteristics of the fitted distributions. Comparisons of root mean square error and chi-squared goodness-of-fit tests show that both MSED and MSND fit the data better than the stable distribution and the Cauchy distribution. Given quantile thresholds, MSND and MSED can be used to identify tumor-related genes, and functional analysis indicates that the selected genes are highly correlated with CRC. In addition, the parameters of MSND and MSED show a trend as CRC develops. To explore this association, gene-set enrichment analysis (GSEA) is performed; its results reveal that the trend characterizes the intratumoral heterogeneity of CRC well. Furthermore, applying the MSED model to hepatocellular carcinoma shows that the model can also be used to analyze other cancers. Overall, the MSND and MSED models fit the GED data well across disease stages, their parameters characterize the heterogeneity of CRC tumor cells, and both models can be used to identify genes highly correlated with tumors.
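As a rough illustration of the kind of comparison the abstract describes, the sketch below scores a two-component mixture density and a single Cauchy density against a histogram using root mean square error. It rests on several assumptions that are not taken from the paper: the mixture is written as a simple weighted sum `w * stable + (1 - w) * normal`; the stable component is replaced by a Cauchy density (the alpha = 1, beta = 0 special case of a stable law) so that it has a closed form; the data are synthetic; and no parameters are estimated. All function names and parameter values are hypothetical.

```python
import math
import random


def cauchy_pdf(x, loc, scale):
    """Cauchy density, used here as a closed-form stand-in for a stable law."""
    z = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, w, loc, scale, mu, sigma):
    """Assumed mixture form: weight w on the stable-like part, 1 - w on the normal part."""
    return w * cauchy_pdf(x, loc, scale) + (1.0 - w) * normal_pdf(x, mu, sigma)


def sample_mixture(rng, n, w, loc, scale, mu, sigma):
    """Draw n synthetic values from the assumed mixture."""
    out = []
    for _ in range(n):
        if rng.random() < w:
            # Inverse-CDF sampling for the Cauchy component.
            out.append(loc + scale * math.tan(math.pi * (rng.random() - 0.5)))
        else:
            out.append(rng.gauss(mu, sigma))
    return out


def histogram_density(data, lo, hi, bins):
    """Return bin centers and empirical density on [lo, hi), normalized by len(data)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        if lo <= v < hi:
            idx = min(int((v - lo) / width), bins - 1)  # guard against rounding at the edge
            counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    density = [c / (len(data) * width) for c in counts]
    return centers, density


def rmse(pdf, centers, density):
    """Root mean square error between a model density and the histogram density."""
    sq = sum((pdf(c) - d) ** 2 for c, d in zip(centers, density))
    return math.sqrt(sq / len(centers))


# Hypothetical parameter values chosen only for illustration.
params = dict(w=0.5, loc=0.0, scale=1.0, mu=0.0, sigma=0.3)
rng = random.Random(0)
data = sample_mixture(rng, 20000, **params)
centers, density = histogram_density(data, -5.0, 5.0, 80)

rmse_mix = rmse(lambda x: mixture_pdf(x, **params), centers, density)
rmse_cauchy = rmse(lambda x: cauchy_pdf(x, 0.0, 1.0), centers, density)
print("RMSE mixture:", rmse_mix)
print("RMSE Cauchy :", rmse_cauchy)
```

In a real analysis the parameters of each candidate distribution would be estimated from the GED data before the errors are compared, and a general stable density would be used rather than the Cauchy special case.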
