Single-cell generalized trend model (scGTM): a flexible and interpretable model of gene expression trend along cell pseudotime.

Affiliations

Department of Biostatistics, University of California, Los Angeles, CA 90095-1772, USA.

Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA 90095-7246, USA.

Publication information

Bioinformatics. 2022 Aug 10;38(16):3927-3934. doi: 10.1093/bioinformatics/btac423.

Abstract

MOTIVATION

Modeling single-cell gene expression trends along cell pseudotime is a crucial analysis for exploring biological processes. Most existing methods rely on nonparametric regression models for their flexibility; however, nonparametric models often provide trends too complex to interpret. Other existing methods use interpretable but restrictive models. Since model interpretability and flexibility are both indispensable for understanding biological processes, the single-cell field needs a model that improves the interpretability and largely maintains the flexibility of nonparametric regression models.

RESULTS

Here, we propose the single-cell generalized trend model (scGTM) for capturing a gene's expression trend, which may be monotone, hill-shaped or valley-shaped, along cell pseudotime. The scGTM has three advantages: (i) it can capture non-monotonic trends that are easy to interpret, (ii) its parameters are biologically interpretable and trend informative, and (iii) it can flexibly accommodate common distributions for modeling gene expression counts. To tackle the complex optimization problems, we use the particle swarm optimization algorithm to find the constrained maximum likelihood estimates for the scGTM parameters. As an application, we analyze several single-cell gene expression datasets using the scGTM and show that scGTM can capture interpretable gene expression trends along cell pseudotime and reveal molecular insights underlying biological processes.
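The fitting idea described above, a hill-shaped trend estimated by constrained maximum likelihood with particle swarm optimization, can be sketched as follows. This is a minimal illustration under stated assumptions, not the scGTM package's implementation: the abstract does not give the functional form, so the two-sided hill on the log-mean scale, the Poisson count distribution, the box constraints, and the names `hill_trend`, `neg_log_lik`, and `pso_minimize` are all hypothetical choices made for the example.

```python
import numpy as np

def hill_trend(t, mu, k1, k2, t0):
    """Assumed hill-shaped log-mean: peak height mu at t0,
    rising with rate k1 before t0 and falling with rate k2 after."""
    k = np.where(t <= t0, k1, k2)
    return mu * np.exp(-k * (t - t0) ** 2)

def neg_log_lik(params, t, y):
    """Poisson negative log-likelihood, dropping the log(y!) constant."""
    mu, k1, k2, t0 = params
    log_mean = hill_trend(t, mu, k1, k2, t0)
    return float(np.sum(np.exp(log_mean) - y * log_mean))

def pso_minimize(objective, lower, upper, n_particles=40, n_iter=200, seed=0):
    """Basic global-best particle swarm with box constraints enforced by clipping."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                      # per-particle best positions
    best_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(best_val))
    g_pos, g_val = best_pos[g].copy(), float(best_val[g])
    w, c1, c2 = 0.7, 1.5, 1.5                  # inertia and attraction weights (assumed)
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_pos - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep particles inside the constraints
        vals = np.array([objective(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]
        g = int(np.argmin(best_val))
        if best_val[g] < g_val:
            g_pos, g_val = best_pos[g].copy(), float(best_val[g])
    return g_pos, g_val

# Simulated data: 300 cells with pseudotime in [0, 1] and one hill-shaped gene.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 1.0, 300))
true_params = np.array([3.0, 20.0, 5.0, 0.4])   # mu, k1, k2, t0 (made up)
y = rng.poisson(np.exp(hill_trend(t, *true_params)))

# Constraints (assumed): non-negative height and rates, change point in [0, 1].
lower = [0.0, 0.0, 0.0, 0.0]
upper = [6.0, 50.0, 50.0, 1.0]
est, nll = pso_minimize(lambda p: neg_log_lik(p, t, y), lower, upper)
print("estimated (mu, k1, k2, t0):", est)
```

In this sketch each parameter has a direct reading: `mu` is the peak height on the log scale, `t0` is the pseudotime of the peak, and `k1` and `k2` control how quickly expression rises and falls. Particle swarm optimization needs only objective evaluations, which suits likelihoods with a kink at the change point; a different count distribution would change only `neg_log_lik`.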

AVAILABILITY AND IMPLEMENTATION

The Python package scGTM is open-access and available at https://github.com/ElvisCuiHan/scGTM.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
