Department of Biostatistics, University of California, Los Angeles, CA 90095-1772, USA.
Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA 90095-7246, USA.
Bioinformatics. 2022 Aug 10;38(16):3927-3934. doi: 10.1093/bioinformatics/btac423.
MOTIVATION: Modeling single-cell gene expression trends along cell pseudotime is a crucial analysis for exploring biological processes. Most existing methods rely on nonparametric regression models for their flexibility; however, nonparametric models often produce trends too complex to interpret. Other existing methods use interpretable but restrictive models. Since interpretability and flexibility are both indispensable for understanding biological processes, the single-cell field needs a model that improves interpretability while largely maintaining the flexibility of nonparametric regression models.

RESULTS: Here, we propose the single-cell generalized trend model (scGTM) for capturing a gene's expression trend along cell pseudotime, whether monotone, hill-shaped or valley-shaped. The scGTM has three advantages: (i) it can capture non-monotonic trends that are easy to interpret, (ii) its parameters are biologically interpretable and trend informative, and (iii) it can flexibly accommodate common distributions for modeling gene expression counts. To tackle the resulting complex optimization problems, we use the particle swarm optimization algorithm to find the constrained maximum likelihood estimates of the scGTM parameters. As an application, we analyze several single-cell gene expression datasets using the scGTM and show that the scGTM can capture interpretable gene expression trends along cell pseudotime and reveal molecular insights underlying biological processes.

AVAILABILITY AND IMPLEMENTATION: The Python package scGTM is open access and available at https://github.com/ElvisCuiHan/scGTM.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
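The abstract describes the scGTM only at a high level, so the following is a minimal sketch under stated assumptions, not the scGTM package's implementation. It assumes a hypothetical hill/valley-shaped log-mean trend with baseline `b`, magnitude `alpha` (positive for a hill, negative for a valley), change time `t0`, and separate curvatures `g1`, `g2` before and after `t0`; placing `t0` at an end of pseudotime yields a monotone trend. The counts are modeled as Poisson (one common choice), and the constrained maximum likelihood estimate is found with a basic global-best particle swarm optimizer that enforces box constraints by clipping. The function names, parameter bounds and PSO settings are all illustrative assumptions; consult the paper and the repository for the exact model and API.

```python
import numpy as np


def trend_log_mean(theta, t):
    """Hypothetical scGTM-style log-mean trend (assumption for illustration).

    theta = (b, alpha, t0, g1, g2):
      b      baseline log-expression
      alpha  trend magnitude (alpha > 0: hill, alpha < 0: valley)
      t0     change time (peak or trough) on pseudotime
      g1, g2 curvature before and after t0 (larger means sharper)
    """
    b, alpha, t0, g1, g2 = theta
    gamma = np.where(t <= t0, g1, g2)
    return b + alpha * np.exp(-gamma * (t - t0) ** 2)


def poisson_loglik(theta, t, y):
    """Poisson log-likelihood, dropping the constant -sum(log y!)."""
    eta = trend_log_mean(theta, t)
    return np.sum(y * eta - np.exp(eta))


def pso_maximize(f, lower, upper, n_particles=40, n_iter=300,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization with box constraints.

    Particles that leave the box are clipped back onto [lower, upper].
    Returns the best position found and its objective value.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()                               # each particle's best position
    pbest_val = np.array([f(p) for p in x])
    g = pbest[np.argmax(pbest_val)].copy()         # swarm's best position
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([f(p) for p in x])
        improved = vals > pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmax(pbest_val)].copy()
    return g, pbest_val.max()


# Usage on simulated data with a hill-shaped trend peaking at t0 = 0.6
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 1.0, 2000)                    # pseudotime scaled to [0, 1]
true_theta = np.array([0.5, 2.0, 0.6, 20.0, 10.0])
y = rng.poisson(np.exp(trend_log_mean(true_theta, t)))

lower = [-3.0, -5.0, 0.0, 0.1, 0.1]                # hypothetical bounds on theta
upper = [5.0, 5.0, 1.0, 100.0, 100.0]
theta_hat, ll = pso_maximize(lambda th: poisson_loglik(th, t, y), lower, upper)
print(np.round(theta_hat, 2))
```

Because the optimizer treats the likelihood as a black box, swapping `poisson_loglik` for a negative binomial or zero-inflated log-likelihood would reuse the same fitting loop; this gradient-free design is one reason a swarm method suits a non-smooth, constrained objective such as the piecewise trend above.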