Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments.

Affiliations

School of Computer Science, University of Manchester, Manchester M13 9PL, UK.

Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK.

Publication information

Bioinformatics. 2021 Nov 5;37(21):3788-3795. doi: 10.1093/bioinformatics/btab486.

DOI: 10.1093/bioinformatics/btab486
PMID: 34213536
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10186154/
Abstract

MOTIVATION

The negative binomial distribution has been shown to be a good model for counts data from both bulk and single-cell RNA-sequencing (RNA-seq). Gaussian process (GP) regression provides a useful non-parametric approach for modelling temporal or spatial changes in gene expression. However, currently available GP regression methods that implement negative binomial likelihood models do not scale to the increasingly large datasets being produced by single-cell and spatial transcriptomics.

RESULTS

The GPcounts package implements GP regression methods for modelling counts data using a negative binomial likelihood function. Computational efficiency is achieved through the use of variational Bayesian inference. The GP function models changes in the mean of the negative binomial likelihood through a logarithmic link function and the dispersion parameter is fitted by maximum likelihood. We validate the method on simulated time course data, showing that it identifies changes in over-dispersed counts data better than methods based on Gaussian or Poisson likelihoods. To demonstrate temporal inference, we apply GPcounts to single-cell RNA-seq datasets after pseudotime and branching inference. To demonstrate spatial inference, we apply GPcounts to data from the mouse olfactory bulb to identify spatially variable genes and compare to two published GP methods. We also provide the option of modelling additional dropout using a zero-inflated negative binomial. Our results show that GPcounts can be used to model temporal and spatial counts data in cases where simpler Gaussian and Poisson likelihoods are unrealistic.
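
As a rough illustration of the likelihood described above, the following sketch evaluates a negative binomial log-likelihood whose mean is tied to a latent function value f through a logarithmic link, and picks the dispersion by maximum likelihood on a grid. It is not the GPcounts API: the function names (`nb_log_pmf`, `poisson_log_pmf`, `fit_dispersion`), the toy counts, the grid search and the parameterisation (variance = mu + alpha * mu^2) are all assumptions made for illustration. The latent values are also held fixed here, whereas the package infers them with variational GP inference.

```python
import math


def nb_log_pmf(y, f, alpha):
    """Log-probability of count y under a negative binomial with mean exp(f).

    Assumed parameterisation: variance = mu + alpha * mu**2 (alpha > 0).
    """
    mu = math.exp(f)   # logarithmic link: latent f maps to a positive mean
    r = 1.0 / alpha    # "size" parameter of the negative binomial
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def poisson_log_pmf(y, f):
    """Log-probability of count y under a Poisson with mean exp(f)."""
    return y * f - math.exp(f) - math.lgamma(y + 1)


def fit_dispersion(counts, f_values, grid=None):
    """Grid-search maximum-likelihood alpha, holding the latent values fixed."""
    if grid is None:
        # Hypothetical search grid: 41 log-spaced values from 1e-3 to 10.
        grid = [10 ** (k / 10) for k in range(-30, 11)]

    def total_log_lik(alpha):
        return sum(nb_log_pmf(y, f, alpha) for y, f in zip(counts, f_values))

    return max(grid, key=total_log_lik)


# Toy over-dispersed counts (made up), all sharing the latent value f = log(10).
counts = [0, 2, 35, 1, 18, 4]
f_values = [math.log(10.0)] * len(counts)

alpha_hat = fit_dispersion(counts, f_values)
nb_ll = sum(nb_log_pmf(y, f, alpha_hat) for y, f in zip(counts, f_values))
pois_ll = sum(poisson_log_pmf(y, f) for y, f in zip(counts, f_values))
print(f"alpha_hat={alpha_hat:.3f}  NB log-lik={nb_ll:.2f}  Poisson log-lik={pois_ll:.2f}")
```

On these toy counts the sample mean is 10 but the spread is far larger than a Poisson with that mean allows, so the negative binomial attains a higher log-likelihood. This is the kind of over-dispersion the abstract refers to. In a full model the fixed `f_values` would be replaced by a GP posterior over f.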

AVAILABILITY AND IMPLEMENTATION

GPcounts is implemented using the GPflow library in Python and is available at https://github.com/ManchesterBioinference/GPcounts along with the data, code and notebooks required to reproduce the results presented here. The version used for this paper is archived at https://doi.org/10.5281/zenodo.5027066.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1221/10186154/cbe5c08c74bb/btab486f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1221/10186154/8fc8e685cb58/btab486f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1221/10186154/99d0f985b714/btab486f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1221/10186154/f6b07ba4e938/btab486f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1221/10186154/df79f363a17e/btab486f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1221/10186154/91cd108735d2/btab486f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1221/10186154/289cef509602/btab486f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1221/10186154/9b24d68b4c97/btab486f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1221/10186154/3afdc84281d1/btab486f9.jpg