Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments.

Author information

School of Computer Science, University of Manchester, Manchester M13 9PL, UK.

Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK.

Publication information

Bioinformatics. 2021 Nov 5;37(21):3788-3795. doi: 10.1093/bioinformatics/btab486.

Abstract

MOTIVATION

The negative binomial distribution has been shown to be a good model for counts data from both bulk and single-cell RNA-sequencing (RNA-seq). Gaussian process (GP) regression provides a useful non-parametric approach for modelling temporal or spatial changes in gene expression. However, currently available GP regression methods that implement negative binomial likelihood models do not scale to the increasingly large datasets being produced by single-cell and spatial transcriptomics.

RESULTS

The GPcounts package implements GP regression methods for modelling counts data using a negative binomial likelihood function. Computational efficiency is achieved through the use of variational Bayesian inference. The GP function models changes in the mean of the negative binomial likelihood through a logarithmic link function, and the dispersion parameter is fitted by maximum likelihood. We validate the method on simulated time course data, showing that it identifies changes in over-dispersed counts data better than methods based on Gaussian or Poisson likelihoods. To demonstrate temporal inference, we apply GPcounts to single-cell RNA-seq datasets after pseudotime and branching inference. To demonstrate spatial inference, we apply GPcounts to data from the mouse olfactory bulb to identify spatially variable genes and compare the results with those of two published GP methods. We also provide the option of modelling additional dropout using a zero-inflated negative binomial likelihood. Our results show that GPcounts can be used to model temporal and spatial counts data in cases where simpler Gaussian and Poisson likelihoods are unrealistic.
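The likelihood described above can be illustrated with a minimal, standard-library-only sketch. It is not the GPcounts implementation and does not use the GPcounts or GPflow APIs. It assumes the common mean-dispersion parameterisation of the negative binomial (variance = mu + alpha * mu^2), which the abstract does not spell out, and it replaces the package's optimiser with a coarse grid search over the dispersion. The function names, the toy counts and the grid values are all hypothetical.

```python
import math


def nb_log_pmf(y, f, alpha):
    """Log-probability of count y under a negative binomial likelihood.

    f is the latent function value (what a GP would supply); the
    logarithmic link gives the mean mu = exp(f). alpha is the dispersion,
    assuming variance = mu + alpha * mu**2 (an assumed parameterisation).
    """
    mu = math.exp(f)      # log link: real-valued f -> positive mean
    r = 1.0 / alpha       # "size" parameter of the negative binomial
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def poisson_log_pmf(y, f):
    """Poisson log-probability with the same log link, for comparison."""
    return y * f - math.exp(f) - math.lgamma(y + 1)


def fit_dispersion(counts, f_values, grid):
    """Pick the dispersion on `grid` that maximises the total log-likelihood.

    A grid search is a deliberate simplification standing in for
    gradient-based maximum likelihood.
    """
    def total_log_lik(alpha):
        return sum(nb_log_pmf(y, f, alpha) for y, f in zip(counts, f_values))
    return max(grid, key=total_log_lik)


# Toy over-dispersed counts (hypothetical, not from the paper).
counts = [0, 12, 1, 30, 2, 0, 25, 3]
# Constant latent function set to the log of the sample mean.
f_values = [math.log(sum(counts) / len(counts))] * len(counts)
alpha_grid = [0.01, 0.1, 0.5, 1.0, 2.0, 5.0]
best_alpha = fit_dispersion(counts, f_values, alpha_grid)
```

As alpha approaches zero the negative binomial approaches the Poisson, so a fitted dispersion well above the smallest grid value indicates that a Poisson likelihood would understate the variability in these counts.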

AVAILABILITY AND IMPLEMENTATION

GPcounts is implemented using the GPflow library in Python and is available at https://github.com/ManchesterBioinference/GPcounts along with the data, code and notebooks required to reproduce the results presented here. The version used for this paper is archived at https://doi.org/10.5281/zenodo.5027066.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
