An information-theoretic approach to single cell sequencing analysis.

Affiliations

Mathematical Sciences, University of Southampton, Southampton, UK.

Institute for Life Sciences, University of Southampton, Southampton, UK.

Publication information

BMC Bioinformatics. 2023 Aug 12;24(1):311. doi: 10.1186/s12859-023-05424-8.

Abstract

BACKGROUND

Single-cell sequencing (sc-Seq) experiments are producing increasingly large data sets. However, large data sets do not necessarily contain large amounts of information.

RESULTS

Here, we formally quantify the information obtained from a sc-Seq experiment and show that it corresponds to an intuitive notion of gene expression heterogeneity. We demonstrate a natural relation between our notion of heterogeneity and that of cell type, decomposing heterogeneity into that component attributable to differential expression between cell types (inter-cluster heterogeneity) and that remaining (intra-cluster heterogeneity). We test our definition of heterogeneity as the objective function of a clustering algorithm, and show that it is a useful descriptor for gene expression patterns associated with different cell types.
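The abstract does not state the formula behind the decomposition, so the following is only a minimal sketch under a stated assumption: that a gene's heterogeneity is the Kullback-Leibler divergence of its transcript distribution across cells from the uniform distribution. Under that assumption, the chain rule for KL divergence splits the total exactly into an inter-cluster and an intra-cluster term. The function names `heterogeneity` and `decompose` are hypothetical and are not the API of the authors' R package.

```python
import math


def heterogeneity(counts):
    """KL divergence (in nats) from uniform of one gene's transcript
    distribution across cells. ASSUMED definition, not taken from the abstract."""
    total = sum(counts)
    n = len(counts)
    if total == 0:
        return 0.0
    # Terms with zero count contribute nothing (0 * log 0 is taken as 0).
    return sum((c / total) * math.log(n * c / total) for c in counts if c > 0)


def decompose(counts, labels):
    """Split heterogeneity into (inter-cluster, intra-cluster) parts
    for a given assignment of cells to clusters."""
    total = sum(counts)
    n = len(counts)
    if total == 0:
        return 0.0, 0.0

    # Group each cell's count by its cluster label.
    clusters = {}
    for c, lab in zip(counts, labels):
        clusters.setdefault(lab, []).append(c)

    inter = 0.0
    intra = 0.0
    for members in clusters.values():
        mass = sum(members) / total  # share of transcripts in this cluster
        size = len(members) / n      # share of cells in this cluster
        if mass > 0:
            # Inter: cluster-level transcript shares versus cell shares.
            inter += mass * math.log(mass / size)
            # Intra: heterogeneity inside the cluster, weighted by its mass.
            intra += mass * heterogeneity(members)
    return inter, intra


# Hypothetical counts for one gene in six cells, with two clusters.
counts = [10, 12, 0, 1, 0, 2]
labels = ["a", "a", "b", "b", "b", "b"]
inter, intra = decompose(counts, labels)
print(inter, intra, heterogeneity(counts))
```

In this sketch the two parts always sum to the total, and the intra-cluster part is zero exactly when every cluster's cells carry equal counts, which matches the abstract's description of cell types as groups of statistically equivalent cells.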

CONCLUSIONS

Thus, our definition of gene heterogeneity leads to a biologically meaningful notion of cell type, as groups of cells that are statistically equivalent with respect to their patterns of gene expression. Our measure of heterogeneity, and its decomposition into inter- and intra-cluster, is non-parametric, intrinsic, unbiased, and requires no additional assumptions about expression patterns. Based on this theory, we develop an efficient method for the automatic unsupervised clustering of cells from sc-Seq data, and provide an R package implementation.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6792/10422744/ed687ef32340/12859_2023_5424_Fig1_HTML.jpg
