Score matching for differential abundance testing of compositional high-throughput sequencing data.

Author information

Ostner Johannes, Li Hongzhe, Müller Christian L

Affiliations

Computational Health Center, Helmholtz Munich, Neuherberg, Germany.

Institut für Statistik, Ludwig-Maximilians-Universität München, Munich, Germany.

Publication information

bioRxiv. 2024 Dec 9:2024.12.05.627006. doi: 10.1101/2024.12.05.627006.

Abstract

The class of a-b power interaction models, proposed by Yu et al. (2024), provides a general framework for modeling sparse compositional count data with pairwise feature interactions. This class includes many distributions as special cases and handles zero counts through power transformations, making it especially suitable for modern high-throughput sequencing data with excess zeros, including single-cell RNA-Seq and amplicon sequencing data. Here, we present an extension of this class of models that can include covariate information, allowing for accurate characterization of covariate dependencies in heterogeneous populations. Combining this model with a tailored differential abundance (DA) test leads to a novel DA testing scheme, cosmoDA, that can reduce false-positive detections caused by correlated features. cosmoDA uses the generalized score matching estimation framework for power interaction models. Our benchmarks on simulated and real data show that cosmoDA can accurately estimate feature interactions in the presence of population heterogeneity and significantly reduces the false discovery rate when testing for differential abundance of correlated features. Finally, cosmoDA provides an explicit link to popular Box-Cox-type data transformations and makes it possible to assess the impact of zero replacement and power transformations on downstream differential abundance results. cosmoDA is available at https://github.com/bio-datascience/cosmoDA.
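
The link between power transformations, Box-Cox-type transforms, and zero handling mentioned in the abstract can be illustrated with a minimal sketch. It assumes NumPy, closes raw counts to proportions, and contrasts the log limit of the Box-Cox transform (which is undefined at zero) with a power transform that stays finite at zero and with log proportions after pseudocount replacement. The function names (`closure`, `box_cox`, `log_with_pseudocount`), the toy counts, and the pseudocount of 0.5 are hypothetical illustrations, not part of cosmoDA or its API, and the sketch does not implement the a-b model or score matching itself.

```python
import numpy as np


def closure(counts):
    """Rescale each row (sample) of a count matrix to proportions summing to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def box_cox(x, lam):
    """Box-Cox-type power transform (x**lam - 1) / lam, with the log(x) limit at lam == 0.

    Hypothetical helper for illustration only; not taken from cosmoDA.
    """
    x = np.asarray(x, dtype=float)
    if lam == 0:
        # log(0) = -inf: the log limit cannot handle zeros without replacement
        with np.errstate(divide="ignore"):
            return np.log(x)
    # For lam > 0, x**lam is finite at x = 0, so zeros need no replacement
    return (x**lam - 1.0) / lam


def log_with_pseudocount(counts, pseudocount=0.5):
    """Zero replacement: add a pseudocount to every count, then take log proportions."""
    return np.log(closure(np.asarray(counts, dtype=float) + pseudocount))


# Hypothetical toy data: 2 samples x 3 features, each sample with one zero count
counts = np.array([[10, 0, 5],
                   [3, 7, 0]])
props = closure(counts)

print(box_cox(props, 0.0))           # -inf where a count is zero
print(box_cox(props, 0.5))           # finite everywhere; zeros map to -1/lam = -2
print(log_with_pseudocount(counts))  # finite, but the values depend on the pseudocount
```

The contrast mirrors the trade-off the abstract points to: the log limit needs a replacement value for zeros, and downstream results can shift with that choice, whereas a power transform with lam > 0 is defined at zero and approaches the log transform on positive entries as lam shrinks toward 0.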
