Robust covariance estimation for high-dimensional compositional data with application to microbial communities analysis.

Author information

Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, Shandong, China.

School of Mathematics and Statistics and Research Institute of Mathematical Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China.

Publication information

Stat Med. 2021 Jul 10;40(15):3499-3515. doi: 10.1002/sim.8979. Epub 2021 Apr 11.

Abstract

Microbial community analysis is drawing growing attention owing to the rapid development of high-throughput sequencing techniques. The observed data have several typical characteristics: they are high-dimensional, compositional (lying in a simplex), and may even be leptokurtic and highly skewed because of overly abundant taxa, which makes conventional correlation analysis unsuitable for studying co-occurrence and co-exclusion relationships between microbial taxa. In this article, we address the challenges of covariance estimation for this kind of data. Assuming that the basis covariance matrix lies in a well-recognized class of sparse covariance matrices, we adopt a proxy matrix known in the literature as the centered log-ratio covariance matrix. We construct a Median-of-Means (MOM) estimator for the centered log-ratio covariance matrix and propose a thresholding procedure that is adaptive to the variability of individual entries. By imposing a finite fourth moment condition, which is much weaker than the sub-Gaussianity condition used in the literature, we derive the optimal rate of convergence under the spectral norm. We also provide a theoretical guarantee on support recovery. The adaptive thresholding procedure for the MOM estimator is easy to implement and gains robustness when outliers or heavy-tailedness exist. Thorough simulation studies are conducted to show the advantages of the proposed procedure over some state-of-the-art methods. Finally, we apply the proposed method to analyze a human gut microbiome dataset.
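
The pipeline described in the abstract (centered log-ratio transform, Median-of-Means covariance estimate, entry-adaptive thresholding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the number of blocks, the threshold constant `const`, and the form of the threshold (a multiple of sqrt(theta_ij * log p / n), where theta_ij is the empirical variance of the product of centered coordinates) are all assumptions made for illustration, since the abstract does not specify them. The sketch also assumes strictly positive compositions; zeros would need to be replaced before taking logarithms.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of each row (rows must be strictly positive)."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def mom_covariance(z, n_blocks, rng):
    """Entrywise median of block-wise sample covariance matrices.

    The block count is a tuning choice assumed here, not taken from the paper.
    """
    idx = rng.permutation(z.shape[0])
    blocks = np.array_split(idx, n_blocks)
    covs = np.stack([np.cov(z[b], rowvar=False) for b in blocks])
    med = np.median(covs, axis=0)
    return (med + med.T) / 2  # guard against floating-point asymmetry

def adaptive_threshold(z, sigma, const=2.0):
    """Hard-threshold off-diagonal entries with an entry-specific level.

    lam[i, j] = const * sqrt(theta[i, j] * log(p) / n) is an assumed form.
    Builds an n x p x p array, so it is only suitable for small p.
    """
    n, p = z.shape
    zc = z - z.mean(axis=0)
    prod = zc[:, :, None] * zc[:, None, :]
    theta = prod.var(axis=0)  # variability of each covariance entry
    lam = const * np.sqrt(theta * np.log(p) / n)
    out = np.where(np.abs(sigma) >= lam, sigma, 0.0)
    np.fill_diagonal(out, np.diag(sigma))  # keep the variances untouched
    return out

# Hypothetical data: 200 samples, 10 taxa, heavy-ish tailed abundances.
rng = np.random.default_rng(0)
counts = rng.lognormal(size=(200, 10))
comp = counts / counts.sum(axis=1, keepdims=True)  # rows lie in the simplex

z = clr(comp)
sigma_mom = mom_covariance(z, n_blocks=10, rng=rng)
sigma_hat = adaptive_threshold(z, sigma_mom)
```

Taking the median across blocks is what limits the influence of a few extreme samples, and letting the threshold vary by entry reflects the abstract's point that individual entries differ in variability. In practice the block count and `const` would be chosen by a data-driven rule such as cross-validation.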
