Robust covariance estimation for high-dimensional compositional data with application to microbial communities analysis.

Affiliations

Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, Shandong, China.

School of Mathematics and Statistics and Research Institute of Mathematical Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China.

Publication information

Stat Med. 2021 Jul 10;40(15):3499-3515. doi: 10.1002/sim.8979. Epub 2021 Apr 11.

Abstract

Microbial community analysis is drawing growing attention owing to the rapid development of high-throughput sequencing techniques. The observed data have the following typical characteristics: they are high-dimensional, compositional (lying in a simplex), and may even be leptokurtic and highly skewed owing to the existence of overly abundant taxa, which makes conventional correlation analysis infeasible for studying the co-occurrence and co-exclusion relationships between microbial taxa. In this article, we address the challenges of covariance estimation for this kind of data. Assuming that the basis covariance matrix lies in a well-recognized class of sparse covariance matrices, we adopt a proxy matrix known in the literature as the centered log-ratio covariance matrix. We construct a Median-of-Means (MOM) estimator for the centered log-ratio covariance matrix and propose a thresholding procedure that is adaptive to the variability of individual entries. By imposing a finite fourth moment condition, which is much weaker than the sub-Gaussianity condition in the literature, we derive the optimal rate of convergence under the spectral norm. In addition, we provide a theoretical guarantee on support recovery. The adaptive thresholding procedure for the MOM estimator is easy to implement and gains robustness when outliers or heavy tails are present. Thorough simulation studies are conducted to show the advantages of the proposed procedure over some state-of-the-art methods. Finally, we apply the proposed method to analyze a human gut microbiome dataset.
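The pipeline described in the abstract (centered log-ratio transform, Median-of-Means covariance estimate, entry-adaptive thresholding) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the number of blocks, the constant `delta`, and the threshold form `delta * sqrt(theta_ij * log(p) / n)` are assumptions chosen for illustration, and `theta_ij` is computed here with a plain (non-robust) sample variance, which the paper may well replace with a robust counterpart.

```python
import numpy as np

def clr(X):
    """Centered log-ratio transform of strictly positive compositions (one per row)."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def mom_covariance(Y, n_blocks, rng):
    """Element-wise median of block-wise sample covariance matrices (assumed MOM form)."""
    idx = rng.permutation(Y.shape[0])
    blocks = np.array_split(idx, n_blocks)  # each block needs at least 2 samples
    covs = np.stack([np.cov(Y[b], rowvar=False) for b in blocks])
    return np.median(covs, axis=0)

def adaptive_threshold(S, Y, delta=2.0):
    """Hard-threshold off-diagonal entries with entry-specific levels (illustrative)."""
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    # theta[i, j]: sample variance of the products Yc[:, i] * Yc[:, j]
    prod = Yc[:, :, None] * Yc[:, None, :]
    theta = prod.var(axis=0)
    lam = delta * np.sqrt(theta * np.log(p) / n)
    T = np.where(np.abs(S) >= lam, S, 0.0)
    np.fill_diagonal(T, np.diag(S))  # leave the variances untouched
    return T

# Synthetic demo: log-normal basis counts normalized to compositions.
rng = np.random.default_rng(0)
W = np.exp(rng.normal(size=(60, 5)))
X = W / W.sum(axis=1, keepdims=True)
Y = clr(X)
S = mom_covariance(Y, n_blocks=5, rng=rng)
T = adaptive_threshold(S, Y)
```

Taking the median entry by entry across blocks is what limits the influence of a few extreme samples; a larger `n_blocks` tolerates more contaminated blocks at the cost of noisier per-block estimates.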

Similar articles

Inference for High-dimensional Differential Correlation Matrices.
J Multivar Anal. 2016 Jan 1;143:107-126. doi: 10.1016/j.jmva.2015.08.019.

gCoda: Conditional Dependence Network Inference for Compositional Data.
J Comput Biol. 2017 Jul;24(7):699-708. doi: 10.1089/cmb.2017.0054. Epub 2017 May 10.

Robust estimation of high-dimensional covariance and precision matrices.
Biometrika. 2018 Jun 1;105(2):271-284. doi: 10.1093/biomet/asy011. Epub 2018 Mar 27.
