Optimal Permutation Recovery in Permuted Monotone Matrix Model.

Author information

Ma Rong, Cai T Tony, Li Hongzhe

Affiliations

University of Pennsylvania School of Medicine, 215 Blockley Hall, Philadelphia, 19104 United States.

The Wharton School, University of Pennsylvania, Philadelphia, 19104 United States.

Rong Ma is a PhD Candidate, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.

Publication information

J Am Stat Assoc. 2021;116(535):1358-1372. doi: 10.1080/01621459.2020.1713794. Epub 2020 Feb 18.

Abstract

Motivated by recent research on quantifying bacterial growth dynamics based on genome assemblies, we consider a permuted monotone matrix model Y = ΘΠ + E, where the rows represent different samples, the columns represent contigs in genome assemblies, and the elements represent log-read counts after preprocessing steps and guanine-cytosine (GC) adjustment. In this model, Θ is an unknown mean matrix with monotone entries in each row, Π is a permutation matrix that permutes the columns of Θ, and E is a noise matrix. This paper studies the problem of estimating (recovering) Π given the observed noisy matrix Y. We propose an estimator based on the best linear projection, which is shown to be minimax rate-optimal for both exact recovery, as measured by the 0-1 loss, and partial recovery, as quantified by the normalized Kendall's tau distance. Simulation studies demonstrate the superior empirical performance of the proposed estimator over alternative methods. We demonstrate the methods using a synthetic metagenomics dataset of 45 closely related bacterial species and a real metagenomic dataset, comparing the bacterial growth dynamics between responders and non-responders among IBD patients after 8 weeks of treatment.
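
The model and the two loss functions described in the abstract can be illustrated with a short simulation. The following is a minimal Python sketch, not the authors' implementation. It assumes that the projection direction is the leading left singular vector of the row-centered data matrix, that the rows of Θ are nondecreasing (used to fix the sign of that vector), and that Kendall's tau distance is normalized by the number of column pairs. The function names, the step distribution used to build Θ, and the noise level are all illustrative assumptions.

```python
import numpy as np


def simulate(n, p, sigma, rng):
    """Draw Y = Theta @ Pi + E with row-monotone Theta (illustrative choices)."""
    # Each row of Theta is strictly increasing: cumulative sums of positive steps.
    steps = rng.uniform(0.5, 1.5, size=(n, p))
    theta = np.cumsum(steps, axis=1)
    # Column j of Theta @ Pi equals column perm[j] of Theta.
    perm = rng.permutation(p)
    pi = np.eye(p)[:, perm]
    noise = sigma * rng.standard_normal((n, p))
    return theta @ pi + noise, perm


def estimate_permutation(y):
    """Rank the columns of Y by their projection onto one direction (a sketch)."""
    # Center each row so the projection reflects column ordering, not row means.
    yc = y - y.mean(axis=1, keepdims=True)
    # Assumed projection direction: leading left singular vector of centered Y.
    u, _, _ = np.linalg.svd(yc, full_matrices=False)
    w = u[:, 0]
    # Singular vectors are defined up to sign; assume increasing rows of Theta.
    if w.sum() < 0:
        w = -w
    scores = w @ yc
    # The rank of each column's score is its estimated original position.
    return np.argsort(np.argsort(scores))


def normalized_kendall_tau(a, b):
    """Fraction of pairs that the two rankings order differently."""
    a = np.asarray(a)
    b = np.asarray(b)
    i, j = np.triu_indices(len(a), k=1)
    return np.mean((a[i] - a[j]) * (b[i] - b[j]) < 0)


rng = np.random.default_rng(0)
y, perm = simulate(n=10, p=50, sigma=0.5, rng=rng)
perm_hat = estimate_permutation(y)
print("exact recovery (0-1 loss is 0):", np.array_equal(perm_hat, perm))
print("normalized Kendall's tau distance:", normalized_kendall_tau(perm_hat, perm))
```

In this sketch the 0-1 loss asks only whether the whole permutation is recovered, while the normalized Kendall's tau distance measures the share of column pairs placed in the wrong relative order, so it can stay small even when exact recovery fails.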


