Poisson PCA: Poisson measurement error corrected PCA, with application to microbiome data.

Authors

Kenney Toby, Gu Hong, Huang Tianshu

Affiliation

Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.

Publication

Biometrics. 2021 Dec;77(4):1369-1384. doi: 10.1111/biom.13384. Epub 2020 Oct 19.

Abstract

In this paper, we study the problem of computing a principal component analysis of data affected by Poisson noise. We assume samples are drawn from independent Poisson distributions. We want to estimate principal components of a fixed transformation of the latent Poisson means. Our motivating example is microbiome data, though the methods apply to many other situations. We develop a semiparametric approach to correct the bias of variance estimators, both for untransformed and transformed (with particular attention to log-transformation) Poisson means. Furthermore, we incorporate methods for correcting different exposure or sequencing depth in the data. In addition to identifying the principal components, we also address the nontrivial problem of computing the principal scores in this semiparametric framework. Most previous approaches tend to take a more parametric line: for example, fitting a log-normal Poisson (PLN) model. We compare our method with the PLN approach and find that in many cases our method is better at identifying the main principal components of the latent log-transformed Poisson means, and as a further major advantage, takes far less time to compute. Comparing methods on real and simulated data, we see that our method also appears to be more robust to outliers than the parametric method.
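The bias that the abstract refers to can be illustrated in the simplest setting: untransformed means and equal exposure across samples. If X given Λ is Poisson with mean Λ, independently across features, the law of total covariance gives Cov(X) = Cov(Λ) + diag(E[Λ]), so the sample covariance of the counts overstates the latent variances by the feature means. The sketch below subtracts that term before the eigendecomposition. It is only an illustration of this moment identity, not the authors' estimator: it does not cover the log-transformed case, sequencing-depth correction, or principal scores, and the function name `poisson_corrected_pca` and the simulated data are assumptions made for the example.

```python
import numpy as np


def poisson_corrected_pca(counts):
    """PCA of latent Poisson means (untransformed, equal exposure).

    counts: array of shape (n_samples, n_features) holding Poisson counts.
    Returns eigenvalues (descending), eigenvectors (columns), and the
    bias-corrected covariance estimate.
    """
    counts = np.asarray(counts, dtype=float)
    feature_means = counts.mean(axis=0)
    # Sample covariance of the observed counts (features are columns).
    cov_observed = np.cov(counts, rowvar=False)
    # Cov(X) = Cov(Lambda) + diag(E[Lambda]); remove the Poisson noise term.
    cov_latent = cov_observed - np.diag(feature_means)
    eigvals, eigvecs = np.linalg.eigh(cov_latent)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], cov_latent


# Simulated example (hypothetical data): latent means vary along one direction.
rng = np.random.default_rng(0)
n_samples = 20000
direction = np.array([1.0, 2.0, 0.5])
t = rng.gamma(shape=4.0, scale=5.0, size=n_samples)  # variance 4 * 5**2 = 100
latent_means = 5.0 + np.outer(t, direction)
counts = rng.poisson(latent_means)

eigvals, eigvecs, cov_latent = poisson_corrected_pca(counts)
unit_direction = direction / np.linalg.norm(direction)
alignment = abs(eigvecs[:, 0] @ unit_direction)
print("alignment of first PC with true direction:", alignment)
```

Because the corrected matrix is a difference of two estimates, it is not guaranteed to be positive semidefinite in small samples; small negative eigenvalues can appear and would need handling in practice.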
