Kenney Toby, Gu Hong, Huang Tianshu
Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
Biometrics. 2021 Dec;77(4):1369-1384. doi: 10.1111/biom.13384. Epub 2020 Oct 19.
In this paper, we study the problem of computing a principal component analysis of data affected by Poisson noise. We assume samples are drawn from independent Poisson distributions, and we want to estimate the principal components of a fixed transformation of the latent Poisson means. Our motivating example is microbiome data, though the methods apply to many other situations. We develop a semiparametric approach to correct the bias of variance estimators, both for untransformed and transformed (with particular attention to log-transformation) Poisson means. Furthermore, we incorporate methods for correcting for differing exposure or sequencing depth in the data. In addition to identifying the principal components, we also address the nontrivial problem of computing the principal scores in this semiparametric framework. Most previous approaches tend to take a more parametric line: for example, fitting a Poisson log-normal (PLN) model. We compare our method with the PLN approach and find that in many cases our method is better at identifying the main principal components of the latent log-transformed Poisson means and, as a further major advantage, takes far less time to compute. Comparing methods on real and simulated data, we see that our method also appears to be more robust to outliers than the parametric method.
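The bias that the abstract refers to can be illustrated in the simplest, untransformed case. If X_j given Λ_j is Poisson with mean Λ_j, the law of total variance gives Var(X_j) = Var(Λ_j) + E[Λ_j], while covariances between different features are unaffected by the noise. The sketch below subtracts the feature means from the diagonal of the sample covariance before the eigendecomposition. It is a minimal illustration under these assumptions only, not the authors' semiparametric estimator: it does not handle the log-transformed case, sequencing depth, or principal scores, and the function names and simulated data are hypothetical.

```python
import numpy as np


def latent_mean_covariance(counts):
    """Moment-based estimate of the covariance of latent Poisson means.

    counts: array of shape (n_samples, n_features) with nonnegative counts.
    Assumes counts[i, j] ~ Poisson(latent[i, j]), independent given latent.
    """
    counts = np.asarray(counts, dtype=float)
    sample_cov = np.cov(counts, rowvar=False)
    # Var(X_j) = Var(Lambda_j) + E[Lambda_j], so remove the mean from the diagonal.
    # The result is symmetric but is not guaranteed to be positive semidefinite.
    return sample_cov - np.diag(counts.mean(axis=0))


def top_components(cov, k):
    """Return the k largest eigenvalues and their eigenvectors (as columns)."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]


# Simulated data (hypothetical): two abundant features with constant latent
# means, and two rarer features that vary together along one direction.
rng = np.random.default_rng(0)
n_samples = 20000
base_means = np.array([100.0, 100.0, 20.0, 20.0])
direction = np.array([0.0, 0.0, 1.0, 1.0]) / np.sqrt(2.0)
factor = 6.0 * rng.standard_normal((n_samples, 1))
latent = np.clip(base_means + factor * direction, 0.1, None)
counts = rng.poisson(latent)

naive_vals, naive_vecs = top_components(np.cov(counts, rowvar=False), 1)
corr_vals, corr_vecs = top_components(latent_mean_covariance(counts), 1)

# Absolute cosine between each estimated first component and the true direction.
naive_alignment = abs(naive_vecs[:, 0] @ direction)
corrected_alignment = abs(corr_vecs[:, 0] @ direction)
print("naive alignment:", naive_alignment)
print("corrected alignment:", corrected_alignment)
```

In this simulation the uncorrected sample covariance is dominated by the Poisson noise of the abundant features, so its first component points away from the latent direction, while the diagonal correction recovers it. For a nonlinear transformation such as the logarithm, a simple diagonal subtraction of this kind is no longer enough.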