Restoring the Duality between Principal Components of a Distance Matrix and Linear Combinations of Predictors, with Application to Studies of the Microbiome.

Authors

Satten Glen A, Tyx Robert E, Rivera Angel J, Stanfill Stephen

Affiliations

Division of Reproductive Health, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States of America.

Division of Laboratory Sciences, National Center for Environmental Health, Centers for Disease Control and Prevention, Atlanta, GA, United States of America.

Publication details

PLoS One. 2017 Jan 13;12(1):e0168131. doi: 10.1371/journal.pone.0168131. eCollection 2017.

Abstract

Appreciation of the importance of the microbiome is increasing, as sequencing technology has made it possible to ascertain the microbial content of a variety of samples. Studies that sequence the 16S rRNA gene, ubiquitous in and nearly exclusive to bacteria, have proliferated in the medical literature. After sequences are binned into operational taxonomic units (OTUs) or species, data from these studies are summarized in a data matrix with the observed counts from each OTU for each sample. Analysis often reduces these data further to a matrix of pairwise distances or dissimilarities; plotting the first two or three principal components (PCs) of this distance matrix often reveals meaningful groupings in the data. However, once the distance matrix is calculated, it is no longer clear which OTUs or species are important to the observed clustering; further, the PCs are hard to interpret and cannot be calculated for subsequent observations. We show how to construct approximate decompositions of the data matrix that pair PCs with linear combinations of OTU or species frequencies, and show how these decompositions can be used to construct biplots, select important OTUs and partition the variability in the data matrix into contributions corresponding to PCs of an arbitrary distance or dissimilarity matrix. To illustrate our approach, we conduct an analysis of the bacteria found in 45 smokeless tobacco samples.
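The pairing of distance-matrix PCs with linear combinations of OTU frequencies can be pictured with a small numerical sketch. The code below is only an illustration under stated assumptions, not the procedure developed in the paper: it uses simulated counts, Bray-Curtis dissimilarity as one possible choice of distance, standard principal coordinates analysis to obtain the PCs, and an ordinary least-squares fit to attach OTU coefficients to each PC. All function and variable names are hypothetical.

```python
import numpy as np

def bray_curtis(freq):
    """Pairwise Bray-Curtis dissimilarity between the rows of freq."""
    n = freq.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(freq[i] - freq[j]).sum()
            den = (freq[i] + freq[j]).sum()
            dist[i, j] = dist[j, i] = num / den
    return dist

def pcoa(dist, k=2):
    """Top-k principal coordinates of a distance matrix (Gower centering)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]        # keep the k largest
    return eigvals[order], eigvecs[:, order]

def otu_loadings(freq, pcs):
    """Least-squares coefficients of column-centered OTU frequencies on the PCs.

    Assumption for illustration: a plain regression, so that
    centered_freq is approximated by pcs @ loadings.T.
    """
    centered = freq - freq.mean(axis=0)
    coef, *_ = np.linalg.lstsq(pcs, centered, rcond=None)
    return coef.T                                # shape: (n_otus, k)

# Simulated data: 12 samples by 6 OTUs (hypothetical, not the tobacco data).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(12, 6)) + 1  # +1 avoids empty samples
freq = counts / counts.sum(axis=1, keepdims=True)

dist = bray_curtis(freq)
eigvals, pcs = pcoa(dist, k=2)
loadings = otu_loadings(freq, pcs)
```

In this sketch, OTUs with large absolute entries in a column of `loadings` are the ones most strongly associated with that PC, which is the kind of information a biplot would display alongside the sample coordinates in `pcs`.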


Figure (pone.0168131.g001): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f2a/5234780/30453ad240bb/pone.0168131.g001.jpg
