Liu J, Zhang Xinlian, Chen T, Wu T, Lin T, Jiang L, Lang S, Liu L, Natarajan L, Tu J X, Kosciolek T, Morton J, Nguyen T T, Schnabl B, Knight R, Feng C, Zhong Y, Tu X M
Department of Family Medicine and Public Health, UC San Diego, San Diego, California, USA.
Stein Institute for Research on Aging, UC San Diego, San Diego, California, USA.
Biometrics. 2022 Sep;78(3):950-962. doi: 10.1111/biom.13487. Epub 2021 Jun 8.
The human microbiome plays an important role in our health, and identifying factors associated with microbiome composition provides insights into inherent disease mechanisms. By amplifying and sequencing marker genes with high-throughput sequencing and binning highly similar sequences together, we obtain operational taxonomic unit (OTU) profiles for each subject. Because OTU data are high-dimensional and non-normal, diversity measures are introduced to summarize them at the microbial community level. These include distance-based beta-diversity between individuals. Analyses of such between-subject attributes are not amenable to the predominant within-subject statistical paradigm, such as t-tests and linear regression. In this paper, we propose a new approach that models beta-diversity as a response within a regression setting by utilizing functional response models (FRMs), a class of semiparametric models for between- as well as within-subject attributes. The new approach not only addresses limitations of current methods for beta-diversity with cross-sectional data, but also provides a foundation for extending the approach to longitudinal and other clustered data in the future. The proposed approach is illustrated with both real and simulated data.
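To make the idea of a between-subject response concrete, here is a minimal illustrative sketch. It is not the authors' FRM estimator. It computes a beta-diversity value for every pair of subjects and regresses those pairwise values on a pairwise covariate. Several choices here are assumptions made only for illustration: Bray-Curtis dissimilarity as the beta-diversity measure, the absolute covariate difference |x_i - x_j| as the pairwise predictor, the mean model E(d_ij) = b0 + b1 |x_i - x_j|, and the hypothetical helper names `bray_curtis` and `pairwise_fit`. Each pair of subjects contributes one response, so pairs that share a subject are correlated. For that reason the sketch returns point estimates only. Valid inference requires U-statistic-based theory of the kind FRMs are typically built on, and the sketch does not attempt it.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two OTU count vectors (0 = identical)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def pairwise_fit(otus, covariate):
    """Hypothetical helper: least-squares fit of d_ij = b0 + b1 * |x_i - x_j|
    over all unordered subject pairs i < j.

    Averaging the normal equations over all pairs gives a U-statistic-type
    estimating equation. Only point estimates are returned; pairs sharing a
    subject are correlated, so naive OLS standard errors would be invalid.
    """
    pairs = list(combinations(range(len(otus)), 2))
    d = [bray_curtis(otus[i], otus[j]) for i, j in pairs]       # pairwise response
    z = [abs(covariate[i] - covariate[j]) for i, j in pairs]    # pairwise predictor
    m = len(pairs)
    zbar, dbar = sum(z) / m, sum(d) / m
    szz = sum((zk - zbar) ** 2 for zk in z)
    if szz == 0:
        raise ValueError("pairwise covariate differences have no variation")
    szd = sum((zk - zbar) * (dk - dbar) for zk, dk in zip(z, d))
    b1 = szd / szz
    b0 = dbar - b1 * zbar
    return b0, b1


# Toy, made-up data: four subjects, two OTUs, a binary group covariate.
otus = [[10, 0], [9, 1], [1, 9], [0, 10]]
group = [0, 0, 1, 1]
b0, b1 = pairwise_fit(otus, group)
print(round(b0, 3), round(b1, 3))  # 0.1 0.8
```

In this toy example, subjects in the same group have a mean Bray-Curtis dissimilarity of 0.1 and subjects in different groups have 0.9. The fitted slope of 0.8 is the between-group increase in expected beta-diversity.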