Three approaches to supervised learning for compositional data with pairwise logratios.

Authors

Germà Coenders, Michael Greenacre

Affiliations

Department of Economics, Universitat de Girona, Girona, Spain.

Department of Economics and Business and Barcelona School of Management, Universitat Pompeu Fabra, Barcelona, Spain.

Publication information

J Appl Stat. 2022 Aug 6;50(16):3272-3293. doi: 10.1080/02664763.2022.2108007. eCollection 2023.

DOI:10.1080/02664763.2022.2108007

PMID:37969895

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10637191/

Abstract

Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in compositional data analysis, and include the well-known additive logratios as particular cases. When the number of parts is large (sometimes even larger than the number of cases), some form of logratio selection is needed. In this article, we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each geared to a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method has a complex interpretation if some pairs of parts in the logratios overlap, but it leads to the most accurate predictions. The second method restricts parts to occur only once, which makes the corresponding logratios intuitively interpretable. The third method uses additive logratios, so that K-1 selected logratios involve a K-part subcomposition. Our approach allows logratios or non-compositional covariates to be forced into the models based on theoretical knowledge, and various stopping criteria are available based on information measures or statistical significance with the Bonferroni correction. We present an application on a dataset from a study predicting Crohn's disease.
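The stepwise idea behind the first two methods can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Gaussian linear model fitted by ordinary least squares (one special case of a generalized linear model), uses BIC as the stopping criterion, and runs on synthetic data. The function names (`pairwise_logratios`, `bic`, `stepwise_logratios`) and the `unique_parts` and `max_steps` parameters are hypothetical. The third method, based on additive logratios, is not covered here.

```python
import itertools
import numpy as np


def pairwise_logratios(X):
    """Return all log(x_j / x_k), j < k, for strictly positive parts X (n x D)."""
    logX = np.log(X)
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    Z = np.column_stack([logX[:, j] - logX[:, k] for j, k in pairs])
    return Z, pairs


def bic(y, design):
    """BIC (up to an additive constant) of a least-squares fit of y on design."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def stepwise_logratios(X, y, unique_parts=False, max_steps=5):
    """Forward selection of pairwise logratios; stop when BIC stops improving.

    unique_parts=False: unrestricted search, any pair may enter.
    unique_parts=True: each part may appear in at most one selected logratio.
    """
    Z, pairs = pairwise_logratios(X)
    design = np.ones((len(y), 1))  # start from the intercept-only model
    best = bic(y, design)
    selected, used = [], set()
    for _ in range(max_steps):
        candidates = []
        for idx, (j, k) in enumerate(pairs):
            if (j, k) in selected:
                continue
            if unique_parts and (j in used or k in used):
                continue
            score = bic(y, np.column_stack([design, Z[:, idx]]))
            candidates.append((score, idx))
        if not candidates:
            break
        score, idx = min(candidates)  # candidate with the lowest BIC
        if score >= best:  # no improvement: stop
            break
        best = score
        design = np.column_stack([design, Z[:, idx]])
        selected.append(pairs[idx])
        used.update(pairs[idx])
    return selected


# Synthetic example (assumption): the response depends on log(part 0 / part 3).
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(6), size=200)
y = 2.0 * np.log(X[:, 0] / X[:, 3]) + rng.normal(scale=0.1, size=200)

selected = stepwise_logratios(X, y)
selected_unique = stepwise_logratios(X, y, unique_parts=True)
print(selected)
print(selected_unique)
```

With `unique_parts=True`, no part is shared between selected logratios, which mirrors the interpretability restriction of the second method. The unrestricted variant may pick overlapping pairs whenever doing so lowers the BIC.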
