Composition-on-composition regression analysis for multi-omics integration of metagenomic data.

Authors

Rios Nicholas, Shi Yuke, Chen Jun, Zhan Xiang, Xue Lingzhou, Li Qizhai

Affiliations

Department of Statistics, George Mason University, Fairfax, VA 22030, United States.

State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.

Publication information

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf387.

Abstract

MOTIVATION

Compositional data arise in many disciplines, notably in the next-generation sequencing experiments widely used in biomedical studies. Regression analysis with compositional data as either responses or predictors has been well studied. However, when both responses and predictors are compositional, the inventory of analysis tools is surprisingly limited, especially in the high-dimensional setting. Most of the few existing methods rely on a log-ratio transformation to move compositional data from the simplex to real numbers. A serious weakness of these methods is their failure to handle the substantial fraction of zeroes observed in data collected from next-generation sequencing experiments.
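The zero problem mentioned above can be seen in a small example. The sketch below uses the centered log-ratio (CLR) transform as one common log-ratio transformation; the abstract does not say which transforms the existing methods use, so CLR and the toy compositions here are assumptions chosen for illustration.

```python
import math

def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in composition]  # fails if any part is 0
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Toy compositions (hypothetical values); each sums to 1.
positive = [0.5, 0.3, 0.2]
with_zero = [0.7, 0.3, 0.0]

# Strictly positive parts: the transform is defined and sums to zero.
clr_positive = clr(positive)

# A single zero part makes log(0) undefined, so the transform fails.
try:
    clr(with_zero)
    zero_failed = False
except ValueError:
    zero_failed = True
```

In practice, zeroes are often replaced with a small pseudo-count before such a transform is applied, which is the kind of workaround a transformation-free method avoids.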

RESULTS

To investigate associations between two high-dimensional multi-omics compositions, we propose a composition-on-composition (COC) regression analysis method that does not require log-ratio transformations and hence can handle zeroes in the data. To account for high dimensionality, we estimate regression coefficients using a penalized estimation equation approach. We also propose inference procedures for COC regression. Comprehensive numerical simulations and case studies demonstrate the superior performance of COC.
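The abstract does not state the form of the COC model. As an illustration only of how a regression can avoid log-ratio transformations, the sketch below assumes a linear map whose coefficient matrix has rows that are themselves compositions. This form, the function name `predict`, and the matrix `B` are all assumptions for this example, not the authors' confirmed model. Under that assumption, the predicted mean stays in the simplex even when the predictor contains exact zeroes.

```python
def predict(x, B):
    """Predicted mean composition: sum over j of x[j] * B[j][k].

    Assumes every row of B is non-negative and sums to 1.
    """
    n_out = len(B[0])
    return [sum(x[j] * B[j][k] for j in range(len(x))) for k in range(n_out)]

# Hypothetical coefficient matrix: 2 predictor parts, 3 response parts.
B = [
    [0.6, 0.4, 0.0],
    [0.1, 0.2, 0.7],
]

x_zero = [1.0, 0.0]     # predictor composition with an exact zero
x_mixed = [0.25, 0.75]  # strictly positive predictor composition

y_zero = predict(x_zero, B)
y_mixed = predict(x_mixed, B)
```

No logarithm is taken anywhere, so zero parts need no special treatment. The penalized estimation and inference steps described in the abstract are not sketched here.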

AVAILABILITY AND IMPLEMENTATION

R source code implementing the COC method is available at https://github.com/nrios4/COC.
