CCLasso: correlation inference for compositional data through Lasso.

Authors

Fang Huaying, Huang Chengcheng, Zhao Hongyu, Deng Minghua

Affiliations

LMAN, School of Mathematical Sciences, Beijing International Center for Mathematical Research, Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China.

College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China.

Publication information

Bioinformatics. 2015 Oct 1;31(19):3172-80. doi: 10.1093/bioinformatics/btv349. Epub 2015 Jun 4.

DOI:10.1093/bioinformatics/btv349

PMID:26048598

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4693003/

Abstract

MOTIVATION

Direct analysis of microbial communities in the environment and the human body has become more convenient and reliable owing to advances in high-throughput sequencing techniques for 16S rRNA gene profiling. Inferring the correlations among members of microbial communities is of fundamental importance for genomic survey studies. Traditional Pearson correlation analysis, which treats the observed data as absolute abundances of the microbes, may lead to spurious results because the data represent only relative abundances. Special care and appropriate methods are required prior to correlation analysis for these compositional data.
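The spurious-correlation problem described above can be illustrated with a small simulation. This is a minimal sketch, not taken from the paper: the helper names `closure` and `mean_offdiag_corr`, the log-normal abundance model, and the sample sizes are all assumptions chosen for illustration. Independent latent abundances are normalized to relative abundances, and Pearson correlation is computed on both.

```python
import numpy as np

def closure(counts):
    """Convert absolute abundances (rows = samples) to relative abundances."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def mean_offdiag_corr(data):
    """Average Pearson correlation over all distinct pairs of columns."""
    corr = np.corrcoef(data, rowvar=False)
    p = corr.shape[0]
    return (corr.sum() - np.trace(corr)) / (p * (p - 1))

rng = np.random.default_rng(0)
n_samples, n_taxa = 2000, 4

# Hypothetical latent absolute abundances: taxa are independent by construction.
absolute = rng.lognormal(mean=0.0, sigma=0.5, size=(n_samples, n_taxa))
# What sequencing reports: each sample rescaled to sum to one.
relative = closure(absolute)

abs_corr = mean_offdiag_corr(absolute)
rel_corr = mean_offdiag_corr(relative)
print(f"mean pairwise correlation, absolute abundances: {abs_corr:+.3f}")
print(f"mean pairwise correlation, relative abundances: {rel_corr:+.3f}")
```

Because every row of `relative` sums to one, the covariances of each taxon with all the others must sum to the negative of its own variance, so the relative abundances appear negatively correlated even though the latent abundances are independent.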

RESULTS

In this article, we first discuss the correlation definition of latent variables for compositional data. We then propose a novel method called CCLasso, based on least squares with an ℓ1 penalty, to infer the correlation network for latent variables of compositional data from metagenomic data. An effective alternating direction algorithm from the augmented Lagrangian method is used to solve the optimization problem. The simulation results show that CCLasso outperforms existing methods, e.g. SparCC, in edge recovery for compositional data. It also compares well with SparCC in estimating the correlation network of microbe species from the Human Microbiome Project.
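To make the phrase "least squares with an ℓ1 penalty solved by an alternating direction algorithm" concrete, the sketch below applies the alternating direction method of multipliers (ADMM) to the generic lasso regression problem, minimizing 0.5 * ||A x - b||^2 + lam * ||x||_1. It is not the CCLasso objective, which the abstract does not spell out; the function names `soft_threshold` and `lasso_admm` and the parameters `rho` and `n_iter` are assumptions made for this illustration.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1: shrink each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_admm(A, b, lam, rho=1.0, n_iter=500):
    """Generic ADMM for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.

    The problem is split as f(x) + g(z) subject to x = z, where f is the
    least-squares term and g is the l1 penalty; u is the scaled dual variable.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    p = A.shape[1]
    lhs = A.T @ A + rho * np.eye(p)  # fixed matrix of the x-update system
    Atb = A.T @ b
    x = np.zeros(p)
    z = np.zeros(p)
    u = np.zeros(p)
    for _ in range(n_iter):
        # x-update: ridge-like least-squares solve
        x = np.linalg.solve(lhs, Atb + rho * (z - u))
        # z-update: soft-thresholding handles the l1 penalty
        z = soft_threshold(x + u, lam / rho)
        # dual update: accumulate the constraint violation x - z
        u = u + x - z
    return z

# With an identity design the lasso solution is soft_threshold(b, lam).
b_demo = np.array([3.0, -0.5, 1.5])
x_demo = lasso_admm(np.eye(3), b_demo, lam=1.0)
print(x_demo)
```

The split into a smooth least-squares step and a soft-thresholding step is what makes each ADMM iteration cheap. Small coefficients are set exactly to zero, which in a correlation-network setting corresponds to dropping weak edges.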

AVAILABILITY AND IMPLEMENTATION

CCLasso is open source and freely available from https://github.com/huayingfang/CCLasso under GNU LGPL v3.

CONTACT

dengmh@pku.edu.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
