Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data.

Author information

Quinn Thomas P, Erb Ionas

Affiliations

Applied Artificial Intelligence Institute, Deakin University, 75 Pigdons Rd, Waurn Ponds VIC 3216, Geelong, Australia.

Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Carrer del Dr. Aiguader, 88, 08003, Barcelona, Spain.

Publication information

NAR Genom Bioinform. 2020 Oct 2;2(4):lqaa076. doi: 10.1093/nargab/lqaa076. eCollection 2020 Dec.

DOI:10.1093/nargab/lqaa076

PMID:33575624

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7671324/

Abstract

Many next-generation sequencing datasets contain only relative information because biological and technical factors limit the total number of transcripts observed for a given sample. As a result, no single component can be interpreted in isolation. The field of compositional data analysis has developed alternative methods for relative data based on log-ratio transforms. However, these data often contain many more features than samples, and thus require creative new ways to reduce their dimensionality. The summation of parts, called amalgamation, is a practical way of reducing dimensionality, but can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimension reduction method called data-driven amalgamation. Our new method, implemented in the user-friendly R package amalgam, reduces the dimensionality of compositional data by finding amalgamations that optimally (i) preserve the distance between samples, or (ii) classify samples as diseased or not. Our benchmark on 13 real datasets confirms that these amalgamations perform competitively with state-of-the-art methods, while producing new features that are easily understood: they are groups of parts added together.
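
To make amalgamation and its distortion concrete, here is a minimal pure-Python sketch. It assumes strictly positive parts, since zeros would need replacing before taking logs. It uses the centred log-ratio (clr) transform with Euclidean distance, the Aitchison distance, as the log-ratio geometry. The helper names are hypothetical and are not the amalgam package's API.

Because the log of a sum is not a linear function of the individual logs, summing parts is not a linear map in clr coordinates. How much between-sample distance survives therefore depends on which parts are pooled.

```python
import math

def close(x):
    """Closure: rescale positive parts so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centred log-ratio: log of each part minus the mean log (log of the geometric mean)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def aitchison_distance(x, y):
    """Euclidean distance between clr coordinates; unaffected by rescaling either sample."""
    return math.dist(clr(x), clr(y))

def amalgamate(x, groups):
    """Hypothetical helper: sum the parts listed in each group, then re-close."""
    return close([sum(x[i] for i in g) for g in groups])

# Two made-up samples with four parts each (e.g. read counts).
a = [10.0, 20.0, 30.0, 40.0]
b = [40.0, 30.0, 20.0, 10.0]

full = aitchison_distance(a, b)
pooled_12_34 = aitchison_distance(amalgamate(a, [[0, 1], [2, 3]]),
                                  amalgamate(b, [[0, 1], [2, 3]]))
pooled_14_23 = aitchison_distance(amalgamate(a, [[0, 3], [1, 2]]),
                                  amalgamate(b, [[0, 3], [1, 2]]))
print(f"{full:.3f} {pooled_12_34:.3f} {pooled_14_23:.3f}")  # prints 2.043 1.198 0.000
```

Both groupings reduce four parts to two. One grouping keeps part of the separation between the samples. The other makes the two samples identical, which is why the choice of amalgamation matters.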

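The abstract does not spell out how the amalgamations are found. As a rough illustration of objective (i), the sketch below searches over assignments of parts to k amalgams. It scores each assignment by the Pearson correlation between all pairwise Aitchison distances before and after amalgamation.

Both the score and the single-move hill-climbing search are assumptions chosen for brevity. They are not necessarily the objective or optimizer implemented in the amalgam R package, and all function names are hypothetical. Parts must be strictly positive.

```python
import itertools
import math
import random

def close(x):
    """Closure: rescale positive parts so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centred log-ratio transform (requires strictly positive parts)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def pairwise_distances(samples):
    """Aitchison distance for every pair of samples, in a fixed pair order."""
    coords = [clr(s) for s in samples]
    return [math.dist(coords[i], coords[j])
            for i, j in itertools.combinations(range(len(coords)), 2)]

def pearson(u, v):
    """Pearson correlation; returns -1.0 if either vector has no variance."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    if su == 0 or sv == 0:
        return -1.0
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (su * sv)

def amalgamate(sample, assignment, k):
    """Sum parts into k groups; assignment[i] is the group index of part i."""
    sums = [0.0] * k
    for part, group in zip(sample, assignment):
        sums[group] += part
    return close(sums)

def score(samples, assignment, k, target):
    """How well amalgamated distances track the original ones (higher is better)."""
    if len(set(assignment)) < k:  # an empty group would need log(0)
        return -math.inf
    reduced = [amalgamate(s, assignment, k) for s in samples]
    return pearson(target, pairwise_distances(reduced))

def find_amalgams(samples, k, iters=500, seed=0):
    """Hill climbing: move one part to a random group, keep the move if the score does not drop."""
    rng = random.Random(seed)
    n_parts = len(samples[0])
    target = pairwise_distances(samples)
    assignment = [i % k for i in range(n_parts)]  # round-robin start, no empty groups
    best = score(samples, assignment, k, target)
    for _ in range(iters):
        candidate = assignment[:]
        candidate[rng.randrange(n_parts)] = rng.randrange(k)
        s = score(samples, candidate, k, target)
        if s >= best:
            assignment, best = candidate, s
    return assignment, best

# Made-up data: 8 samples x 6 strictly positive parts, reduced to 3 amalgams.
data_rng = random.Random(1)
samples = [[data_rng.uniform(1, 100) for _ in range(6)] for _ in range(8)]
assignment, best = find_amalgams(samples, k=3)
print("part -> amalgam:", assignment, "correlation:", round(best, 3))
```

Each candidate is a plain partition of the original parts. The result therefore stays interpretable in the sense the abstract describes: every new feature is a group of parts added together.
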
Similar articles

Learning Latent Low-Rank and Sparse Embedding for Robust Image Feature Extraction.

IEEE Trans Image Process. 2020;29(1):2094-2107. doi: 10.1109/TIP.2019.2938859. Epub 2019 Sep 9.

Improved Interpretability of Brain-Behavior CCA With Domain-Driven Dimension Reduction.

Front Neurosci. 2022 Jun 23;16:851827. doi: 10.3389/fnins.2022.851827. eCollection 2022.

Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection.

mSystems. 2020 Apr 7;5(2):e00230-19. doi: 10.1128/mSystems.00230-19.

Supervised dimensionality reduction for big data.

Nat Commun. 2021 May 17;12(1):2872. doi: 10.1038/s41467-021-23102-2.

Graph embedded nonparametric mutual information for supervised dimensionality reduction.

IEEE Trans Neural Netw Learn Syst. 2015 May;26(5):951-63. doi: 10.1109/TNNLS.2014.2329240.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification.

Compositional Data Analysis of Microbiome and Any-Omics Datasets: A Validation of the Additive Logratio Transformation.

Front Microbiol. 2021 Oct 11;12:727398. doi: 10.3389/fmicb.2021.727398. eCollection 2021.

Engineering Aspects of Olfaction.

A general soft label based linear discriminant analysis for semi-supervised dimensionality reduction.

Neural Netw. 2014 Jul;55:83-97. doi: 10.1016/j.neunet.2014.03.005. Epub 2014 Apr 13.

Cited by

Unveiling the microbial realm with VEBA 2.0: a modular bioinformatics suite for end-to-end genome-resolved prokaryotic, (micro)eukaryotic and viral multi-omics from either short- or long-read sequencing.

Nucleic Acids Res. 2024 Aug 12;52(14):e63. doi: 10.1093/nar/gkae528.

Evaluation of bi-directional causal association between obstructive sleep apnoea syndrome and diabetic microangiopathy: a Mendelian randomization study.

Front Cardiovasc Med. 2024 May 9;11:1340602. doi: 10.3389/fcvm.2024.1340602. eCollection 2024.

Unveiling the Microbial Realm with VEBA 2.0: A modular bioinformatics suite for end-to-end genome-resolved prokaryotic, (micro)eukaryotic, and viral multi-omics from either short- or long-read sequencing.

bioRxiv. 2024 Mar 11:2024.03.08.583560. doi: 10.1101/2024.03.08.583560.

A toolbox of machine learning software to support microbiome analysis.

Front Microbiol. 2023 Nov 22;14:1250806. doi: 10.3389/fmicb.2023.1250806. eCollection 2023.

Three approaches to supervised learning for compositional data with pairwise logratios.

J Appl Stat. 2022 Aug 6;50(16):3272-3293. doi: 10.1080/02664763.2022.2108007. eCollection 2023.

Oral mucosal breaks trigger anti-citrullinated bacterial and human protein antibody responses in rheumatoid arthritis.

Sci Transl Med. 2023 Feb 22;15(684):eabq8476. doi: 10.1126/scitranslmed.abq8476.

Principal Amalgamation Analysis for Microbiome Data.

Genes (Basel). 2022 Jun 24;13(7):1139. doi: 10.3390/genes13071139.

Approximation of a Microbiome Composition Shift by a Change in a Single Balance Between Two Groups of Taxa.

mSystems. 2022 Jun 28;7(3):e0015522. doi: 10.1128/msystems.00155-22. Epub 2022 May 9.

A prospective investigation into the association between the gut microbiome composition and cognitive performance among healthy young adults.

Gut Pathog. 2022 Apr 19;14(1):15. doi: 10.1186/s13099-022-00487-z.

Learning sparse log-ratios for high-throughput sequencing data.

Bioinformatics. 2021 Dec 22;38(1):157-163. doi: 10.1093/bioinformatics/btab645.
