

Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data.

Authors

Quinn Thomas P, Erb Ionas

Affiliations

Applied Artificial Intelligence Institute, Deakin University, 75 Pigdons Rd, Waurn Ponds VIC 3216, Geelong, Australia.

Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Carrer del Dr. Aiguader, 88, 08003, Barcelona, Spain.

Publication information

NAR Genom Bioinform. 2020 Oct 2;2(4):lqaa076. doi: 10.1093/nargab/lqaa076. eCollection 2020 Dec.

Abstract

Many next-generation sequencing datasets contain only relative information because of biological and technical factors that limit the total number of transcripts observed for a given sample. It is not possible to interpret any one component in isolation. The field of compositional data analysis has emerged with alternative methods for relative data based on log-ratio transforms. However, these data often contain many more features than samples, and thus require creative new ways to reduce the dimensionality of the data. The summation of parts, called amalgamation, is a practical way of reducing dimensionality, but can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimension reduction method called data-driven amalgamation. Our new method, implemented in the user-friendly R package amalgam, can reduce the dimensionality of compositional data by finding amalgamations that optimally (i) preserve the distance between samples, or (ii) classify samples as diseased or not. Our benchmark on 13 real datasets confirms that these amalgamations compete with state-of-the-art methods in terms of performance, but result in new features that are easily understood: they are groups of parts added together.
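To make the idea of amalgamation concrete, the following is a minimal Python sketch, not the amalgam R package or its API. It sums parts that share a group label and compares pairwise sample distances before and after. The toy counts, the `groups` assignment, the helper names, and the choice of a centered log-ratio (clr) based distance are all illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1 (a composition)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def amalgamate(x, groups, n_groups):
    """Sum the parts (columns) of x that share a group label."""
    x = np.asarray(x, dtype=float)
    # Binary assignment matrix: part j belongs to group groups[j].
    assign = np.zeros((x.shape[1], n_groups))
    assign[np.arange(x.shape[1]), groups] = 1.0
    return x @ assign

def clr_distance(x):
    """Pairwise Euclidean distance between clr-transformed rows."""
    logx = np.log(x)
    clr = logx - logx.mean(axis=1, keepdims=True)
    diff = clr[:, None, :] - clr[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical toy data: 3 samples, 4 parts.
counts = np.array([[10, 20, 30, 40],
                   [5, 5, 45, 45],
                   [25, 25, 25, 25]])
comp = closure(counts)

# Hypothetical assignment: parts 0 and 1 are merged, parts 2 and 3 stay alone.
groups = np.array([0, 0, 1, 2])
amalg = amalgamate(comp, groups, n_groups=3)

d_full = clr_distance(comp)    # distances on the original 4 parts
d_amalg = clr_distance(amalg)  # distances on the 3 amalgamated parts
```

Under these assumptions, a data-driven search would score many candidate `groups` assignments, for example by how closely `d_amalg` tracks `d_full`, and keep the best one. Because the log of a sum is not the sum of the logs, the two distance matrices generally differ, which is the non-linear distortion the abstract refers to.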


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45fa/7671324/c3c30e73c7ab/lqaa076fig1.jpg
