Partition: a surjective mapping approach for dimensionality reduction.

Author information

Department of Preventive Medicine, CA 90033, USA.

Department of Medicine, Division of Medical Oncology, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.

Publication information

Bioinformatics. 2020 Feb 1;36(3):676-681. doi: 10.1093/bioinformatics/btz661.

Abstract

MOTIVATION

Large amounts of information generated by genomic technologies are accompanied by statistical and computational challenges due to redundancy, badly behaved data and noise. Dimensionality reduction (DR) methods have been developed to mitigate these challenges. However, many approaches are not scalable to large dimensions or result in excessive information loss.

RESULTS

The proposed approach partitions data into subsets of related features and summarizes each into one and only one new feature, thus defining a surjective mapping. A constraint on information loss determines the size of the reduced dataset. Simulation studies demonstrate that when multiple related features are associated with a response, this approach can substantially increase the number of true associations detected as compared to principal components analysis, non-negative matrix factorization or no DR. This increase in true discoveries is explained both by a reduced multiple-testing challenge and a reduction in extraneous noise. In an application to real data collected from metastatic colorectal cancer tumors, more associations between gene expression features and progression-free survival and response to treatment were detected in the reduced dataset than in the full untransformed dataset.

AVAILABILITY AND IMPLEMENTATION

Freely available R package from CRAN, https://cran.r-project.org/package=partition.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
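
The idea in RESULTS (group related features, summarize each group into exactly one new feature, and let an information-loss constraint decide how far the reduction goes) can be illustrated with a small Python sketch. This is not the authors' algorithm and not the API of the R package. It assumes a greedy agglomerative merge, a cluster summary equal to the mean of standardized features, and an "information retained" measure defined here as the smallest squared correlation between a summary and its member features. The function name `partition_features`, the `threshold` parameter, and the simulated data are all hypothetical.

```python
import numpy as np


def partition_features(X, threshold=0.7):
    """Greedy sketch of partition-style dimensionality reduction.

    Assumptions (not taken from the paper): clusters are summarized by the
    mean of standardized features, and a merge is accepted only if every
    member's squared correlation with the summary is >= threshold.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize columns
    clusters = [(j,) for j in range(X.shape[1])]  # start with singletons
    rejected = set()                              # pairs that failed the constraint

    while len(clusters) > 1:
        S = np.column_stack([Z[:, list(c)].mean(axis=1) for c in clusters])
        C = np.corrcoef(S, rowvar=False)

        # Pick the most correlated pair of clusters not yet rejected.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if frozenset((clusters[a], clusters[b])) in rejected:
                    continue
                if best is None or C[a, b] > best[0]:
                    best = (C[a, b], a, b)
        if best is None:
            break  # no admissible merge is left

        _, a, b = best
        merged = clusters[a] + clusters[b]
        summary = Z[:, list(merged)].mean(axis=1)
        retained = min(np.corrcoef(summary, Z[:, j])[0, 1] ** 2 for j in merged)

        if retained >= threshold:
            keep = [c for i, c in enumerate(clusters) if i not in (a, b)]
            clusters = keep + [merged]
        else:
            rejected.add(frozenset((clusters[a], clusters[b])))

    reduced = np.column_stack([Z[:, list(c)].mean(axis=1) for c in clusters])
    # Surjective mapping: each original feature maps to exactly one new feature,
    # and each new feature has at least one original feature mapped to it.
    mapping = {j: k for k, c in enumerate(clusters) for j in c}
    return reduced, mapping


# Simulated example: 3 latent signals with 3 noisy copies each, plus 2 noise features.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 3))
copies = np.repeat(latent, 3, axis=1) + 0.3 * rng.normal(size=(n, 9))
X = np.hstack([copies, rng.normal(size=(n, 2))])

reduced, mapping = partition_features(X, threshold=0.7)
print(X.shape, "->", reduced.shape)
print(mapping)
```

In this sketch a higher `threshold` tolerates less information loss and therefore leaves more features in the reduced dataset, while a lower one merges more aggressively. Features that have no close relatives stay as singletons, so nothing is dropped.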
