Exploring high-dimensional biological data with sparse contrastive principal component analysis.

Affiliations

Graduate Group in Biostatistics.

Center for Computational Biology.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3422-3430. doi: 10.1093/bioinformatics/btaa176.

Abstract

MOTIVATION

Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances. However, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously.

RESULTS

Inspired by recent proposals for using control data to remove unwanted variation, we propose a variant of principal component analysis (PCA), sparse contrastive PCA, which extracts sparse, stable, interpretable and relevant biological signal. The new methodology is compared to competing dimensionality reduction approaches through a simulation study and through analyses of several publicly available protein expression, microarray gene expression and single-cell transcriptome sequencing datasets.
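
The idea of contrasting a target dataset against control (background) data can be illustrated with a minimal Python sketch. This is not the scPCA package's implementation or API: it assumes the common contrastive PCA formulation (eigendecomposition of the target covariance minus a contrast parameter times the background covariance), and it uses a naive soft-threshold on the loadings as a simple stand-in for a proper sparse PCA step. The names `contrastive_pca`, `simulate`, `contrast` and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def contrastive_pca(target, background, contrast=1.0, n_components=2, threshold=0.0):
    """Illustrative contrastive PCA with naive sparsification (not the scPCA algorithm).

    target, background: (samples x features) arrays sharing the same features.
    contrast: weight given to the background covariance (assumed tuning parameter).
    threshold: soft-threshold applied to the loadings (assumed stand-in for sparse PCA).
    """
    target = np.asarray(target, dtype=float)
    background = np.asarray(background, dtype=float)

    # Centre each dataset separately, column by column.
    target_c = target - target.mean(axis=0)
    background_c = background - background.mean(axis=0)

    # Contrastive covariance: variation in the target that the background lacks.
    cov_target = np.cov(target_c, rowvar=False)
    cov_background = np.cov(background_c, rowvar=False)
    contrastive_cov = cov_target - contrast * cov_background

    # The matrix is symmetric, so eigh applies; keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(contrastive_cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]

    # Naive sparsification: shrink small loadings to exactly zero, then renormalise.
    loadings = np.sign(loadings) * np.maximum(np.abs(loadings) - threshold, 0.0)
    norms = np.linalg.norm(loadings, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero columns
    loadings = loadings / norms

    return loadings, target_c @ loadings


def simulate(rng, n=2000, p=5):
    """Toy data: feature 0 carries shared technical noise, feature 1 carries signal."""
    background = rng.normal(size=(n, p))
    background[:, 0] *= 4.0  # technical noise present in both datasets
    target = rng.normal(size=(n, p))
    target[:, 0] *= 4.0      # same technical noise
    target[:, 1] *= 3.0      # signal found only in the target data
    return target, background


if __name__ == "__main__":
    target, background = simulate(np.random.default_rng(0))
    plain, _ = contrastive_pca(target, background, contrast=0.0, n_components=1)
    contrasted, _ = contrastive_pca(target, background, contrast=1.0,
                                    n_components=1, threshold=0.1)
    print("ordinary PCA loading:   ", np.round(plain[:, 0], 2))
    print("contrastive PCA loading:", np.round(contrasted[:, 0], 2))
```

In this toy setting, setting `contrast=0.0` reduces the procedure to ordinary PCA of the target data, whose leading component is dominated by the shared noise feature, whereas a positive contrast shifts the leading component toward the feature that varies only in the target. How the contrast and sparsity parameters should be chosen in practice is not covered by this sketch.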

AVAILABILITY AND IMPLEMENTATION

A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in this article is also available via GitHub.

CONTACT

philippe_boileau@berkeley.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
