Exploring high-dimensional biological data with sparse contrastive principal component analysis.

Author information

Graduate Group in Biostatistics.

Center for Computational Biology.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3422-3430. doi: 10.1093/bioinformatics/btaa176.

PMID:32176249

Abstract

MOTIVATION

Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances. However, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously.

RESULTS

Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis (PCA), sparse contrastive PCA, that extracts sparse, stable, interpretable and relevant biological signal. We compare the new methodology to competing dimensionality reduction approaches through a simulation study and through analyses of several publicly available protein expression, microarray gene expression and single-cell transcriptome sequencing datasets.
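
To make the idea of contrasting a target dataset against control data concrete, the following is a minimal Python sketch. It is not the scPCA R package and not the authors' algorithm. It assumes the usual contrastive PCA formulation (eigendecomposition of the target covariance minus a multiple `gamma` of the background covariance) and uses plain soft-thresholding of the loadings as a crude stand-in for a proper sparse PCA step. The function name `contrastive_pca`, its parameters and the simulated data are all hypothetical.

```python
import numpy as np


def contrastive_pca(target, background, gamma=1.0, n_components=2, threshold=0.0):
    """Illustrative contrastive PCA with optional soft-thresholded loadings.

    target, background: arrays of shape (n_samples, n_features).
    gamma: contrast strength (assumed tuning parameter).
    threshold: soft-threshold level; 0.0 leaves the loadings dense.
    """
    # Center the target so the projection is about deviations from the mean.
    target_c = target - target.mean(axis=0)

    # Sample covariance matrices (np.cov centers each column itself).
    cov_target = np.cov(target, rowvar=False)
    cov_background = np.cov(background, rowvar=False)

    # Contrast matrix: variation in the target not explained by the background.
    contrast = cov_target - gamma * cov_background

    # The contrast matrix is symmetric, so eigh applies; keep the top eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(contrast)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]

    # Crude sparsification (an assumption, not the published method):
    # soft-threshold each loading, then rescale columns to unit length.
    loadings = np.sign(loadings) * np.maximum(np.abs(loadings) - threshold, 0.0)
    norms = np.linalg.norm(loadings, axis=0)
    norms[norms == 0] = 1.0
    loadings = loadings / norms

    return target_c @ loadings, loadings


# Simulated example: feature 0 carries nuisance variation shared by both
# datasets; feature 1 carries a two-group signal present only in the target.
rng = np.random.default_rng(0)
n, p = 2000, 5
background = rng.normal(size=(n, p))
background[:, 0] *= 4.0
target = rng.normal(size=(n, p))
target[:, 0] *= 4.0
target[:, 1] += rng.choice([-2.0, 2.0], size=n)

scores, loadings = contrastive_pca(
    target, background, gamma=2.0, n_components=1, threshold=0.2
)
```

In this simulation, ordinary PCA on the target alone would be dominated by the shared nuisance feature, whereas subtracting the background covariance shifts the leading direction toward the target-specific feature. How to choose `gamma` and the sparsity level is not covered by this sketch.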

AVAILABILITY AND IMPLEMENTATION

A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in this article is also available via GitHub.

CONTACT

philippe_boileau@berkeley.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
