Li Juzeng, Wang Yi
Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, China.
Human Phenome Institute, Fudan University, Shanghai, China.
Front Genet. 2024 Jan 8;14:1290447. doi: 10.3389/fgene.2023.1290447. eCollection 2023.
Linear dimensionality reduction techniques are widely used in many applications. The goal of dimensionality reduction is to remove noise from data and extract its main features. Several dimensionality reduction methods have been developed, such as the linear principal component analysis (PCA), the nonlinear t-distributed stochastic neighbor embedding (t-SNE), and the deep-learning-based autoencoder (AE). However, PCA only determines the projection directions with the highest variance, t-SNE is sometimes suitable only for visualization, and AE and nonlinear methods discard the linear projection. To retain the linear projection of the raw data and produce a better dimensionality reduction result for either visualization or downstream analysis, we present neural principal component analysis (nPCA), an unsupervised deep learning approach capable of retaining richer information from the raw data, as a promising improvement to PCA. To evaluate the nPCA algorithm, we compare its performance on 10 public datasets and 6 single-cell RNA sequencing (scRNA-seq) datasets of the pancreas, benchmarking our method against other classic linear dimensionality reduction methods. We conclude that nPCA is a competitive alternative for dimensionality reduction tasks.
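For readers unfamiliar with the baseline, the linear projection that the abstract attributes to PCA can be sketched as follows. This is a minimal illustration of classic PCA only, not of nPCA, whose architecture the abstract does not describe. The function name `pca_project` and the synthetic data are assumptions made for this example.

```python
import numpy as np


def pca_project(X, n_components):
    """Classic PCA: project rows of X onto the highest-variance directions.

    Hypothetical helper for illustration; it is not the authors' nPCA.
    """
    # Center each feature so that variance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by
    # decreasing singular value (equivalently, decreasing variance).
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:n_components].T  # projection matrix, (n_features, n_components)
    # Variance captured along each retained direction.
    explained_var = S[:n_components] ** 2 / (X.shape[0] - 1)
    # The embedding is a purely linear map of the centered data.
    return X_centered @ W, W, explained_var


# Synthetic data (an assumption): 200 samples, 5 features of unequal scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
Z, W, var = pca_project(X, 2)
```

Because the embedding `Z` is obtained by a single matrix product with `W`, each reduced coordinate is a weighted sum of the original features, which is the linear-projection property the abstract says nonlinear methods and autoencoders give up.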