Incorporating biological information in sparse principal component analysis with application to genomic data.

Author information

Li Ziyi, Safo Sandra E, Long Qi

Affiliations

Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road, Atlanta, 30322, GA, USA.

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, 19104, PA, USA.

Publication information

BMC Bioinformatics. 2017 Jul 11;18(1):332. doi: 10.1186/s12859-017-1740-7.

Abstract

BACKGROUND

Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection.
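
To make the idea of structure-aware variable selection concrete, here is a minimal sketch of a rank-one sparse PCA in which loadings are kept or dropped one predefined group at a time. It is not the Fused or Grouped sparse PCA formulation of the paper, which the abstract does not spell out. It assumes a generic alternating power iteration with a group-lasso style soft-threshold, and the function names, the penalty level `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np


def group_soft_threshold(z, groups, lam):
    """Shrink each group's sub-vector; groups with norm <= lam are set to zero."""
    out = np.zeros_like(z)
    for idx in groups:
        norm = np.linalg.norm(z[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * z[idx]
    return out


def grouped_sparse_pc(X, groups, lam, n_iter=100):
    """First sparse loading vector via alternating power iterations.

    Illustrative only: a generic group-thresholded rank-one PCA,
    not the estimator proposed in the paper.
    """
    Xc = X - X.mean(axis=0)  # center each variable
    # Start from the ordinary leading loading vector.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    v = vt[0]
    for _ in range(n_iter):
        u = Xc @ v
        u /= np.linalg.norm(u)  # unit-length score direction
        v_new = group_soft_threshold(Xc.T @ u, groups, lam)
        nrm = np.linalg.norm(v_new)
        if nrm == 0.0:  # penalty removed every group
            return v_new
        v = v_new / nrm
    return v


# Hypothetical data: 12 variables in 3 groups of 4 (think of each group as a
# small gene pathway); only the first group carries a shared signal.
rng = np.random.default_rng(0)
n, p = 100, 12
groups = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]
w = np.zeros(p)
w[:4] = 0.5
X = np.outer(3.0 * rng.normal(size=n), w) + 0.5 * rng.normal(size=(n, p))

v = grouped_sparse_pc(X, groups, lam=5.0)
print(np.round(v, 3))
```

In this toy setting the loadings of the two noise-only groups are thresholded to exactly zero as whole blocks, which is the kind of group-level selection that prior pathway or network information is meant to encourage.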

RESULTS

Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related to glioblastoma.

CONCLUSIONS

The proposed methods, Fused and Grouped sparse PCA, can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings, and potentially providing insights into the molecular underpinnings of complex diseases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a72/5504598/9a3e3bca4032/12859_2017_1740_Fig1_HTML.jpg
