Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.

Affiliation

Queensland Facility for Advanced Bioinformatics, University of Queensland, 4072 St Lucia, QLD, Australia.

Publication information

BMC Bioinformatics. 2011 Jun 22;12:253. doi: 10.1186/1471-2105-12-253.

Abstract

BACKGROUND

Variable selection on high-throughput biological data, such as gene expression or single nucleotide polymorphism (SNP) data, has become unavoidable for selecting relevant information and, therefore, for better characterizing diseases or assessing genetic structure. There are different ways to perform variable selection on large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas machine learning wrapper approaches can be used for predictive purposes. When many variables are highly correlated, another option is to use multivariate exploratory approaches, which can give more insight into cell biology, biological pathways or complex traits.

RESULTS

A simple extension of a sparse partial least squares (sparse PLS) exploratory approach, sparse PLS discriminant analysis (sPLS-DA), is proposed to perform variable selection in a multiclass classification framework.
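As a rough illustration of what this extension involves, here is a minimal plain-Python sketch of a first sPLS-DA-style component. It assumes the formulation commonly described for sparse PLS: class labels are encoded as a dummy (indicator) matrix Y, and a rank-one SVD of M = XᵀY is computed by alternating updates, with lasso-style soft-thresholding applied only to the X loadings so that most variables receive a zero weight. All function names, the `keep_x` parameter (meant to play a role similar to the `keepX` argument of mixOmics' `splsda()`), the initialization and the toy data are hypothetical choices for illustration. Deflation for further components, variable scaling and the actual mixOmics implementation are not reproduced.

```python
import math


def dummy_matrix(labels):
    """Encode class labels as an n x K indicator matrix (one column per class)."""
    classes = sorted(set(labels))
    Y = [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]
    return Y, classes


def center_columns(X):
    """Subtract each column's mean so every variable has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def normalize(vec):
    """Scale a vector to unit Euclidean norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def soft_threshold_keep(vec, keep):
    """Lasso-style soft-thresholding; the threshold is the (keep+1)-th largest
    magnitude, so at most `keep` entries stay nonzero (hypothetical helper)."""
    mags = sorted((abs(x) for x in vec), reverse=True)
    lam = mags[keep] if keep < len(vec) else 0.0
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]


def spls_da_first_component(X, labels, keep_x, n_iter=100, tol=1e-12):
    """Hypothetical sketch: first component via a sparse rank-one SVD of X^T Y."""
    Xc = center_columns(X)
    Y, _ = dummy_matrix(labels)  # centering Y is unnecessary: Xc^T Y is unchanged
    n, p, K = len(Xc), len(Xc[0]), len(Y[0])
    # M[j][k]: association between variable j and the indicator of class k
    M = [[sum(Xc[i][j] * Y[i][k] for i in range(n)) for k in range(K)]
         for j in range(p)]
    # Start from the class column of M with the largest norm. A uniform start
    # would give M v = 0, because each row of Y sums to 1 and Xc is centered.
    k0 = max(range(K), key=lambda k: sum(M[j][k] ** 2 for j in range(p)))
    v = [1.0 if k == k0 else 0.0 for k in range(K)]
    u = [0.0] * p
    for _ in range(n_iter):
        # Sparse X loadings: u <- normalize(soft(M v))
        u_new = normalize(soft_threshold_keep(
            [sum(M[j][k] * v[k] for k in range(K)) for j in range(p)], keep_x))
        # Y loadings are not penalized: v <- normalize(M^T u)
        v = normalize([sum(M[j][k] * u_new[j] for j in range(p)) for k in range(K)])
        converged = sum((a - b) ** 2 for a, b in zip(u, u_new)) < tol
        u = u_new
        if converged:
            break
    # Component scores t = Xc u (the coordinates one would plot for each sample)
    t = [sum(Xc[i][j] * u[j] for j in range(p)) for i in range(n)]
    return u, t


# Toy data (invented): variables 0 and 1 separate three classes, 2 and 3 are noise
labels = ["A", "A", "B", "B", "C", "C"]
X = [
    [5.0, 0.0, 0.1, 0.3],
    [5.2, 0.1, -0.2, 0.1],
    [0.0, 5.0, 0.2, -0.1],
    [0.1, 5.1, -0.1, 0.2],
    [-5.0, -5.0, 0.0, -0.2],
    [-5.1, -4.9, 0.1, 0.0],
]
u, t = spls_da_first_component(X, labels, keep_x=2)
selected = [j for j, w in enumerate(u) if w != 0.0]
print(selected)  # [0, 1]
```

Choosing the soft-threshold from a target number of variables, rather than fixing the penalty λ directly, is one convenient way to control sparsity in a sketch like this; the selected variables are simply those with nonzero loadings in `u`.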

CONCLUSIONS

sPLS-DA achieves classification performance similar to that of other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of the interpretability of its results, thanks to informative graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/28b1/3133555/77166101138e/1471-2105-12-253-1.jpg