Sparse Proteomics Analysis - a compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data.

Author information

Conrad Tim O F, Genzel Martin, Cvetkovic Nada, Wulkow Niklas, Leichtle Alexander, Vybiral Jan, Kutyniok Gitta, Schütte Christof

Affiliations

Department of Mathematics, Freie Universität Berlin, Arnimallee 6, Berlin, Germany.

Zuse Institute Berlin, Takustr. 7, Berlin, Germany.

Publication information

BMC Bioinformatics. 2017 Mar 9;18(1):160. doi: 10.1186/s12859-017-1565-4.

Abstract

BACKGROUND

High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible.

RESULTS

We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.
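The abstract does not spell out the SPA algorithm itself, so the following is only an illustrative sketch of the general idea it describes: pick a small set of discriminating features with a sparsity-promoting step, then classify spectra with a linear rule on those features. The estimator used here (soft-thresholding the feature/label correlations, in the spirit of one-bit compressed sensing) is an assumption chosen for illustration and is not claimed to be the authors' method. All function names, the threshold value, and the synthetic "spectra" are hypothetical.

```python
import numpy as np


def standardize(X):
    """Center each feature and scale it to unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)


def soft_threshold(v, t):
    """Shrink every entry of v toward zero by t; small entries become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_direction(X, y, threshold):
    """Return a sparse unit-norm weight vector (hypothetical helper).

    X: array of shape (n_samples, n_features), one spectrum per row.
    y: array of labels in {-1, +1}.
    threshold: assumed tuning parameter; larger values select fewer features.
    """
    corr = standardize(X).T @ y / len(y)   # per-feature correlation with the label
    w = soft_threshold(corr, threshold)    # sparsity-promoting step
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def classify(X, w):
    """Assign each spectrum to class +1 or -1 by the sign of its projection on w."""
    return np.where(standardize(X) @ w >= 0, 1.0, -1.0)


def make_toy_spectra(n_samples=200, n_features=500,
                     informative=(10, 50, 120), shift=2.0, seed=0):
    """Synthetic data: Gaussian noise plus a class-dependent shift in a few features."""
    rng = np.random.default_rng(seed)
    y = np.where(np.arange(n_samples) % 2 == 0, 1.0, -1.0)
    X = rng.normal(size=(n_samples, n_features))
    X[:, list(informative)] += shift * y[:, None]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_spectra()
    w = sparse_direction(X, y, threshold=0.5)
    selected = np.flatnonzero(w)           # indices of the features that survived
    accuracy = np.mean(classify(X, w) == y)
    print("selected features:", selected)
    print("training accuracy:", accuracy)
```

In this toy setting the nonzero entries of `w` play the role of the discriminating feature set, and the threshold trades off how small that set is against how much signal is retained. Real mass spectra would additionally need preprocessing (baseline correction, alignment, peak detection) and evaluation on held-out data, none of which is shown here.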

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a00b/5343371/d57313b22cc3/12859_2017_1565_Fig1_HTML.jpg
