Analyzing large-scale proteomics projects with latent semantic indexing.

Authors

Klie Sebastian, Martens Lennart, Vizcaíno Juan Antonio, Côté Richard, Jones Phil, Apweiler Rolf, Hinneburg Alexander, Hermjakob Henning

Affiliation

Martin Luther University Halle-Wittenberg, Halle-Saale, Germany.

Publication details

J Proteome Res. 2008 Jan;7(1):182-91. doi: 10.1021/pr070461k. Epub 2007 Nov 30.

Abstract

Since the advent of public data repositories for proteomics data, readily accessible results from high-throughput experiments have been accumulating steadily. Several large-scale projects in particular have contributed substantially to the number of identifications available to the community. Despite the considerable body of information amassed, very few successful analyses have been performed and published on these data, leaving the ultimate value of these projects far below their potential. A prominent reason published proteomics data are seldom reanalyzed lies in the heterogeneous nature of the original sample collection and the subsequent data recording and processing. To illustrate that at least part of this heterogeneity can be compensated for, we here apply latent semantic analysis to the data contributed by the Human Proteome Organization's Plasma Proteome Project (HUPO PPP). Interestingly, despite the broad spectrum of instruments and methodologies applied in the HUPO PPP, our analysis reveals several obvious patterns that can be used to formulate concrete recommendations for optimizing proteomics project planning as well as the choice of technologies used in future experiments. It is clear from these results that the analysis of large bodies of publicly available proteomics data by noise-tolerant algorithms such as latent semantic analysis holds great promise and is currently underexploited.
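
As an illustration of the general technique named in the abstract, the following is a minimal sketch of latent semantic analysis using a truncated singular value decomposition. The abstract does not describe how the HUPO PPP data were encoded, so everything here is an assumption: the experiment-by-protein binary matrix, the toy values in it, the helper names `lsa_embed` and `cosine_similarity`, and the choice of two latent dimensions are all hypothetical and are not taken from the paper.

```python
import numpy as np


def lsa_embed(matrix, k):
    """Project the rows of `matrix` onto its top-k latent dimensions.

    Uses a plain truncated SVD; this is a generic form of LSA and not
    necessarily the procedure used in the paper.
    """
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy data: rows are experiments, columns are proteins,
# and 1 means the protein was identified in that experiment.
experiments = np.array([
    [1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
], dtype=float)

embedded = lsa_embed(experiments, k=2)

# Experiments 0 and 1 share identifications; experiments 0 and 2 share none.
sim_01 = cosine_similarity(embedded[0], embedded[1])
sim_02 = cosine_similarity(embedded[0], embedded[2])
print(f"similarity(exp0, exp1) = {sim_01:.3f}")
print(f"similarity(exp0, exp2) = {sim_02:.3f}")
```

In this toy setting, experiments with overlapping identifications end up close together in the latent space even though their protein lists are not identical, which is the kind of noise tolerance the abstract refers to.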
