DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes.

Affiliation

Department of Psychology, Katholieke Universiteit Leuven, Leuven, Belgium.

Publication

PLoS One. 2012;7(5):e37840. doi: 10.1371/journal.pone.0037840. Epub 2012 May 31.

Abstract

BACKGROUND

In systems biology, it is common to obtain information on the same set of biological entities from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms, and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and, among these, to distinguish processes common to all data sources from processes distinctive for a specific source. Two promising simultaneous data integration methods have recently been proposed to attain this goal: generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA).
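As an illustration of what a GSVD computes, here is a minimal sketch of a plain, unadapted GSVD of two blocks that share their column dimension. It computes a QR factorization of the stacked blocks and then an SVD of the first block of Q. It assumes each block has at least as many rows as columns and that the stacked matrix has full column rank. The function name `gsvd` is hypothetical, and the pre-processing and algorithmic adaptations the authors found necessary are not reproduced here. Each component pairs a generalized singular value `c` in block 1 with `s` in block 2 (with c² + s² = 1). The angular distance arctan(c/s) − π/4 is used here as an assumed labeling criterion: it is near 0 for components equally important in both blocks and near ±π/4 for components specific to one block.

```python
import numpy as np


def gsvd(D1, D2):
    """Hypothetical helper: GSVD of two blocks sharing their columns.

    Returns U1, U2, c, s, Y with D1 = U1 @ diag(c) @ Y and D2 = U2 @ diag(s) @ Y,
    where c**2 + s**2 = 1 and Y is a shared, generally non-orthogonal, basis.
    Assumes each block has at least as many rows as columns and that
    [D1; D2] has full column rank.
    """
    m1 = D1.shape[0]
    Q, R = np.linalg.qr(np.vstack([D1, D2]))   # reduced QR: Q is (m1+m2) x n, R is n x n
    Q1, Q2 = Q[:m1], Q[m1:]
    U1, c, Vt = np.linalg.svd(Q1, full_matrices=False)  # Q1 = U1 @ diag(c) @ Vt
    W = Q2 @ Vt.T                  # columns are orthogonal because Q1'Q1 + Q2'Q2 = I
    s = np.linalg.norm(W, axis=0)  # equals sqrt(1 - c**2)
    U2 = W / s                     # assumes every s > 0
    Y = Vt @ R                     # shared factor for both blocks
    return U1, U2, c, s, Y


# Usage on simulated data (illustration only)
rng = np.random.default_rng(0)
D1 = rng.standard_normal((8, 4))
D2 = rng.standard_normal((10, 4))
U1, U2, c, s, Y = gsvd(D1, D2)
# Assumed labeling: about 0 = shared, near +pi/4 = block 1 only, near -pi/4 = block 2 only
theta = np.arctan2(c, s) - np.pi / 4
print(np.round(theta, 3))
```

Because Y is shared but not orthogonal, components that look "common" by this angle are not constrained to be uncorrelated with block-specific ones.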

RESULTS

Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) given proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA generalizes directly to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a comparative genomics setting, DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones; the biological annotation was obtained by applying Gene Set Enrichment Analysis appropriately. Second, applying DISCO-SCA to Escherichia coli metabolomics data obtained with two different chemical analysis platforms shows that the metabolites involved in some of the biological processes underlying the data are detected by only one of the two platforms; therefore, platforms for microbial metabolomics should be tailored to the biological question.
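The rotation idea behind DISCO-SCA can be illustrated with a minimal sketch. It assumes that the blocks share their rows (the same biological entities) and differ in their variables, and that each block is column-centered and scaled to unit sum of squares. It also assumes the user has already chosen which components are common and which are distinctive for which block. All helper names (`block_scale`, `sca`, `zero_mask`, `rotate_to_target`) are hypothetical. The orthogonal rotation toward a partially specified target of zero loadings is solved here with a simple alternating orthogonal Procrustes iteration and random restarts, which may differ from the algorithm in the authors' released code.

```python
import numpy as np


def block_scale(blocks):
    """Center every column and scale each block to unit sum of squares (assumed pre-processing)."""
    scaled = []
    for X in blocks:
        Xc = X - X.mean(axis=0)
        scaled.append(Xc / np.linalg.norm(Xc))
    return scaled


def sca(blocks, n_comp):
    """Simultaneous component analysis: truncated SVD of the column-wise concatenated blocks."""
    X = np.hstack(blocks)                    # entities x (variables of all blocks)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_comp]                        # scores shared by all blocks
    P = Vt[:n_comp].T * s[:n_comp]           # loadings; X is approximated by T @ P.T
    return T, P


def zero_mask(block_sizes, status):
    """status[r] is None for a common component, or the index of the block that
    component r is distinctive for. True marks loadings the target requires to be zero."""
    offsets = np.cumsum([0, *block_sizes])
    mask = np.zeros((offsets[-1], len(status)), dtype=bool)
    for r, k in enumerate(status):
        if k is not None:
            mask[:, r] = True                            # zero in every block ...
            mask[offsets[k]:offsets[k + 1], r] = False   # ... except its own
    return mask


def rotate_to_target(P, mask, n_starts=10, n_iter=5000, tol=1e-14, seed=0):
    """Orthogonal B minimizing the sum of squared entries of P @ B where mask is True.
    Alternates between filling in the target and an orthogonal Procrustes step;
    keeps the best result over several random orthogonal starts."""
    rng = np.random.default_rng(seed)
    R = P.shape[1]
    best_B, best_loss = None, np.inf
    for _ in range(n_starts):
        B, _ = np.linalg.qr(rng.standard_normal((R, R)))
        prev = np.inf
        for _ in range(n_iter):
            target = np.where(mask, 0.0, P @ B)     # current loadings with required zeros imposed
            U, _, Vt = np.linalg.svd(P.T @ target)
            B = U @ Vt                              # argmin over orthogonal B of ||P @ B - target||
            loss = np.sum((P @ B)[mask] ** 2)
            if prev - loss < tol:
                break
            prev = loss
        if loss < best_loss:
            best_B, best_loss = B, loss
    return best_B, best_loss


# Simulated example: one common process and one distinctive process per block
rng = np.random.default_rng(1)
S = rng.standard_normal((40, 3))
S, _ = np.linalg.qr(S - S.mean(axis=0))     # zero-mean, orthonormal true scores
X1 = S[:, [0, 1]] @ rng.standard_normal((2, 6))
X2 = S[:, [0, 2]] @ rng.standard_normal((2, 5))
T, P = sca(block_scale([X1, X2]), n_comp=3)
B, loss = rotate_to_target(P, zero_mask([6, 5], [None, 0, 1]))
print(loss)
```

Because any orthogonal `B` leaves the fit `T @ P.T` unchanged, the rotation only redistributes the explained variance over components; `zero_mask` accepts any number of blocks, in line with point (4) above.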

CONCLUSIONS

Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.
