Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China.
State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan 430072, China.
Bioinformatics. 2017 Sep 1;33(17):2706-2714. doi: 10.1093/bioinformatics/btx176.
Integrating different omics profiles is a challenging task that offers a comprehensive, multi-view way to understand complex diseases. A key to such integration is extracting intrinsic patterns that agree with the underlying data structures, so that consistent information can be discovered across data types even in the presence of noise. We therefore propose a novel framework called 'pattern fusion analysis' (PFA), which performs automated information alignment and bias correction to fuse local sample-patterns (e.g. from each data type) into a global sample-pattern corresponding to phenotypes (e.g. across most data types). In particular, PFA identifies significant sample-patterns from different omics profiles by optimally adjusting the contribution of each data type to the patterns, thereby alleviating the difficulty of handling heterogeneous data from different platforms with different reliability levels.
To validate the effectiveness of our method, we first tested PFA on various synthetic datasets. In contrast to state-of-the-art methods such as iClusterPlus, SNF and moCluster, PFA not only captures the intrinsic sample clustering structures in multi-omics data but also provides an automatic weighting scheme that measures the contribution of each data type, or even each sample. In addition, the computational results show that PFA reveals shared and complementary sample-patterns across data types with distinct signal-to-noise ratios in Cancer Cell Line Encyclopedia (CCLE) datasets, and outperforms other methods at identifying clinically distinct cancer subtypes in The Cancer Genome Atlas (TCGA) datasets.
PFA has been implemented as a Matlab package, which is available at http://www.sysbio.ac.cn/cb/chenlab/images/PFApackage_0.1.rar .
Contact: lnchen@sibs.ac.cn, liujuan@whu.edu.cn or zengtao@sibs.ac.cn.
Supplementary data are available at Bioinformatics online.
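The idea of fusing local sample-patterns into a weighted global sample-pattern can be illustrated with a small sketch. This is not the PFA algorithm or the interface of the Matlab package: the function names `local_pattern` and `fuse_patterns`, the use of PCA for the local patterns, and the agreement-based weight update are all assumptions chosen for illustration. The synthetic data contain two informative data types and one pure-noise data type, measured on the same samples.

```python
import numpy as np

def local_pattern(X, k):
    """Hypothetical helper: top-k principal sample directions of one data type.

    X is a (features x samples) matrix; the result is (k x samples) with
    orthonormal rows.
    """
    Xc = X - X.mean(axis=1, keepdims=True)           # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]

def fuse_patterns(patterns, n_iter=20):
    """Hypothetical fusion step (an assumption, not the published method).

    Alternates between (1) a global pattern spanning the weighted consensus
    of the local subspaces and (2) weights proportional to how well each
    local pattern agrees with that global pattern.
    """
    m = len(patterns)
    k = patterns[0].shape[0]
    w = np.full(m, 1.0 / m)                          # start with equal weights
    for _ in range(n_iter):
        # Weighted sum of the projection matrices onto each local subspace.
        P = sum(wi * Y.T @ Y for wi, Y in zip(w, patterns))
        _, eigvecs = np.linalg.eigh(P)               # eigenvalues ascending
        G = eigvecs[:, -k:].T                        # top-k eigenvectors, (k x samples)
        # Subspace agreement in [0, 1] between each local pattern and G.
        agree = np.array([np.linalg.norm(Y @ G.T) ** 2 / k for Y in patterns])
        w = agree / agree.sum()
    return G, w

# Synthetic example: 60 samples in 3 groups, 3 data types with 50 features each.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)

def informative_type(noise_sd):
    centers = rng.normal(size=(50, 3))               # group means per feature
    return centers[:, labels] + noise_sd * rng.normal(size=(50, 60))

data_types = [
    informative_type(0.5),                           # informative
    informative_type(0.5),                           # informative
    rng.normal(size=(50, 60)),                       # pure noise
]

patterns = [local_pattern(X, k=2) for X in data_types]
G, w = fuse_patterns(patterns)
print("data-type weights:", np.round(w, 3))
```

In this sketch the noise-only data type agrees poorly with the consensus subspace and therefore receives a small weight, which mirrors, in a much simplified form, the abstract's point about automatically weighting data types of different reliability.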