Gligorijević Vladimir, Pržulj Nataša
Department of Computing, Imperial College London, London SW7 2AZ, UK.
J R Soc Interface. 2015 Nov 6;12(112). doi: 10.1098/rsif.2015.0571.
Rapid technological advances have led to the production of many different types of biological data and have enabled the construction of complex networks that capture various types of interactions between diverse biological entities. Standard network analysis methods have been shown to be limited in handling such heterogeneous networked data; consequently, new methods for integrative data analysis have been proposed. These integrative methods can collectively mine multiple types of biological data and produce more holistic, systems-level biological insights. We survey recent methods for collective mining (integration) of various types of networked biological data. We compare state-of-the-art data integration methods and highlight their advantages and disadvantages in addressing important biological problems. We identify the main computational challenges these methods face and provide general guidelines on which methods are suited to specific biological problems or specific data types. Moreover, we propose that recent non-negative matrix factorization-based approaches may become the integration methodology of choice, as they are well suited to heterogeneous data, handle it accurately, and offer many opportunities for further development.
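To make the non-negative matrix factorization idea mentioned in the abstract concrete, the following is a minimal sketch of plain NMF using the standard multiplicative update rules. It is not one of the integration methods surveyed in the paper: the function name `nmf`, the toy matrix `X`, and all parameter values are assumptions chosen for illustration. Integrative variants typically factorize several relation matrices jointly while sharing factors between them, which this sketch does not attempt.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (n x m) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared Frobenius error.
    Illustrative only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random strictly positive initialization keeps the factors non-negative.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Element-wise updates; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative "relation matrix" (e.g. 6 entities of one type by 5 of
# another), built from two hidden factors so that rank 2 is appropriate.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 5))

W, H = nmf(X, rank=2)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Each row of `W` can be read as a low-dimensional, non-negative profile of one row entity, and each column of `H` as a profile of one column entity. This kind of parts-based representation is what makes the factors usable for clustering or for scoring unobserved associations.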