Park Youngjun, Heider Dominik, Hauschild Anne-Christin
Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany.
Department of Medical Informatics, University Medical Center Göttingen, 37075 Göttingen, Germany.
Cancers (Basel). 2021 Jun 24;13(13):3148. doi: 10.3390/cancers13133148.
The rapid improvement of next-generation sequencing (NGS) technologies and their application to large-scale cohorts in cancer research have brought the common challenges of big data to the field. This has opened a new research area that combines systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models of the characteristics of tumors and tumor subtypes, and various machine learning algorithms have been introduced to identify the underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis and describe how these computational methodologies integrate systems biology and omics data. We then discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNNs) in systems biology, and the limitations of NGS-based biomedical research. To reflect on the various challenges and the corresponding computational solutions, we organize the discussion around three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and yield highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial to the success of each specific research question.