Karsakov Alexander, Bartlett Thomas, Ryblov Artem, Meyerov Iosif, Ivanchenko Mikhail, Zaikin Alexey
Department of Applied Mathematics and Centre of Bioinformatics, Lobachevsky State University of Nizhny Novgorod, Nizhniy Novgorod, Russia.
Institute for Women's Health and Department of Mathematics, University College London, London, United Kingdom.
PLoS One. 2017 Jan 20;12(1):e0169661. doi: 10.1371/journal.pone.0169661. eCollection 2017.
We use ideas from the theory of complex networks to implement a machine learning classification of human DNA methylation data that carry signatures of cancer development. The data were obtained from patients with various kinds of cancer and represented as parenclitic networks, in which nodes correspond to genes and edges are weighted according to pairwise deviation from control group subjects. We demonstrate that, for the 10 types of cancer under study, binary classification between cancer-positive and cancer-negative samples based on network measures achieves high performance. Remarkably, an accuracy of 93-99% is achieved with only 12 network topology indices, a dramatic reduction in complexity from the original 15295 gene methylation levels. Moreover, the parenclitic networks were found to be scale-free in cancer-negative subjects and to deviate from the power-law node degree distribution in cancer. The node centrality ranking and the emerging modular structure could provide insights into the systems biology of cancer.
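The network construction described in the abstract (genes as nodes, edges weighted by how far a subject deviates from the control group for each gene pair) can be illustrated with a minimal sketch. The abstract does not state how the pairwise deviation is measured, so this example assumes one common choice: fit a linear regression between two genes on the control samples, then weight the edge by the subject's residual in units of the control residual spread. The function names `parenclitic_weights` and `node_degrees`, and the threshold value, are hypothetical and not taken from the paper.

```python
import numpy as np


def parenclitic_weights(controls, subject):
    """Build an edge-weight matrix for one subject.

    controls: array of shape (n_controls, n_genes), methylation levels of controls.
    subject:  array of shape (n_genes,), methylation levels of one subject.
    Assumption: deviation is a regression-residual z-score (not confirmed by the abstract).
    """
    n_genes = controls.shape[1]
    weights = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            x, y = controls[:, i], controls[:, j]
            # Linear model of gene j as a function of gene i, fitted on controls only.
            slope, intercept = np.polyfit(x, y, 1)
            residuals = y - (slope * x + intercept)
            sigma = residuals.std(ddof=2)  # two fitted parameters
            # Distance of the subject from the control-group model.
            deviation = subject[j] - (slope * subject[i] + intercept)
            w = abs(deviation) / sigma if sigma > 0 else 0.0
            weights[i, j] = weights[j, i] = w
    return weights


def node_degrees(weights, threshold):
    """Node degree after keeping only edges whose weight exceeds the threshold."""
    return (weights > threshold).sum(axis=1)


# Synthetic data for illustration only; these are not the study's data.
rng = np.random.default_rng(0)
controls = rng.normal(size=(50, 4))
subject = np.array([0.0, 0.0, 0.0, 10.0])  # last gene strongly shifted

weights = parenclitic_weights(controls, subject)
degrees = node_degrees(weights, threshold=3.0)  # hypothetical threshold
```

Topology indices such as the degrees computed here would then serve as classifier features in place of the raw per-gene methylation levels. With thousands of genes the double loop becomes expensive, so a practical implementation would likely vectorize or parallelize the pairwise fits.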