Akutekwe Arinze, Seker Huseyin, Yang Shengxiang
Department of Computer Science and Digital Technologies, Bio-Health Informatics Research Group, University of Northumbria at Newcastle, Newcastle upon Tyne NE1 8ST, UK.
School of Computer Science and Informatics, Centre for Computational Intelligence, De Montfort University, Leicester LE1 9BH, UK.
IET Syst Biol. 2015 Dec;9(6):294-302. doi: 10.1049/iet-syb.2015.0031.
Accurate and reliable modelling of protein-protein interaction networks for complex diseases such as colorectal cancer can help researchers better understand disease mechanisms and potentially discover new drugs. Different machine learning methods, such as empirical mode decomposition combined with least-squares support vector machines and the discrete Fourier transform, have been widely used as classifiers and for the automatic discovery of biomarkers for diagnosing the disease. The existing methods are, however, less efficient because they tend to ignore the interaction with the classifier. In this study, the authors propose a two-stage optimisation approach to effectively select biomarkers and discover interactions among them. In the first stage, particle swarm optimisation (PSO) and differential evolution (DE) are used to optimise the parameters of the support vector machine recursive feature elimination (SVM-RFE) algorithm; in the second stage, a dynamic Bayesian network is used to predict temporal relationships between the biomarkers across two time points. Results show that the 18 and 25 biomarkers selected by the PSO- and DE-based approaches, respectively, yield the same accuracy of 97.3% and F1-scores of 97.7% and 97.6%, respectively. The stratified analysis reveals that Alpha-2-HS-glycoprotein is a dominant hub gene with multiple interactions with other genes, including Fibrinogen alpha chain, which is also a potential biomarker for colorectal cancer.
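The first stage pairs two population-based optimisers, PSO and DE, with SVM-RFE. The abstract does not say which SVM-RFE parameters were tuned or how. The minimal pure-Python sketch below therefore only illustrates the two search strategies being compared, classic DE/rand/1/bin and global-best PSO. Both minimise `surrogate_error`, a hypothetical two-dimensional stand-in for a validation-error surface whose optimum is placed at (1, -2) by construction. In the study's pipeline, each objective evaluation would instead run SVM-RFE with the candidate parameters and score the resulting classifier. All function names, bounds, and hyperparameter values (`F`, `CR`, `w`, `c1`, `c2`, population sizes) are illustrative assumptions, not the authors' settings.

```python
import random

# Hypothetical stand-in for the quantity being optimised (e.g. cross-validated
# SVM-RFE error as a function of its parameters). Minimum at (1.0, -2.0).
def surrogate_error(p):
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

BOUNDS = [(-5.0, 5.0), (-5.0, 5.0)]  # assumed search box

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, gens=100, seed=0):
    """DE/rand/1/bin: difference-vector mutation, binomial crossover,
    greedy one-to-one replacement."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # three distinct donors, all different from the target vector i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or d == j_rand:
                    trial.append(clip(pop[a][d] + F * (pop[b][d] - pop[c][d]), lo, hi))
                else:
                    trial.append(pop[i][d])
            f_trial = f(trial)
            if f_trial <= fit[i]:  # keep the trial if it is no worse
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def particle_swarm(f, bounds, n_particles=20, w=0.7, c1=1.5, c2=1.5, iters=100, seed=0):
    """Global-best PSO with inertia weight w, cognitive weight c1, social weight c2."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d], lo, hi)
            fi = f(pos[i])
            if fi < pbest_f[i]:  # update personal best, then global best
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f

if __name__ == "__main__":
    for name, opt in [("DE", differential_evolution), ("PSO", particle_swarm)]:
        best, err = opt(surrogate_error, BOUNDS)
        print(name, [round(v, 3) for v in best], err)
```

Both optimisers only need a black-box objective and no gradients. That property suits wrapping an expensive, non-differentiable procedure such as SVM-RFE followed by classifier scoring, where each evaluation is a full feature-elimination run.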