College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China.
College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, People's Republic of China.
Math Biosci. 2018 Dec;306:136-144. doi: 10.1016/j.mbs.2018.09.010. Epub 2018 Oct 5.
Drugs produce intended therapeutic effects to treat diseases, but they may also cause side effects. For an approved drug, it is best to detect all the side effects it can produce; otherwise, undetected side effects pose great risks to pharmaceutical companies and can harm patients. Quick and reliable methods for identifying the side effects of a given drug are therefore urgently needed. In this study, a binary classification model was proposed to predict drug side effects. Unlike most previous methods, our model treats each pair of a drug and a side effect as a sample, converting the original problem into a binary classification problem. Based on the idea of similarity, each pair was represented by five features, each derived from one type of drug property. Random forest, a powerful machine learning algorithm, was adopted as the prediction engine. Ten-fold cross-validation on five datasets with different negative samples indicated that the proposed model performed well, with a Matthews correlation coefficient of around 0.550 and an AUC of around 0.8492. In addition, we analyzed the contribution of each drug property to the construction of the model. The results indicated that fingerprint-based drug similarity was most closely related to the prediction of drug side effects, and that all drug properties contributed to some degree.
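The pair-as-sample idea and the Matthews correlation coefficient reported in the abstract can be illustrated with a minimal sketch. The abstract does not say how each similarity feature is computed, so the rule used here (the highest similarity between the drug and any other drug already known to cause the side effect) is an assumption, as are the names `pair_features`, `known`, and the toy similarity tables. Only two of the five drug properties are mocked up, and the random forest itself is left out; the resulting feature vectors could be passed to any random forest implementation.

```python
import math


def pair_features(drug, side_effect, known, similarity_tables):
    """Build one feature per drug property for a (drug, side effect) pair.

    Assumed rule (not stated in the abstract): each feature is the highest
    similarity between `drug` and any other drug known to cause `side_effect`.
    """
    others = [d for d in known.get(side_effect, set()) if d != drug]
    features = []
    for table in similarity_tables:
        # Similarity is symmetric, so drug pairs are keyed by frozenset.
        scores = [table.get(frozenset((drug, d)), 0.0) for d in others]
        features.append(max(scores, default=0.0))
    return features


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: two similarity tables standing in for two of the
# five drug properties (for example fingerprint and target similarity).
known = {"nausea": {"drugA", "drugB"}}
fingerprint_sim = {
    frozenset(("drugA", "drugB")): 0.8,
    frozenset(("drugA", "drugC")): 0.3,
    frozenset(("drugB", "drugC")): 0.5,
}
target_sim = {
    frozenset(("drugA", "drugB")): 0.6,
    frozenset(("drugB", "drugC")): 0.1,
}
tables = [fingerprint_sim, target_sim]

positive_pair = pair_features("drugA", "nausea", known, tables)
candidate_pair = pair_features("drugC", "nausea", known, tables)
mcc = matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0])
```

Here `positive_pair` describes a known drug and side effect association, while `candidate_pair` describes an unlabeled pair that a trained classifier would score. Excluding the drug itself from `others` avoids trivially assigning a known pair the maximum similarity.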