Department of Computer Science & Technology, Tongji University, Shanghai 201804, China.
School of Electrical & Information Engineering, Anhui University of Technology, Ma'anshan 243002, China.
Int J Mol Sci. 2020 Aug 8;21(16):5694. doi: 10.3390/ijms21165694.
Drug-target interaction (DTI) prediction plays an important role in drug development. Experimental methods for identifying DTIs are time-consuming, expensive, and challenging. Machine learning-based methods have been introduced to address these problems, but their performance is limited by the need for effective feature extraction and reliable negative sampling. In this work, electrotopological state (E-state) fingerprints are tested as features for drugs, and amphiphilic pseudo amino acid composition (APAAC) is tested as the feature set for target proteins. E-state fingerprints are extracted from both the electronic and the topological features of a molecule using the same metric. APAAC extends amino acid composition (AAC) by using the hydrophilic and hydrophobic properties of residues to capture sequence-order information. The prediction model is built with support vector machines on the combination of these drug and target features. To enhance the effectiveness of the features, a distance-based negative sampling method is proposed to obtain reliable negative samples. The area under the receiver operating characteristic curve (AUC) is above 98.5% on all three datasets used in this work. Comparison with state-of-the-art methods demonstrates the effectiveness and efficiency of the proposed method, which will be helpful for further drug development.
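As a small illustration of the protein descriptor that APAAC builds on, the following sketch computes plain amino acid composition (AAC), that is, the relative frequency of each of the 20 standard residues in a sequence. It is a minimal example and not the authors' implementation: the function name `amino_acid_composition` and the toy sequence are hypothetical, and the additional sequence-order terms that APAAC derives from hydrophobicity and hydrophilicity are not included.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard residue (20 values).

    Hypothetical helper for illustration; non-standard letters are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


# Toy sequence, not taken from any dataset in the paper.
aac = amino_acid_composition("MKVLAAGIVK")
```

The resulting 20-dimensional vector sums to 1 and is independent of residue order, which is the limitation that the sequence-order terms of APAAC are meant to address.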
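The abstract does not spell out how the distance-based negative sampling is computed, so the following is only one plausible reading, sketched under stated assumptions: each drug-target pair is represented by a feature vector, and unlabeled pairs that lie farthest (in Euclidean distance) from every known positive pair are kept as reliable negatives. The function name `select_reliable_negatives`, the choice of Euclidean distance, the nearest-positive criterion, and the toy vectors are all assumptions made for illustration.

```python
import math


def select_reliable_negatives(positives, candidates, k):
    """Pick the k candidate vectors farthest from all known positives.

    Assumption: "distance" means Euclidean distance to the nearest positive
    sample; the criterion actually used in the paper may differ.
    """
    def nearest_positive_distance(vec):
        return min(math.dist(vec, p) for p in positives)

    # Rank candidates from farthest to closest and keep the top k.
    ranked = sorted(candidates, key=nearest_positive_distance, reverse=True)
    return ranked[:k]


# Toy 2-D feature vectors, purely illustrative.
positives = [(0.0, 0.0), (1.0, 0.0)]
candidates = [(0.1, 0.1), (5.0, 5.0), (1.0, 0.2), (-4.0, 3.0)]
negatives = select_reliable_negatives(positives, candidates, k=2)
```

In this toy case the two candidates close to a positive sample are discarded, which reflects the intuition that unlabeled pairs resembling known interactions are the most likely to be undiscovered positives.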