Imbalance Data Processing Strategy for Protein Interaction Sites Prediction.

Publication Information

IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):985-994. doi: 10.1109/TCBB.2019.2953908. Epub 2021 Jun 3.

Abstract

Protein-protein interactions play essential roles in various biological processes. Identifying protein interaction sites helps researchers understand life activities and therefore supports drug design. However, the number of experimentally determined protein interaction sites is far smaller than the number of protein sites in protein-protein interactions or protein complexes. As a result, negative and positive samples are usually imbalanced, a common condition that biases computational predictions of protein interaction sites. In this work, we present three imbalanced-data processing strategies to reconstruct the original dataset, and then extract protein features from the evolutionary conservation of amino acids to build a predictor for identifying protein interaction sites. On a dataset with 10,430 surface residues but only 2,299 interface residues, the imbalanced-data processing strategies clearly reduce the prediction bias and thereby improve the prediction performance for protein interaction sites. The experimental results show that our prediction models achieve better prediction performance, for example a prediction accuracy of 0.758 or an F-measure of 0.737, which demonstrates the effectiveness of our method.
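
The abstract does not name the three strategies, so the following is only a minimal sketch of one generic rebalancing technique, random undersampling of the majority class, together with the F-measure. It is not the authors' method. The function names `undersample` and `f_measure` are hypothetical, and the example assumes that the 2,299 interface residues are a subset of the 10,430 surface residues, which the abstract does not state explicitly.

```python
import random


def undersample(labels, seed=0):
    """Keep every positive and an equal-sized random subset of negatives.

    Generic random undersampling; returns the sorted indices that are kept.
    If negatives do not outnumber positives, nothing is dropped.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(neg) > len(pos):
        neg = rng.sample(neg, len(pos))
    return sorted(pos + neg)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Assumed reading of the counts: 2,299 interface (positive) residues
# among 10,430 surface residues, the rest being non-interface (negative).
labels = [1] * 2299 + [0] * (10430 - 2299)

kept = undersample(labels)
balanced = [labels[i] for i in kept]
print(len(balanced), sum(balanced))

# A trivial predictor that labels every residue as non-interface scores a
# high accuracy on the imbalanced set while finding no interface residue.
all_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, all_negative)) / len(labels)
print(round(accuracy, 3), f_measure(labels, all_negative))
```

Under these assumptions, the always-negative predictor shows why accuracy alone can mislead on imbalanced data and why the abstract reports the F-measure alongside it.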
