School of Computer Science and Engineering, Central South University, 410075, Changsha, China.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac244.
Interactions between proteins and nucleic acids participate in diverse biological activities. Accurately identifying these interactions can strengthen the understanding of protein function. However, conventional methods are too time-consuming, and computational methods make type-agnostic predictions. We propose an ensemble predictor termed TSNAPred and use it, for the first time, to identify residues that bind to A-DNA, B-DNA, ssDNA, mRNA, tRNA and rRNA. TSNAPred combines LightGBM and a capsule network, both trained on features derived from the protein sequence. TSNAPred uses a sliding window technique to extract long-distance dependencies between residues and a weighted ensemble strategy to enhance prediction performance. The results show that TSNAPred can effectively identify type-specific nucleic acid binding residues in our test set. Moreover, it can also discriminate between DNA-binding and RNA-binding residues, improving the AUC value by 5% to 10% compared with other state-of-the-art methods. The dataset and code of TSNAPred are available at https://github.com/niewenjuan-csu/TSNAPred.
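The two techniques named in the abstract, a sliding window over residues and a weighted ensemble of two models, can be illustrated with a minimal sketch. This is not the TSNAPred implementation: the function names, the padding symbol `X`, the window size and the ensemble weight are all hypothetical choices made for illustration, and the abstract does not state the values used by the authors.

```python
from typing import List, Sequence

# Hypothetical padding symbol for window positions beyond the sequence ends.
PAD = "X"


def sliding_windows(sequence: str, window_size: int) -> List[str]:
    """Return one window per residue, centered on that residue.

    The ends of the sequence are padded so that every residue gets a
    window of the same length.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")
    half = window_size // 2
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + window_size] for i in range(len(sequence))]


def weighted_ensemble(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    weight_a: float,
) -> List[float]:
    """Combine per-residue scores from two models as a weighted average.

    weight_a is the weight of the first model; the second model
    receives 1 - weight_a. The weight is an assumed free parameter here.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    return [
        weight_a * a + (1.0 - weight_a) * b
        for a, b in zip(scores_a, scores_b)
    ]
```

In a setup like the one described, each window would be converted to a feature vector and scored by both models (for example, a gradient-boosting classifier and a neural network), and the two per-residue score lists would then be passed to `weighted_ensemble`.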