School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China; School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi 62000-00200, Kenya.
School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China.
Genomics. 2020 Sep;112(5):2928-2936. doi: 10.1016/j.ygeno.2020.05.005. Epub 2020 May 11.
Long non-coding RNAs (lncRNAs) play key roles in regulating cellular biological processes through diverse molecular mechanisms, including binding to RNA-binding proteins. Because the majority of plant lncRNAs are functionally uncharacterized, accurate prediction of plant lncRNA-protein interactions is imperative for subsequent functional studies. We present an integrative model, DRPLPI, whose distinguishing trait is that it predicts by multi-feature fusion. It uses structural features together with four groups of sequence features: tri-nucleotide composition, gapped k-mer, recursive complement and binary profile. We design a multi-head self-attention long short-term memory encoder-decoder network to extract generative high-level features. To obtain robust results, DRPLPI combines categorical boosting and extra trees into a single meta-learner. Experiments on Zea mays and Arabidopsis thaliana achieved an area under the precision-recall curve (AUPRC) of 0.9820 and 0.9652, respectively. The proposed method significantly improves prediction performance compared with existing state-of-the-art methods.
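To make the idea of multi-feature fusion concrete, the following is a minimal sketch of how two of the named sequence feature groups might be computed and concatenated. It is an illustration under stated assumptions, not the DRPLPI implementation: the abstract does not give the exact feature definitions, so the function names, the ACGU alphabet, the normalization by window count, and the simplification of "gapped k-mer" to gapped di-nucleotide pairs are all assumptions made here.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; T is mapped to U below


def trinucleotide_composition(seq):
    """Return 64 normalized tri-nucleotide frequencies (hypothetical helper)."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing ambiguous symbols such as N
            counts[tri] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in kmers]


def gapped_dinucleotide_composition(seq, gap=1):
    """Return 16 frequencies of nucleotide pairs separated by `gap` positions.

    This is a simplified stand-in for a gapped k-mer feature (an assumption).
    """
    seq = seq.upper().replace("T", "U")
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


def fuse_features(seq, gap=1):
    """Concatenate the two feature groups into one vector (simple fusion)."""
    return trinucleotide_composition(seq) + gapped_dinucleotide_composition(seq, gap)
```

For example, `fuse_features("AUGAUG")` returns an 80-dimensional vector (64 tri-nucleotide plus 16 gapped-pair frequencies). In a pipeline like the one described, such a vector would be one input among several to the downstream encoder and ensemble classifier.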