Khan Saad M, He Fei, Wang Duolin, Chen Yongbing, Xu Dong
Informatics Institute, University of Missouri, Columbia, MO 65211, United States.
Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, United States.
Comput Struct Biotechnol J. 2020 Jul 15;18:1877-1883. doi: 10.1016/j.csbj.2020.07.010. eCollection 2020.
Pseudouridine synthase binds to uridine sites and catalyzes the conversion of uridine to pseudouridine (Ψ). This binding depends on the specific sequence context and conformation of the surrounding nucleotides. Most machine-learning methods for Ψ site classification use nucleotide frequency as a feature, which may not fully capture the relevant conformation around a Ψ site. Our tool, MU-PseUDeep, uses deep learning to capture both the sequence and the secondary structure context: it feeds the raw RNA sequence and the predicted secondary structure into two sets of convolutional neural networks. It shows considerable improvement in Ψ site prediction over the existing tools XG-PseU, PseUI, and iRNA-PseU on both balanced and imbalanced datasets. To the best of our knowledge, it is the most accurate tool for Ψ site prediction. We also used MU-PseUDeep to scan the human transcriptome; the genes with predicted Ψ sites are enriched in nucleotide and protein binding, as well as in neurodegeneration pathways. The tool is open source and available at https://github.com/smk5g5/MU-PseUDeep.
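As an illustration of the two-input design described in the abstract, the sketch below one-hot encodes a window of raw RNA sequence and a matching dot-bracket secondary structure string into two separate matrices, one for each convolutional branch. This is not the MU-PseUDeep implementation: the function names, the 11-nucleotide window, the three-symbol structure alphabet, and the example sequence and structure are all assumptions made for illustration.

```python
import numpy as np

# Assumed alphabets (hypothetical, not taken from MU-PseUDeep).
NUCLEOTIDES = "ACGU"
STRUCTURE_SYMBOLS = "(.)"  # dot-bracket notation: paired-open, unpaired, paired-close


def one_hot(window, alphabet):
    """Encode a string as a (len(window), len(alphabet)) matrix.

    Symbols outside the alphabet (for example 'N') are left as all-zero rows.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    encoded = np.zeros((len(window), len(alphabet)), dtype=np.float32)
    for position, symbol in enumerate(window):
        column = index.get(symbol)
        if column is not None:
            encoded[position, column] = 1.0
    return encoded


def encode_site(sequence, structure):
    """Return (sequence_matrix, structure_matrix) for a window centered on a uridine.

    Each matrix would feed its own convolutional branch in a two-branch model.
    """
    sequence = sequence.upper()
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    if len(sequence) % 2 == 0:
        raise ValueError("window length must be odd so that it has a center")
    if sequence[len(sequence) // 2] != "U":
        raise ValueError("the center of the window must be a uridine (U)")
    return one_hot(sequence, NUCLEOTIDES), one_hot(structure, STRUCTURE_SYMBOLS)


# Made-up example window; a real structure would come from an RNA folding tool.
seq_matrix, struct_matrix = encode_site("GGCACUGAUGC", "((((...))))")
print(seq_matrix.shape, struct_matrix.shape)
```

Keeping the two encodings separate lets each branch learn filters over its own alphabet before the learned features are combined for classification.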