Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan.
Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia.
BMC Bioinformatics. 2021 Feb 10;22(1):63. doi: 10.1186/s12859-021-03993-0.
Human Dicer is an enzyme that cleaves pre-miRNAs into miRNAs. Several models have been developed to predict human Dicer cleavage sites, including PHDCleav and LBSizeCleav. Given an input sequence, these models predict whether it contains a cleavage site. However, they consider each sequence independently and lack interpretability. An accurate and explainable predictor that exploits relations between different sequences is therefore needed to improve understanding of the mechanism by which human Dicer cleaves pre-miRNAs.
In this study, we develop ReCGBM, an accurate and explainable predictor of human Dicer cleavage sites. We design relational features and class features as inputs to a LightGBM model. Computational experiments show that ReCGBM outperforms the existing methods. Further, we find that features close to the center of the pre-miRNA are more important and contribute substantially to the performance improvement of the developed method.
The results of this study show that ReCGBM is an interpretable and accurate predictor. In addition, the feature importance analyses suggest that future predictors may benefit from more informative features close to the center of the pre-miRNA.
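
As a rough illustration of how position-aware inputs for a tree-based model could be built, the sketch below one-hot encodes the nucleotides in a fixed window around a candidate cleavage site and names each feature by its offset from the window center. This is not the relational or class feature design of ReCGBM, which the abstract does not specify; the function name `window_features`, the `flank` parameter, and the feature naming scheme are assumptions made only for this example.

```python
from typing import Dict

NUCLEOTIDES = "ACGU"


def window_features(seq: str, site: int, flank: int = 3) -> Dict[str, int]:
    """One-hot encode the nucleotides in a window around a candidate site.

    Hypothetical helper for illustration only (not the ReCGBM features).
    `site` is the 0-based index of the nucleotide just after the candidate
    cut, so the window covers seq[site - flank : site + flank].
    """
    if site - flank < 0 or site + flank > len(seq):
        raise ValueError("window extends past the sequence ends")

    window = seq[site - flank: site + flank].upper()
    features: Dict[str, int] = {}
    for i, base in enumerate(window):
        # Negative offsets lie upstream of the cut, non-negative downstream.
        offset = i - flank
        for nt in NUCLEOTIDES:
            features[f"pos{offset:+d}_{nt}"] = int(base == nt)
    return features


# Example: a 4-nt window (flank = 2) around index 5 of a toy sequence.
example = window_features("GGAUCCAUGG", site=5, flank=2)
```

Because every feature name carries its offset, per-feature importances reported by a gradient-boosted tree model could be grouped by distance from the center, which is one simple way to inspect the kind of position dependence described above.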