Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes, France.
Molecular Oncology, PSL Research University, CNRS, UMR, Institut Curie, Paris, France.
PLoS Comput Biol. 2024 Sep 12;20(9):e1012446. doi: 10.1371/journal.pcbi.1012446. eCollection 2024 Sep.
The involvement of non-coding RNAs in biological processes and diseases has made the exploration of their functions crucial. Most non-coding RNAs have yet to be studied, creating the need for methods that can rapidly classify large sets of non-coding RNAs into functional groups, or classes. In recent years, the success of deep learning in various domains has led to its application to non-coding RNA classification. Multiple novel architectures have been developed, but these advancements are not covered by current literature reviews. We present an exhaustive comparison of the state-of-the-art methods and describe their associated datasets. Moreover, the literature lacks objective benchmarks. We therefore perform experiments to fairly evaluate the performance of various tools for non-coding RNA classification on popular datasets. We also explore the robustness of these methods to non-functional sequences and sequence boundary noise, and we measure computation time and CO2 emissions. In light of these results, we assess the relevance of the different architectural choices and provide recommendations to consider in future methods.