EDCLoc: a prediction model for mRNA subcellular localization using improved focal loss to address multi-label class imbalance.

Author information

Deng Yu, Jia Jianhua, Yi Mengyue

Affiliation

School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China.

Publication information

BMC Genomics. 2024 Dec 27;25(1):1252. doi: 10.1186/s12864-024-11173-6.

Abstract

BACKGROUND

The subcellular localization of mRNA plays a crucial role in gene expression regulation and various cellular processes. However, existing wet-lab techniques such as RNA-FISH are usually time-consuming, labor-intensive, and limited to specific tissue types. To address these limitations, researchers have developed several computational methods for predicting mRNA subcellular localization. These methods face class imbalance in multi-label classification, which causes models to favor majority classes and overlook minority classes during training. In addition, traditional feature extraction methods are computationally expensive, yield incomplete features, and may lose critical information. Deep learning methods, in turn, are constrained by hardware performance and training time when handling complex sequences, and they may suffer from the curse of dimensionality and overfitting. More efficient and accurate prediction models are therefore urgently needed.

RESULTS

To address these issues, we propose EDCLoc, a multi-label classifier for predicting mRNA subcellular localization. EDCLoc reduces the training burden through a stepwise pooling strategy and applies grouped convolution blocks of varying sizes at different levels, combined with residual connections, to achieve efficient feature extraction and gradient propagation. At its final stage, the model applies global max pooling to further reduce feature dimensionality and highlight key features. To tackle class imbalance, we modified the focal loss function to strengthen the model's focus on minority classes. Evaluation results show that EDCLoc outperforms existing methods in most subcellular regions. In addition, the position weight matrices extracted from the multi-scale CNN filters can be matched to known RNA-binding protein motifs, demonstrating that EDCLoc captures key sequence features.
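To illustrate the loss that this work builds on, the following is a minimal sketch of the standard focal loss applied independently to each label of a multi-label problem. It is not the improved variant used in EDCLoc, whose details the abstract does not give. The function names and the default values of `gamma` and `alpha` are assumptions chosen for illustration only.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard focal loss for a single label.

    p: predicted probability that the label is present (0 < p < 1)
    y: ground-truth label, 0 or 1
    gamma, alpha: illustrative defaults, not values taken from EDCLoc
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Mean focal loss over every (sample, label) pair.

    probs, targets: lists of equal-length rows, one row per sample
    and one column per subcellular compartment (hypothetical layout).
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            total += binary_focal_loss(p, y, gamma, alpha)
            count += 1
    return total / count


# Two samples, three hypothetical compartments
probs = [[0.9, 0.2, 0.6], [0.7, 0.1, 0.3]]
targets = [[1, 0, 1], [1, 0, 1]]
loss = multilabel_focal_loss(probs, targets)
```

With `gamma=0` and `alpha=0.5` this reduces to half the ordinary binary cross-entropy. Raising `gamma` shifts the total loss toward hard, misclassified examples, which is the general mechanism by which focal-style losses counter the dominance of majority classes.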

CONCLUSIONS

EDCLoc outperforms existing prediction tools in most subcellular regions and effectively mitigates class imbalance in multi-label classification. These advantages make EDCLoc a reliable choice for multi-label prediction of mRNA subcellular localization. The dataset and source code used in this study are available at https://github.com/DellCode233/EDCLoc.
