School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32603, United States.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad752.
There is mounting evidence that the subcellular localization of lncRNAs provides valuable insight into their biological functions. In real transcriptomes, an lncRNA is usually found in more than one subcellular compartment. Furthermore, lncRNAs exhibit localization patterns that are specific to each compartment. Although several computational methods have been developed to predict lncRNA subcellular localization, few are designed for lncRNAs with multiple localizations, and none takes motif specificity into consideration.
In this study, we propose LncLocFormer, a novel deep learning model that predicts multi-label lncRNA subcellular localization from the lncRNA sequence alone. LncLocFormer uses eight Transformer blocks to model long-range dependencies within the lncRNA sequence and to share information across it. To exploit the relationships between subcellular localizations and to identify a distinct localization pattern for each one, LncLocFormer employs a localization-specific attention mechanism. On the hold-out test set, LncLocFormer outperforms existing state-of-the-art predictors. A motif analysis further shows that LncLocFormer can capture known motifs, and ablation studies confirm that the localization-specific attention mechanism contributes to the improved prediction performance.
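The idea behind a localization-specific attention mechanism can be illustrated with a minimal sketch. It is not the authors' implementation (see the source code repository for that) and assumes a generic label-wise attention pooling: each localization label has its own query vector, attends over the per-position sequence representations, and gets an independent sigmoid output, so one lncRNA can be assigned to several compartments at once. The function name `label_specific_attention`, the label names, the dimensions, and the random values standing in for Transformer outputs and learned parameters are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def label_specific_attention(hidden, queries, out_weights, out_biases):
    """Pool position-wise features separately for each label.

    hidden      : seq_len x dim position representations (assumed to come
                  from an upstream sequence encoder)
    queries     : n_labels x dim, one attention query per label (assumption)
    out_weights : n_labels x dim, per-label output weights (assumption)
    out_biases  : n_labels, per-label output biases (assumption)
    Returns (probabilities, attention) with one entry per label.
    """
    probabilities, attention = [], []
    for query, weight, bias in zip(queries, out_weights, out_biases):
        # Each label scores every sequence position with its own query.
        weights = softmax([dot(h, query) for h in hidden])
        # Label-specific summary: attention-weighted sum over positions.
        pooled = [sum(a * h[j] for a, h in zip(weights, hidden))
                  for j in range(len(query))]
        logit = dot(pooled, weight) + bias
        # Independent sigmoid per label gives a multi-label prediction.
        probabilities.append(1.0 / (1.0 + math.exp(-logit)))
        attention.append(weights)
    return probabilities, attention


random.seed(0)
seq_len, dim = 6, 4
labels = ["compartment_A", "compartment_B", "compartment_C"]  # placeholders


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


hidden = [rand_vec(dim) for _ in range(seq_len)]   # stand-in encoder output
queries = [rand_vec(dim) for _ in labels]          # stand-in learned queries
out_weights = [rand_vec(dim) for _ in labels]
out_biases = [0.0 for _ in labels]

probs, attn = label_specific_attention(hidden, queries, out_weights, out_biases)
predicted = [name for name, p in zip(labels, probs) if p >= 0.5]
print(predicted)
```

Because every label keeps its own attention weights over sequence positions, the positions that receive high weight for a given label can be inspected afterwards, which is the kind of signal a motif analysis could build on.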
The LncLocFormer web server is available at http://csuligroup.com:9000/LncLocFormer. The source code can be obtained from https://github.com/CSUBioGroup/LncLocFormer.