IDDLncLoc: Subcellular Localization of LncRNAs Based on a Framework for Imbalanced Data Distributions.

Affiliations

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.

School of Artificial Intelligence, Jilin University, Changchun, China.

Publication information

Interdiscip Sci. 2022 Jun;14(2):409-420. doi: 10.1007/s12539-021-00497-6. Epub 2022 Feb 22.

DOI:10.1007/s12539-021-00497-6
PMID:35192174
Abstract

Long non-coding RNAs (lncRNAs) play a crucial role in many cellular processes, including RNA splicing, signaling, and protein regulation, and can serve as genetic markers. Because identifying an lncRNA's localization in the cell through experimental methods is complicated, hard to reproduce, and expensive, we propose a novel method named IDDLncLoc, which adopts an ensemble model to address the subcellular localization problem. In the proposed model, dinucleotide-based auto-cross covariance features, k-mer nucleotide composition features, and composition, transition, and distribution features are used to encode a raw RNA sequence as a vector. To screen out reliable features, feature selection based on the binomial distribution and recursive feature elimination is employed. Furthermore, oversampling in mini-batches, random sampling, and stacking ensemble strategies are customized to overcome data imbalance in the benchmark dataset. Compared with the latest methods, IDDLncLoc achieves an accuracy of 94.96% on the benchmark dataset, which is 2.59% higher than the best of those methods, and the results further demonstrate that IDDLncLoc performs well on the subcellular localization of lncRNAs. In addition, a user-friendly web server is available at http://lncloc.club.
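
Two of the ideas named in the abstract, k-mer nucleotide composition and oversampling of minority classes, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmer_composition` and `random_oversample`, the default `k=3`, the `ACGU` alphabet, and the toy sequences and labels are all assumptions made for illustration, and plain random oversampling is used here as a simplified stand-in for the mini-batch oversampling strategy the abstract mentions.

```python
import random
from collections import defaultdict
from itertools import product


def kmer_composition(seq, k=3, alphabet="ACGU"):
    """Encode an RNA sequence as a vector of normalized k-mer frequencies.

    Hypothetical helper: the vector has len(alphabet)**k entries, ordered
    lexicographically by k-mer. Windows with unknown symbols are skipped.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest.

    Assumption: simple random oversampling with replacement, not the
    mini-batch variant described in the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data (made up): three "nucleus" sequences and one "cytosol" sequence.
sequences = ["AUGGCUAUGG", "GGCUAAGCUU", "CCGAUUAGCA", "UUAGGCAUCG"]
locations = ["nucleus", "nucleus", "nucleus", "cytosol"]

X = [kmer_composition(s, k=2) for s in sequences]
X_balanced, y_balanced = random_oversample(X, locations)
```

In a fuller pipeline, the balanced feature vectors would then pass through feature selection and a stacked ensemble of classifiers; those steps are omitted here because the abstract does not specify the base learners or their settings.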

