Prediction of donor splice sites using random forest with a new sequence encoding approach.

Authors

Meher Prabina Kumar, Sahu Tanmaya Kumar, Rao Atmakuri Ramakrishna

Affiliations

Division of Statistical Genetics, Indian Agricultural Statistics Research Institute, New Delhi, 110 012 India.

Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, 110 012 India.

Publication information

BioData Min. 2016 Jan 22;9:4. doi: 10.1186/s13040-016-0086-4. eCollection 2016.

DOI:10.1186/s13040-016-0086-4
PMID:26807151
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4724119/
Abstract

BACKGROUND

Detecting splice sites is key to predicting gene structure, so efficient analytical methods for splice site prediction are vital. This paper presents a novel sequence encoding approach, based on adjacent di-nucleotide dependencies, in which donor splice site motifs are encoded into numeric vectors. The encoded vectors are then used as input to Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN), Bagging, Boosting, Logistic regression, kNN and Naïve Bayes classifiers for prediction of donor splice sites.
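The encoding idea can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the scheme below is an assumption rather than the authors' method: for each pair of adjacent positions it estimates the frequency of every di-nucleotide in true and false donor sites, then encodes a motif as the difference between the two frequencies at each pair. The function names, the add-one smoothing and the toy 9-mer sequences are all hypothetical.

```python
from collections import Counter
from itertools import product

# All 16 possible adjacent di-nucleotides.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seqs):
    """Return one table per adjacent position pair: di-nucleotide -> relative frequency."""
    length = len(seqs[0])
    tables = []
    for i in range(length - 1):
        counts = Counter(s[i:i + 2] for s in seqs)
        # Add-one smoothing is an assumption made here so unseen pairs stay nonzero.
        total = len(seqs) + len(DINUCLEOTIDES)
        tables.append({d: (counts[d] + 1) / total for d in DINUCLEOTIDES})
    return tables


def encode(seq, true_tables, false_tables):
    """Encode a motif as one number per adjacent position pair (true minus false frequency)."""
    return [
        true_tables[i][seq[i:i + 2]] - false_tables[i][seq[i:i + 2]]
        for i in range(len(seq) - 1)
    ]


# Toy, made-up motifs of equal length; real data would come from a set such as HS3D.
true_sites = ["AAGGTAAGT", "CAGGTGAGT", "AAGGTAAGA"]
false_sites = ["ACGGTCCAT", "TTGGTTTCA", "GCGGTACCG"]

t_tab = dinucleotide_frequencies(true_sites)
f_tab = dinucleotide_frequencies(false_sites)
vector = encode("CAGGTAAGT", t_tab, f_tab)
print(len(vector), vector)
```

A motif of length n yields a vector of length n - 1, which could then be passed to any of the classifiers listed above. Positions that look the same in both classes (here the conserved GT) contribute zero, so the vector carries only the signal that separates true from false sites.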

RESULTS

The performance of the proposed approach was evaluated on donor splice site sequence data of Homo sapiens, collected from the Homo Sapiens Splice Sites Dataset (HS3D). RF outperformed all the other classifiers considered. In addition, RF achieved higher prediction accuracy than the existing methods MEM, MDD, WMM, MM1, NNSplice and SpliceView when compared on an independent test dataset.
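A comparison of this kind comes down to scoring each classifier's predictions against the known labels of a held-out test set. The sketch below shows one way to do that. The abstract only mentions "prediction accuracy", so the additional metrics (sensitivity, specificity, Matthews correlation coefficient) are an assumption, and the labels and the two classifiers' predictions are made-up illustrative values, not results from the paper.

```python
from math import sqrt


def summarize(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and MCC for binary labels (1 = true site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # MCC is reported as 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical independent test labels and predictions from two hypothetical classifiers.
y_test = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 1]
pred_b = [1, 1, 0, 0, 0, 0, 1, 1]

scores_a = summarize(y_test, pred_a)
scores_b = summarize(y_test, pred_b)
print(scores_a)
print(scores_b)
```

Reporting sensitivity and specificity alongside accuracy matters when true and false sites are unevenly represented, because accuracy alone can look high for a classifier that rarely predicts the minority class.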

CONCLUSION

Based on the proposed approach, we have developed an online prediction server (MaLDoSS) to help the biological community predict donor splice sites. The server is freely available at http://cabgrid.res.in:8080/maldoss. Owing to its computational feasibility and high prediction accuracy, the proposed approach is believed to help in predicting eukaryotic gene structure.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/0c43e3a20c5f/13040_2016_86_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/047e5d3ca6f5/13040_2016_86_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/3112cd8aed91/13040_2016_86_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/2bbc3ea0f58a/13040_2016_86_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/fde22242a307/13040_2016_86_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/c4b37f860840/13040_2016_86_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/6d0d17121db9/13040_2016_86_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/1f575baebbdb/13040_2016_86_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/d011f03dd314/13040_2016_86_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/9ab695b4b7ba/13040_2016_86_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/87a7fe7d3bb7/13040_2016_86_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/88fbe9c567c2/13040_2016_86_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/9271acabf5e9/13040_2016_86_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ed/4724119/1226197f4b4b/13040_2016_86_Fig14_HTML.jpg

Similar articles

1
Prediction of donor splice sites using random forest with a new sequence encoding approach.
BioData Min. 2016 Jan 22;9:4. doi: 10.1186/s13040-016-0086-4. eCollection 2016.
2
A computational approach for prediction of donor splice sites with improved accuracy.
J Theor Biol. 2016 Sep 7;404:285-294. doi: 10.1016/j.jtbi.2016.06.013. Epub 2016 Jun 11.
3
A statistical approach for 5' splice site prediction using short sequence motifs and without encoding sequence data.
BMC Bioinformatics. 2014 Nov 25;15:362. doi: 10.1186/s12859-014-0362-6.
4
Evaluating the performance of sequence encoding schemes and machine learning methods for splice sites recognition.
Gene. 2019 Jul 15;705:113-126. doi: 10.1016/j.gene.2019.04.047. Epub 2019 Apr 19.
5
Markovian encoding models in human splice site recognition using SVM.
Comput Biol Chem. 2018 Apr;73:159-170. doi: 10.1016/j.compbiolchem.2018.02.005. Epub 2018 Feb 14.
6
Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features.
Algorithms Mol Biol. 2016 Jun 1;11:16. doi: 10.1186/s13015-016-0078-4. eCollection 2016.
7
Improved recognition of splice sites in by incorporating secondary structure information into sequence-derived features: a computational study.
3 Biotech. 2021 Nov;11(11):484. doi: 10.1007/s13205-021-03036-8. Epub 2021 Oct 31.
8
A high-performance approach for predicting donor splice sites based on short window size and imbalanced large samples.
Biol Direct. 2019 Apr 11;14(1):6. doi: 10.1186/s13062-019-0236-y.
9
A novel method for splice sites prediction using sequence component and hidden Markov model.
Annu Int Conf IEEE Eng Med Biol Soc. 2016 Aug;2016:3076-3079. doi: 10.1109/EMBC.2016.7591379.
10
Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA.
Gene X. 2020 May 13;5:100035. doi: 10.1016/j.gene.2020.100035. eCollection 2020 Dec.

Cited by

1
Development of a Peptide-Based Multiepitope Vaccine from the SARS-CoV-2 Spike Protein for Targeted Immune Response Against COVID-19.
Protein Pept Lett. 2025;32(4):299-311. doi: 10.2174/0109298665364226250328084245.
2
A hybrid approach of ensemble learning and grey wolf optimizer for DNA splice junction prediction.
PLoS One. 2024 Sep 23;19(9):e0310698. doi: 10.1371/journal.pone.0310698. eCollection 2024.
3
DRANetSplicer: A Splice Site Prediction Model Based on Deep Residual Attention Networks.
Genes (Basel). 2024 Mar 26;15(4):404. doi: 10.3390/genes15040404.
4
Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition.
Amino Acids. 2024 Mar 9;56(1):20. doi: 10.1007/s00726-023-03368-0.
5
Apoptin NLS2 homodimerization strategy for improved antibacterial activity and bio-stability.
Amino Acids. 2023 Oct;55(10):1405-1416. doi: 10.1007/s00726-023-03321-1. Epub 2023 Sep 19.
6
An automated framework for evaluation of deep learning models for splice site predictions.
Sci Rep. 2023 Jun 23;13(1):10221. doi: 10.1038/s41598-023-34795-4.
7
Spliceator: multi-species splice site prediction using convolutional neural networks.
BMC Bioinformatics. 2021 Nov 23;22(1):561. doi: 10.1186/s12859-021-04471-3.
8
Improved recognition of splice sites in by incorporating secondary structure information into sequence-derived features: a computational study.
3 Biotech. 2021 Nov;11(11):484. doi: 10.1007/s13205-021-03036-8. Epub 2021 Oct 31.
9
mLoc-mRNA: predicting multiple sub-cellular localization of mRNAs using random forest algorithm coupled with feature selection via elastic net.
BMC Bioinformatics. 2021 Jun 24;22(1):342. doi: 10.1186/s12859-021-04264-8.
10
DASSI: differential architecture search for splice identification from DNA sequences.
BioData Min. 2021 Feb 15;14(1):15. doi: 10.1186/s13040-021-00237-y.

References

1
High-accuracy splice site prediction based on sequence component and position features.
Genet Mol Res. 2012 Sep 25;11(3):3432-51. doi: 10.4238/2012.September.25.12.
2
Predicting disease risks from highly imbalanced data using random forest.
BMC Med Inform Decis Mak. 2011 Jul 29;11:51. doi: 10.1186/1472-6947-11-51.
3
A novel role for minimal introns: routing mRNAs to the cytosol.
PLoS One. 2010 Apr 12;5(4):e10144. doi: 10.1371/journal.pone.0010144.
4
Supervised machine learning algorithms for protein structure classification.
Comput Biol Chem. 2009 Jun;33(3):216-23. doi: 10.1016/j.compbiolchem.2009.04.004. Epub 2009 May 3.
5
Prediction of glycosylation sites using random forests.
BMC Bioinformatics. 2008 Nov 27;9:500. doi: 10.1186/1471-2105-9-500.
6
Accurate splice site prediction using support vector machines.
BMC Bioinformatics. 2007;8 Suppl 10(Suppl 10):S7. doi: 10.1186/1471-2105-8-S10-S7.
7
Features of 5'-splice-site efficiency derived from disease-causing mutations and comparative genomics.
Genome Res. 2008 Jan;18(1):77-87. doi: 10.1101/gr.6859308. Epub 2007 Nov 21.
8
Splice site identification using probabilistic parameters and SVM classification.
BMC Bioinformatics. 2006 Dec 18;7 Suppl 5(Suppl 5):S15. doi: 10.1186/1471-2105-7-S5-S15.
9
Markov encoding for detecting signals in genomic sequences.
IEEE/ACM Trans Comput Biol Bioinform. 2005 Apr-Jun;2(2):131-42. doi: 10.1109/TCBB.2005.27.
10
Comprehensive splice-site analysis using comparative genomics.
Nucleic Acids Res. 2006;34(14):3955-67. doi: 10.1093/nar/gkl556. Epub 2006 Aug 12.