SpliceFinder: ab initio prediction of splice sites using convolutional neural network.

Affiliation

Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, China.

Publication

BMC Bioinformatics. 2019 Dec 27;20(Suppl 23):652. doi: 10.1186/s12859-019-3306-3.

DOI:10.1186/s12859-019-3306-3
PMID:31881982
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6933889/
Abstract

BACKGROUND

Identifying splice sites is a necessary step in analyzing the location and structure of genes. The dinucleotides GT and AG occur at the great majority of splice sites, and many other sequence patterns with important biological functions are also found at splice sites. However, these dinucleotides also occur frequently in sequences that contain no splice site, which makes prediction prone to false positives. Most existing tools select all sequences containing the two dinucleotides and then focus on distinguishing true splice sites from pseudo ones. This approach reduces false positives, but it misses non-canonical splice sites.
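
The candidate-selection strategy described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from SpliceFinder or any other tool; the function name and example sequence are hypothetical. It reports every GT (potential donor) and AG (potential acceptor) position, which exposes both problems: most hits are not real splice sites, and a site with any other dinucleotide is never reported.

```python
def canonical_candidates(seq: str) -> tuple[list[int], list[int]]:
    """Return (donor, acceptor) candidate positions under the GT-AG rule.

    Donor candidates are the 0-based start indices of every 'GT' and
    acceptor candidates those of every 'AG'. A pre-filter like this is the
    starting point of dinucleotide-based tools; any true splice site that
    lacks GT/AG (a non-canonical site) can never appear in its output.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


if __name__ == "__main__":
    # Hypothetical toy sequence: 2 donor and 3 acceptor candidates.
    print(canonical_candidates("CAGGTAAGTATTTAG"))  # prints ([3, 7], [1, 6, 13])
```

A sequence whose donor is GC (as in GC-AG introns) produces no donor candidate at all, which is the kind of non-canonical site the filter discards.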

RESULT

We designed SpliceFinder, a convolutional neural network (CNN) for predicting splice sites. To achieve ab initio prediction, we trained the network on human genomic data. An iterative approach is used to reconstruct the dataset, which addresses the data imbalance problem and forces the model to learn more features of splice sites. The proposed CNN achieves a classification accuracy of 90.25%, 10% higher than existing algorithms, and outperforms other existing methods in area under the receiver operating characteristic curve (AUC), recall, precision, and F1 score. Furthermore, SpliceFinder can locate the exact position of splice sites in long genomic sequences using a sliding window. Compared with other state-of-the-art splice site prediction tools, SpliceFinder produces about half as many false positives while keeping recall above 0.8. SpliceFinder also captures non-canonical splice sites. In addition, it performs well on genomic sequences of Drosophila melanogaster, Mus musculus, Rattus, and Danio rerio without retraining.
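
The threshold-based metrics named above have standard definitions, sketched below in Python. The confusion counts in the usage lines are invented for illustration and are not results from the paper; AUC is omitted because it requires classifier scores across all thresholds rather than a single confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary splice-site classifier (site vs. non-site)."""
    tp: int  # true sites predicted as sites
    fp: int  # non-sites predicted as sites
    tn: int  # non-sites predicted as non-sites
    fn: int  # true sites that were missed


def metrics(c: Confusion) -> dict:
    """Accuracy, precision, recall and F1 using their standard definitions."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    before = metrics(Confusion(tp=80, fp=20, tn=880, fn=20))
    after = metrics(Confusion(tp=80, fp=10, tn=890, fn=20))
    print(before["precision"], after["precision"])
```

With these made-up counts, precision and recall both start at 0.8; halving the false positives while keeping tp and fn fixed leaves recall at 0.8 and raises precision to 80/90 (about 0.89), the kind of trade-off the abstract reports.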

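Locating sites in a long sequence with a sliding window can be illustrated with the following Python sketch. It assumes one-hot encoding, a common input format for DNA CNNs (the abstract does not state SpliceFinder's encoding), and `window`, `step`, `threshold` and `toy_donor_score` are hypothetical placeholders. The toy scorer is not SpliceFinder's CNN; it only exists so the scan runs end to end, and a trained model would take its place.

```python
from typing import Callable, List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as a len(seq) x 4 matrix over (A, C, G, T).

    Ambiguous symbols such as N become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def scan(seq: str, window: int, step: int,
         classify: Callable[[List[List[int]]], float],
         threshold: float = 0.5) -> List[int]:
    """Slide a fixed-width window along seq and return the 0-based position
    at offset window // 2 of every window whose score reaches threshold."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        score = classify(one_hot(seq[start:start + window]))
        if score >= threshold:
            hits.append(start + window // 2)
    return hits


def toy_donor_score(x: List[List[int]]) -> float:
    """Hypothetical stand-in for a trained classifier: 1.0 when the window
    has G followed by T at its centre, else 0.0."""
    c = len(x) // 2
    g, t = [0, 0, 1, 0], [0, 0, 0, 1]
    return 1.0 if c + 1 < len(x) and x[c] == g and x[c + 1] == t else 0.0


if __name__ == "__main__":
    # Toy sequence with a single GT at positions 7-8.
    print(scan("AAAAAAAGTAAAAAAA", window=10, step=1,
               classify=toy_donor_score))  # prints [7]
```

Using a step of 1 scores every position, which is what allows an exact position to be reported; a larger step is cheaper but can only report positions on a coarser grid.
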
CONCLUSION

Based on a CNN, we propose SpliceFinder, a new ab initio splice site prediction tool that generates fewer false positives and can detect non-canonical splice sites. Additionally, SpliceFinder is transferable to other species without retraining. The source code and additional materials are available at https://gitlab.deepomics.org/wangruohan/SpliceFinder.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d64c/6933889/8aa5dc232fab/12859_2019_3306_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d64c/6933889/f1907b2326b2/12859_2019_3306_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d64c/6933889/0f5c9acf01f1/12859_2019_3306_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d64c/6933889/d0ccb0b536c8/12859_2019_3306_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d64c/6933889/80f081f439aa/12859_2019_3306_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d64c/6933889/9c018fac8e9a/12859_2019_3306_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d64c/6933889/0dce337a5fb3/12859_2019_3306_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d64c/6933889/db78c18ae478/12859_2019_3306_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d64c/6933889/46cc48befdb9/12859_2019_3306_Fig9_HTML.jpg

Similar articles

1
SpliceFinder: ab initio prediction of splice sites using convolutional neural network.
BMC Bioinformatics. 2019 Dec 27;20(Suppl 23):652. doi: 10.1186/s12859-019-3306-3.
2
Analysis of canonical and non-canonical splice sites in mammalian genomes.
Nucleic Acids Res. 2000 Nov 1;28(21):4364-75. doi: 10.1093/nar/28.21.4364.
3
Human Splice-Site Prediction with Deep Neural Networks.
J Comput Biol. 2018 Aug;25(8):954-961. doi: 10.1089/cmb.2018.0041. Epub 2018 Apr 18.
4
Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data.
BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):503. doi: 10.1186/s12864-016-2896-7.
5
EDeepSSP: Explainable deep neural networks for exact splice sites prediction.
J Bioinform Comput Biol. 2020 Aug;18(4):2050024. doi: 10.1142/S0219720020500249. Epub 2020 Jul 22.
6
DRANetSplicer: A Splice Site Prediction Model Based on Deep Residual Attention Networks.
Genes (Basel). 2024 Mar 26;15(4):404. doi: 10.3390/genes15040404.
7
Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA.
Gene. 2020 Dec;763S:100035. doi: 10.1016/j.gene.2020.100035. Epub 2020 May 13.
8
Incorporation of splice site probability models for non-canonical introns improves gene structure prediction in plants.
Bioinformatics. 2005 Nov 1;21 Suppl 3:iii20-30. doi: 10.1093/bioinformatics/bti1205.
9
Splice site prediction with quadratic discriminant analysis using diversity measure.
Nucleic Acids Res. 2003 Nov 1;31(21):6214-20. doi: 10.1093/nar/gkg805.
10
Evaluating the performance of sequence encoding schemes and machine learning methods for splice sites recognition.
Gene. 2019 Jul 15;705:113-126. doi: 10.1016/j.gene.2019.04.047. Epub 2019 Apr 19.

Cited by

1
The Impact of Tokenizer Selection in Genomic Language Models.
bioRxiv. 2025 Jul 26:2024.09.09.612081. doi: 10.1101/2024.09.09.612081.
2
Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling.
Proc Mach Learn Res. 2024 Jul;235:43632-43648.
3
Progress and opportunities of foundation models in bioinformatics.
