Automatic detection of exonic splicing enhancers (ESEs) using SVMs.

Authors

Mersch Britta, Gepperth Alexander, Suhai Sándor, Hotz-Wagenblatt Agnes

Affiliation

Department of Molecular Biophysics, German Cancer Research Center DKFZ, Im Neuenheimer Feld 580, Heidelberg, Germany.

Publication

BMC Bioinformatics. 2008 Sep 10;9:369. doi: 10.1186/1471-2105-9-369.

DOI:10.1186/1471-2105-9-369
PMID:18783607
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2567995/
Abstract

BACKGROUND

Exonic splicing enhancers (ESEs) activate nearby splice sites and promote the inclusion (vs. exclusion) of exons in which they reside, while being a binding site for SR proteins. To study the impact of ESEs on alternative splicing it would be useful to have a possibility to detect them in exons. Identifying SR protein-binding sites in human DNA sequences by machine learning techniques is a formidable task, since the exon sequences are also constrained by their functional role in coding for proteins.

RESULTS

The choice of training examples needed for machine learning approaches is difficult since there are only few exact locations of human ESEs described in the literature which could be considered as positive examples. Additionally, it is unclear which sequences are suitable as negative examples. Therefore, we developed a motif-oriented data-extraction method that extracts exon sequences around experimentally or theoretically determined ESE patterns. Positive examples are restricted by heuristics based on known properties of ESEs, e.g. location in the vicinity of a splice site, whereas negative examples are taken in the same way from the middle of long exons. We show that a suitably chosen SVM using optimized sequence kernels (e.g., combined oligo kernel) can extract meaningful properties from these training examples. Once the classifier is trained, every potential ESE sequence can be passed to the SVM for verification. Using SVMs with the combined oligo kernel yields a high accuracy of about 90 percent and well interpretable parameters.
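The motif-oriented extraction described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names and the parameters `flank`, `near`, `long_exon`, and `far` are hypothetical, and the thresholds are placeholder assumptions because the abstract does not state the values used.

```python
from typing import List, Tuple


def find_motif_windows(exon: str, motifs: List[str], flank: int) -> List[Tuple[int, int, str]]:
    """Return (start, end, window) for every motif hit that has `flank` nt of context on both sides."""
    hits = []
    for motif in motifs:
        start = exon.find(motif)
        while start != -1:
            end = start + len(motif)
            lo, hi = start - flank, end + flank
            if lo >= 0 and hi <= len(exon):
                hits.append((start, end, exon[lo:hi]))
            start = exon.find(motif, start + 1)  # allow overlapping hits
    return hits


def extract_examples(exon: str, motifs: List[str], flank: int = 4, near: int = 30,
                     long_exon: int = 200, far: int = 80) -> Tuple[List[str], List[str]]:
    """Split motif windows into positives (near an exon end) and negatives (middle of a long exon).

    All thresholds are hypothetical placeholders, not values from the paper.
    """
    positives, negatives = [], []
    for start, end, window in find_motif_windows(exon, motifs, flank):
        # Distance from the motif to the nearest exon boundary (a proxy for the splice site).
        dist = min(start, len(exon) - end)
        if dist <= near:
            positives.append(window)
        elif len(exon) >= long_exon and dist >= far:
            negatives.append(window)
    return positives, negatives
```

Hits that are neither close to an exon end nor deep inside a long exon are discarded, which keeps ambiguous windows out of both classes.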

CONCLUSION

The motif-oriented data-extraction method seems to produce consistent training and test data leading to good classification rates and thus allows verification of potential ESE motifs. The best results were obtained using an SVM with the combined oligo kernel, while oligo kernels with oligomers of a certain length could be used to extract relevant features.
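To make the kernel idea concrete, here is a simplified sketch of a position-smoothed oligomer kernel in the spirit of the oligo kernel, together with a combined variant that sums over several oligomer lengths. It is an assumption-laden illustration rather than the paper's implementation: constant factors and normalization are omitted, and the names `oligo_kernel`, `combined_oligo_kernel`, and the `sigmas` mapping are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List


def oligo_positions(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each oligomer of length k to the list of positions where it starts in seq."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def oligo_kernel(s: str, t: str, k: int, sigma: float) -> float:
    """Sum a Gaussian of the position difference over all shared k-mer occurrences.

    Small sigma rewards only exact positional matches; large sigma approaches
    a position-independent k-mer count comparison.
    """
    pos_s, pos_t = oligo_positions(s, k), oligo_positions(t, k)
    total = 0.0
    for oligo, starts_s in pos_s.items():
        for p in starts_s:
            for q in pos_t.get(oligo, ()):
                total += math.exp(-((p - q) ** 2) / (4.0 * sigma ** 2))
    return total


def combined_oligo_kernel(s: str, t: str, sigmas: Dict[int, float]) -> float:
    """Sum oligo kernels over several oligomer lengths; sigmas maps length k to its width sigma_k."""
    return sum(oligo_kernel(s, t, k, sigma) for k, sigma in sigmas.items())
```

A Gram matrix built from such a function can be handed to any SVM implementation that accepts a precomputed kernel. Using one smoothing width per oligomer length is what lets the per-length contributions be inspected separately.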

Similar articles

1. Automatic detection of exonic splicing enhancers (ESEs) using SVMs.
   BMC Bioinformatics. 2008 Sep 10;9:369. doi: 10.1186/1471-2105-9-369.
2. Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns.
   BMC Genomics. 2006 Dec 8;7:311. doi: 10.1186/1471-2164-7-311.
3. Exonic splicing enhancers contribute to the use of both 3' and 5' splice site usage of rat beta-tropomyosin pre-mRNA.
   RNA. 1999 Mar;5(3):378-94. doi: 10.1017/s1355838299981050.
4. Evidence for deep phylogenetic conservation of exonic splice-related constraints: splice-related skews at exonic ends in the brown alga Ectocarpus are common and resemble those seen in humans.
   Genome Biol Evol. 2013;5(9):1731-45. doi: 10.1093/gbe/evt115.
5. Distribution of SR protein exonic splicing enhancer motifs in human protein-coding genes.
   Nucleic Acids Res. 2005 Sep 7;33(16):5053-62. doi: 10.1093/nar/gki810. Print 2005.
6. Inference of splicing regulatory activities by sequence neighborhood analysis.
   PLoS Genet. 2006 Nov 24;2(11):e191. doi: 10.1371/journal.pgen.0020191. Epub 2006 Sep 28.
7. Identification of functional exonic splicing enhancer motifs recognized by individual SR proteins.
   Genes Dev. 1998 Jul 1;12(13):1998-2012. doi: 10.1101/gad.12.13.1998.
8. Distribution of exonic splicing enhancer elements in human genes.
   Genomics. 2005 Sep;86(3):329-36. doi: 10.1016/j.ygeno.2005.05.011.
9. A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract.
   J Biochem. 2008 Mar;143(3):303-10. doi: 10.1093/jb/mvm227. Epub 2007 Nov 26.
10. Purifying Selection on Exonic Splice Enhancers in Intronless Genes.
   Mol Biol Evol. 2016 Jun;33(6):1396-418. doi: 10.1093/molbev/msw018. Epub 2016 Jan 23.

Cited by

1. The role of complement factor I rare genetic variants in age related macular degeneration in Finland.
   Hum Mol Genet. 2025 Feb 1;34(3):218-228. doi: 10.1093/hmg/ddae165.
2. A new mechanism for a familiar mutation - bovine DGAT1 K232A modulates gene expression through multi-junction exon splice enhancement.
   BMC Genomics. 2020 Aug 26;21(1):591. doi: 10.1186/s12864-020-07004-z.
3. DDX54 regulates transcriptome dynamics during DNA damage response.
   Genome Res. 2017 Aug;27(8):1344-1359. doi: 10.1101/gr.218438.116. Epub 2017 Jun 8.
4. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing.
   Sci Transl Med. 2017 Apr 19;9(386). doi: 10.1126/scitranslmed.aal5209.
5. Genome-wide prediction of splice-modifying SNPs in human genes using a new analysis pipeline called AASsites.
   BMC Bioinformatics. 2011;12 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-12-S4-S2. Epub 2011 Jul 5.
6. Genetic analysis of complement factor H related 5, CFHR5, in patients with age-related macular degeneration.
   Mol Vis. 2009;15:731-6. Epub 2009 Apr 10.

References

1. Evolutionary optimization of sequence kernels for detection of bacterial gene starts.
   Int J Neural Syst. 2007 Oct;17(5):369-81. doi: 10.1142/S0129065707001214.
2. A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana.
   BMC Bioinformatics. 2007 May 21;8:159. doi: 10.1186/1471-2105-8-159.
3. Gradient-based optimization of kernel-target alignment for sequence kernels applied to bacterial gene start detection.
   IEEE/ACM Trans Comput Biol Bioinform. 2007 Apr-Jun;4(2):216-26. doi: 10.1109/TCBB.2007.070208.
4. Markov encoding for detecting signals in genomic sequences.
   IEEE/ACM Trans Comput Biol Bioinform. 2005 Apr-Jun;2(2):131-42. doi: 10.1109/TCBB.2005.27.
5. A machine learning strategy to identify candidate binding sites in human protein-coding sequence.
   BMC Bioinformatics. 2006 Sep 26;7:419. doi: 10.1186/1471-2105-7-419.
6. Using RNA secondary structures to guide sequence motif finding towards single-stranded regions.
   Nucleic Acids Res. 2006;34(17):e117. doi: 10.1093/nar/gkl544. Epub 2006 Sep 20.
7. RNA sequence and secondary structure participate in high-affinity CsrA-RNA interaction.
   RNA. 2005 Oct;11(10):1579-87. doi: 10.1261/rna.2990205. Epub 2005 Aug 30.
8. The RNA ligands for mouse proline-rich RNA-binding protein (mouse Prrp) contain two consensus sequences in separate loop structure.
   Nucleic Acids Res. 2005 Jan 12;33(1):190-200. doi: 10.1093/nar/gki153. Print 2005.
9. The Vertebrate Genome Annotation (Vega) database.
   Nucleic Acids Res. 2005 Jan 1;33(Database issue):D459-65. doi: 10.1093/nar/gki135.
10. Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites.
   BMC Bioinformatics. 2004 Oct 28;5:169. doi: 10.1186/1471-2105-5-169.