Splice site identification by idlBNs.

Author information

Castelo Robert, Guigó Roderic

Affiliations

Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, Centre de Regulació Genòmica, Psg. Marítim 37-49, Barcelona, Spain.

Publication information

Bioinformatics. 2004 Aug 4;20 Suppl 1:i69-76. doi: 10.1093/bioinformatics/bth932.

DOI:10.1093/bioinformatics/bth932

PMID:15262783

Abstract

MOTIVATION

Computational identification of functional sites in nucleotide sequences is at the core of many algorithms for the analysis of genomic data. This identification is based on statistical parameters estimated from a training set. Often, because of the huge number of parameters, it is difficult to obtain consistent estimators. To simplify the estimation problem, one imposes independence assumptions between the nucleotides along the site. However, this can potentially limit the minimum value of the estimation error.

RESULTS

In this paper, we introduce a novel method for identifying functional sites that finds a reasonable set of independence assumptions among the nucleotides, supported by the data, and uses it to identify the sites by their likelihood ratio. More importantly, in many practical situations it is capable of improving its performance as the training sample size increases. We apply the method to the identification of splice sites, and further evaluate its effect within the context of exon and gene prediction.
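
To make the likelihood-ratio idea concrete, the following is a minimal sketch of site scoring under the simplest assumption the abstract mentions, namely that every position in the site is independent of the others. It is not the idlBN method itself, which learns which independence assumptions the data support. The function names, the pseudocount, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_independent_model(sequences, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities.

    Assumes all sequences have equal length and that positions are
    mutually independent (an illustrative baseline, not the idlBN model).
    """
    length = len(sequences[0])
    total = len(sequences) + pseudocount * len(ALPHABET)
    model = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        model.append({nt: (counts[nt] + pseudocount) / total for nt in ALPHABET})
    return model


def log_likelihood_ratio(candidate, site_model, background_model):
    """Sum over positions of log P(nt | site) - log P(nt | background)."""
    return sum(
        math.log(site_model[i][nt] / background_model[i][nt])
        for i, nt in enumerate(candidate)
    )


# Made-up toy training data: aligned "true" sites and decoy windows.
true_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
decoys = ["ACGTTCGA", "TTGTACCA", "GAGTCTTG", "CAGTGGAC"]

site_model = fit_independent_model(true_sites)
background_model = fit_independent_model(decoys)

score_true = log_likelihood_ratio("AGGTAAGT", site_model, background_model)
score_decoy = log_likelihood_ratio("TTGTACCA", site_model, background_model)
print(f"site-like candidate:  {score_true:.3f}")
print(f"decoy-like candidate: {score_decoy:.3f}")
```

A candidate is called a site when its score exceeds a chosen threshold. Under full independence each position contributes only four parameters, which keeps estimation easy but ignores any dependence between positions; that trade-off is the one the abstract describes.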

Similar articles

Identification of coding and non-coding sequences using local Holder exponent formalism.

Bioinformatics. 2005 Oct 15;21(20):3818-23. doi: 10.1093/bioinformatics/bti639. Epub 2005 Aug 23.

Pattern locator: a new tool for finding local sequence patterns in genomic DNA sequences.

Bioinformatics. 2006 Dec 15;22(24):3099-100. doi: 10.1093/bioinformatics/btl551. Epub 2006 Nov 8.

Sigma: multiple alignment of weakly-conserved non-coding DNA sequence.

BMC Bioinformatics. 2006 Mar 16;7:143. doi: 10.1186/1471-2105-7-143.

Generalized hierarchical markov models for the discovery of length-constrained sequence features from genome tiling arrays.

Biometrics. 2007 Sep;63(3):797-805. doi: 10.1111/j.1541-0420.2007.00760.x.

PEAKS: identification of regulatory motifs by their position in DNA sequences.

Bioinformatics. 2007 Jan 15;23(2):243-4. doi: 10.1093/bioinformatics/btl568. Epub 2006 Nov 10.

On counting position weight matrix matches in a sequence, with application to discriminative motif finding.

Bioinformatics. 2006 Jul 15;22(14):e454-63. doi: 10.1093/bioinformatics/btl227.

FunSiP: a modular and extensible classifier for the prediction of functional sites in DNA.

Bioinformatics. 2008 Jul 1;24(13):1532-3. doi: 10.1093/bioinformatics/btn225. Epub 2008 May 12.

Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S6. doi: 10.1186/1471-2105-8-S4-S6.

Feature subset selection for splice site prediction.

Bioinformatics. 2002;18 Suppl 2:S75-83. doi: 10.1093/bioinformatics/18.suppl_2.s75.

Cited by

Assessment of branch point prediction tools to predict physiological branch points and their alteration by variants.

BMC Genomics. 2020 Jan 28;21(1):86. doi: 10.1186/s12864-020-6484-5.

The effect of mislabeled phenotypic status on the identification of mutation-carriers from SNP genotypes in dairy cattle.

BMC Res Notes. 2017 Jun 26;10(1):230. doi: 10.1186/s13104-017-2540-x.

An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.

BMC Syst Biol. 2015;9 Suppl 5(Suppl 5):S1. doi: 10.1186/1752-0509-9-S5-S1. Epub 2015 Sep 1.

Computational predictions provide insights into the biology of TAL effector target sites.

PLoS Comput Biol. 2013;9(3):e1002962. doi: 10.1371/journal.pcbi.1002962. Epub 2013 Mar 14.

Genome-wide association between branch point properties and alternative splicing.

PLoS Comput Biol. 2010 Nov 24;6(11):e1001016. doi: 10.1371/journal.pcbi.1001016.

Apples and oranges: avoiding different priors in Bayesian DNA sequence analysis.

BMC Bioinformatics. 2010 Mar 22;11:149. doi: 10.1186/1471-2105-11-149.

MotifAdjuster: a tool for computational reassessment of transcription factor binding site annotations.

Genome Biol. 2009;10(5):R46. doi: 10.1186/gb-2009-10-5-r46. Epub 2009 May 1.

Fast splice site detection using information content and feature reduction.

BMC Bioinformatics. 2008 Dec 12;9 Suppl 12(Suppl 12):S8. doi: 10.1186/1471-2105-9-S12-S8.

Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions.

BMC Bioinformatics. 2007 Dec 19;8:481. doi: 10.1186/1471-2105-8-481.

Splice site identification using probabilistic parameters and SVM classification.

BMC Bioinformatics. 2006 Dec 18;7 Suppl 5(Suppl 5):S15. doi: 10.1186/1471-2105-7-S5-S15.
