Identification of human gene functional regions based on oligonucleotide composition.

Authors

Solovyev V V, Lawrence C B

Affiliation

Department of Cell Biology, Baylor College of Medicine, Houston, TX 77030, USA.

Publication

Proc Int Conf Intell Syst Mol Biol. 1993;1:371-9.

PMID:7584359
Abstract

Accurate recognition of coding and intron regions within large stretches of uncharacterized genomic DNA is an unsolved problem. A database of more than 4,240,791 bp of coding and 7,790,682 bp of noncoding human sequence was extracted from GenBank to develop a function for locating coding regions in anonymous sequences. Several coding measures based on oligonucleotide preferences were tested on a control set that included one third of all extracted sequences. The accuracy of separating coding from noncoding regions is 87% using 9 bp oligonucleotides on 54 bp windows and 91% on 108 bp windows. For separating coding from intron regions, the accuracy is 89-90% using 8 bp oligonucleotides on 54 bp windows and up to 95% on 108 bp windows. Using the preferences of octanucleotides in protein-coding and intron regions, together with significant triplet frequencies as a function of position near splice junctions, a joint splice site prediction scheme was developed. On the test set, the joint scheme predicted splice site positions with an accuracy of about 96-97%, exceeding that of a previously reported splice site selection method based on a more complex artificial neural network approach. A splicing model based on poly-G(C)-rich exon flanking sequences is proposed. A remarkable difference in oligonucleotide composition between 5' and 3' gene regions is shown and applied in a gene structure prediction system.
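
The coding measures summarized above rest on the idea that some oligonucleotides occur more often in coding DNA than in noncoding DNA, so a window's oligonucleotide content hints at its class. The following is a minimal Python sketch of one such measure, assuming a log-likelihood ratio of overlapping k-mer frequencies with Laplace pseudocounts and a zero decision threshold. This is a generic stand-in, not the authors' exact coding measure, which the abstract does not specify; the helper names `kmer_counts`, `log_odds_table`, and `score_windows`, the `pseudo` parameter, and the toy training sequences are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"

def kmer_counts(seqs, k):
    """Hypothetical helper: count overlapping k-mers (A/C/G/T only) in training sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if all(ch in NUCLEOTIDES for ch in kmer):  # skip k-mers containing N or other codes
                counts[kmer] += 1
    return counts

def log_odds_table(coding, noncoding, k, pseudo=1.0):
    """Hypothetical helper: log(P(kmer | coding) / P(kmer | noncoding)) for all 4**k k-mers.

    Laplace pseudocounts (an assumption, not from the paper) keep unseen k-mers finite.
    """
    c, n = kmer_counts(coding, k), kmer_counts(noncoding, k)
    size = 4 ** k
    c_total = sum(c.values()) + pseudo * size
    n_total = sum(n.values()) + pseudo * size
    table = {}
    for kmer in map("".join, product(NUCLEOTIDES, repeat=k)):
        p_coding = (c[kmer] + pseudo) / c_total
        p_noncoding = (n[kmer] + pseudo) / n_total
        table[kmer] = math.log(p_coding / p_noncoding)
    return table

def score_windows(seq, table, k, window=54, step=None):
    """Hypothetical helper: sum k-mer log-odds over each window.

    Returns (start, score, label) tuples. A positive score is labeled "coding"
    (the zero threshold is an assumption). Windows do not overlap unless step < window.
    """
    seq = seq.upper()
    step = step or window
    results = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        # k-mers absent from the table (e.g. containing N) contribute 0
        score = sum(table.get(w[i:i + k], 0.0) for i in range(window - k + 1))
        results.append((start, score, "coding" if score > 0 else "noncoding"))
    return results

if __name__ == "__main__":
    # Toy training data for illustration only; real use would draw on large curated sets.
    coding = ["ATGGCCGCCGCCGCCGCC" * 3]
    noncoding = ["ATATATTTAAATATTA" * 3]
    table = log_odds_table(coding, noncoding, k=3)
    for start, score, label in score_windows("GCC" * 18 + "AT" * 27, table, k=3):
        print(start, round(score, 2), label)
```

To mirror the settings reported in the abstract, one would build the table with `k=9` and score with `window=54` or `window=108`; for coding-versus-intron separation, the noncoding side would be trained on intron sequences with `k=8`. A full 9-mer table has 4^9 = 262,144 entries, so with sparse training data many scores lean heavily on the pseudocount.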

Similar articles

1. Identification of human gene functional regions based on oligonucleotide composition.
   Proc Int Conf Intell Syst Mol Biol. 1993;1:371-9.
2. Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames.
   Nucleic Acids Res. 1994 Dec 11;22(24):5156-63. doi: 10.1093/nar/22.24.5156.
3. The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames.
   Proc Int Conf Intell Syst Mol Biol. 1994;2:354-62.
4. Characterization of three splice variants and genomic organization of the mouse BMAL1 gene.
   Biochem Biophys Res Commun. 1999 Jul 14;260(3):760-7. doi: 10.1006/bbrc.1999.0970.
5. Recognizing exons in genomic sequence using GRAIL II.
   Genet Eng (N Y). 1994;16:241-53.
6. [Statistical analysis of DNA sequences nearby splicing sites].
   Mol Biol (Mosk). 2008 Jan-Feb;42(1):150-62.
7. Exonization of transposed elements: A challenge and opportunity for evolution.
   Biochimie. 2011 Nov;93(11):1928-34. doi: 10.1016/j.biochi.2011.07.014. Epub 2011 Jul 26.
8. Predictive identification of exonic splicing enhancers in human genes.
   Science. 2002 Aug 9;297(5583):1007-13. doi: 10.1126/science.1073774. Epub 2002 Jul 11.
9. The Gene-Finder computer tools for analysis of human and model organisms genome sequences.
   Proc Int Conf Intell Syst Mol Biol. 1997;5:294-302.
10. [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
   Yi Chuan Xue Bao. 2004 May;31(5):431-43.

Cited by

1. Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences.
   Bioinform Biol Insights. 2009 Nov 24;1:101-26. doi: 10.4137/bbi.s415.
2. Ab initio gene finding in Drosophila genomic DNA.
   Genome Res. 2000 Apr;10(4):516-22. doi: 10.1101/gr.10.4.516.