Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach.

Authors

Uberbacher E C, Mural R J

Affiliation

Biology Division, Oak Ridge National Laboratory, TN.

Publication

Proc Natl Acad Sci U S A. 1991 Dec 15;88(24):11261-5. doi: 10.1073/pnas.88.24.11261.

Abstract

Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the "coding recognition module" identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which we are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.
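The abstract does not say which sensor algorithms or which network architecture the coding recognition module uses, so the following is only a minimal sketch of the general idea: several functions each report a local property of a sequence window, and a small feed-forward network combines their outputs into one coding score. The two sensors (`gc_content`, `codon_position_asymmetry`), the window size, and all weights are illustrative assumptions, not the authors' method; a real network would be trained on labeled sequence.

```python
import math

def gc_content(window: str) -> float:
    """Assumed sensor 1: fraction of G or C bases in the window."""
    return sum(base in "GC" for base in window) / len(window)

def codon_position_asymmetry(window: str) -> float:
    """Assumed sensor 2: largest difference, over the four bases, between a
    base's frequencies at the three codon positions of the window."""
    n_codons = len(window) // 3
    spread = 0.0
    for base in "ACGT":
        freqs = [
            sum(window[i] == base for i in range(pos, n_codons * 3, 3)) / n_codons
            for pos in range(3)
        ]
        spread = max(spread, max(freqs) - min(freqs))
    return spread

SENSORS = (gc_content, codon_position_asymmetry)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical hand-picked weights, for illustration only (not trained values).
HIDDEN_WEIGHTS = [[4.0, 2.0], [-1.0, 6.0]]
HIDDEN_BIAS = [-2.5, -2.0]
OUTPUT_WEIGHTS = [1.5, 2.5]
OUTPUT_BIAS = -2.0

def coding_score(window: str) -> float:
    """Feed the sensor readings through one hidden layer to a score in (0, 1)."""
    inputs = [sensor(window) for sensor in SENSORS]
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIAS)
    ]
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)

def scan(sequence: str, window_size: int = 99, step: int = 10):
    """Slide a window along the sequence and score each position."""
    sequence = sequence.upper()
    return [
        (start, coding_score(sequence[start:start + window_size]))
        for start in range(0, len(sequence) - window_size + 1, step)
    ]
```

Calling `scan(dna)` on a sequence string returns `(start, score)` pairs, and runs of high scores would mark candidate coding regions. Separately, the reported performance can be read as sensitivity of 0.90 for exons of at least 100 bases and, since fewer than one in five indicated exons is a false positive, precision above 0.80.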

