Identification of human gene structure using linear discriminant functions and dynamic programming.

Authors

Solovyev V V, Salamov A A, Lawrence C B

Affiliation

Department of Cell Biology, Baylor College of Medicine, Houston, TX 77030, USA.

Publication

Proc Int Conf Intell Syst Mol Biol. 1995;3:367-75.

PMID: 7584460

Abstract

Developing advanced techniques to identify gene structure is one of the main challenges of the Human Genome Project. Discriminant analysis was applied to the construction of recognition functions for various components of gene structure. Linear discriminant functions have been developed for recognizing splice sites and 5'-coding, internal exon, and 3'-coding regions. A gene structure prediction system, FGENE, has been developed based on the exon recognition functions. We compute a graph of the mutual compatibility of different exons and represent gene structure models as paths of this directed acyclic graph. To select an optimal model, we apply a variant of the dynamic programming algorithm to search for the path in the graph with the maximal value of the corresponding discriminant functions. On 185 complete human gene sequences, FGENE achieves 81% exact exon recognition accuracy and 91% accuracy at the level of individual exon nucleotides, with a correlation coefficient (C) of 0.90. Testing FGENE on 35 genes not used in the development of the discriminant functions shows 71% accuracy of exact exon prediction and 89% accuracy at the nucleotide level (C = 0.86). FGENE compares very favorably with the other programs currently used to predict protein-coding regions. Analysis of uncharacterized human sequences is available through the University of Houston, Weizmann Institute of Science network server and a WWW page of the Human Genome Center at Baylor College of Medicine. The analysis is based on our methods for splice site prediction (HSPL, RNASPL), internal exon prediction (HEXON), prediction of all types of exons (FEXH), human (FGENEH) and bacterial (CDSB) gene structure prediction, and recognition of human and bacterial sequences (HBR), which tests a library for E. coli contamination.
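The path search described in the abstract can be illustrated with a minimal sketch. This is not the FGENE implementation: the `Exon` record, the `MIN_INTRON` threshold, and the `compatible` rule (non-overlap plus a minimum intron length) are assumptions made for illustration only, and the abstract does not specify how compatibility or the path score is actually defined. The sketch treats each candidate exon as a node carrying a discriminant score, links compatible exons from left to right, and uses dynamic programming to find the path with the maximal summed score.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exon:
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: float  # hypothetical discriminant-function value for this exon


MIN_INTRON = 60  # assumed minimum intron length; not taken from the paper


def compatible(a: Exon, b: Exon) -> bool:
    """Assumed rule: b may follow a if the gap between them can hold an intron."""
    return b.start - a.end - 1 >= MIN_INTRON


def best_gene_model(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the maximal-score path through the exon compatibility DAG."""
    if not exons:
        return 0.0, []
    # Sorting by start is a valid topological order, because every edge
    # a -> b requires b.start > a.end >= a.start.
    order = sorted(exons, key=lambda e: (e.start, e.end))
    best = [e.score for e in order]  # best path score ending at each exon
    prev = [-1] * len(order)         # back-pointers for path recovery
    for j, ej in enumerate(order):
        for i in range(j):
            candidate = best[i] + ej.score
            if compatible(order[i], ej) and candidate > best[j]:
                best[j] = candidate
                prev[j] = i
    # Trace back from the best-scoring end node.
    k = max(range(len(order)), key=best.__getitem__)
    total = best[k]
    path: List[Exon] = []
    while k != -1:
        path.append(order[k])
        k = prev[k]
    path.reverse()
    return total, path


if __name__ == "__main__":
    candidates = [
        Exon(100, 200, 2.0),
        Exon(150, 260, 3.5),
        Exon(400, 500, 1.5),
        Exon(450, 520, 1.0),
        Exon(700, 800, 2.5),
    ]
    total, model = best_gene_model(candidates)
    print(total, [(e.start, e.end) for e in model])
```

The double loop is quadratic in the number of candidate exons, which is adequate for a sketch. A real gene finder would also need to track reading-frame consistency and exon type (5', internal, 3') when deciding which exons may be linked.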

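The accuracy figures quoted in the abstract (exact exon accuracy, nucleotide-level accuracy, and the correlation coefficient C) can be illustrated with a small evaluation sketch. The abstract does not define these measures, so the definitions below are assumptions. Exact exon accuracy is taken as the fraction of true exons predicted with both boundaries correct, and C is taken as the Matthews-style correlation coefficient commonly used in gene-prediction evaluation. The function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def coding_positions(exons: Iterable[Interval]) -> Set[int]:
    """Expand exon intervals into the set of covered nucleotide positions."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_metrics(true_exons: List[Interval],
                       pred_exons: List[Interval],
                       seq_len: int) -> Dict[str, float]:
    """Nucleotide-level sensitivity, specificity, and correlation (assumed definitions)."""
    t = coding_positions(true_exons)
    p = coding_positions(pred_exons)
    tp = len(t & p)                # coding, predicted coding
    fp = len(p - t)                # non-coding, predicted coding
    fn = len(t - p)                # coding, predicted non-coding
    tn = seq_len - tp - fp - fn    # non-coding, predicted non-coding
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Gene-finding convention: "specificity" is the fraction of predicted
    # coding nucleotides that are truly coding.
    specificity = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "cc": cc}


def exact_exon_accuracy(true_exons: List[Interval],
                        pred_exons: List[Interval]) -> float:
    """Fraction of true exons predicted with both boundaries exactly right."""
    if not true_exons:
        return 0.0
    predicted = set(pred_exons)
    return sum(1 for exon in true_exons if exon in predicted) / len(true_exons)


if __name__ == "__main__":
    truth = [(1, 10), (21, 30)]
    prediction = [(1, 10), (21, 25)]
    print(nucleotide_metrics(truth, prediction, seq_len=40))
    print(exact_exon_accuracy(truth, prediction))
```

In this toy case one of the two true exons is predicted exactly, so the exon-level accuracy is 0.5 even though three quarters of the coding nucleotides are recovered. This is why exon-level and nucleotide-level figures can differ noticeably for the same predictor.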

Similar articles

1. Identification of human gene structure using linear discriminant functions and dynamic programming.
   Proc Int Conf Intell Syst Mol Biol. 1995;3:367-75.
2. The Gene-Finder computer tools for analysis of human and model organisms genome sequences.
   Proc Int Conf Intell Syst Mol Biol. 1997;5:294-302.
3. The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames.
   Proc Int Conf Intell Syst Mol Biol. 1994;2:354-62.
4. Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames.
   Nucleic Acids Res. 1994 Dec 11;22(24):5156-63. doi: 10.1093/nar/22.24.5156.
5. Pombe: a gene-finding and exon-intron structure prediction system for fission yeast.
   Yeast. 1998 Jun 15;14(8):701-10. doi: 10.1002/(SICI)1097-0061(19980615)14:8<701::AID-YEA247>3.0.CO;2-#.
6. The prediction of exons through an analysis of spliceable open reading frames.
   Nucleic Acids Res. 1992 Jul 11;20(13):3453-62. doi: 10.1093/nar/20.13.3453.
7. An improved system for exon recognition and gene modeling in human DNA sequences.
   Proc Int Conf Intell Syst Mol Biol. 1994;2:376-84.
8. Finding genes in DNA with a Hidden Markov Model.
   J Comput Biol. 1997 Summer;4(2):127-41. doi: 10.1089/cmb.1997.4.127.
9. [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
   Yi Chuan Xue Bao. 2004 May;31(5):431-43.
10. Recognizing exons in genomic sequence using GRAIL II.
    Genet Eng (N Y). 1994;16:241-53.

Cited by

1. Evaluating high-throughput ab initio gene finders to discover proteins encoded in eukaryotic pathogen genomes missed by laboratory techniques.
   PLoS One. 2012;7(11):e50609. doi: 10.1371/journal.pone.0050609. Epub 2012 Nov 30.
2. Genome size evolution in pufferfish: an insight from BAC clone-based Diodon holocanthus genome sequencing.
   BMC Genomics. 2010 Jun 23;11:396. doi: 10.1186/1471-2164-11-396.
3. The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes.
   Plant Methods. 2009 Jun 19;5:8. doi: 10.1186/1746-4811-5-8.
4. Molecular and functional characterization of a Taenia adhesion gene family (TAF) encoding potential protective antigens of Taenia saginata oncospheres.
   Parasitol Res. 2007 Feb;100(3):519-28. doi: 10.1007/s00436-006-0297-6. Epub 2006 Oct 18.
5. Method of predicting splice sites based on signal interactions.
   Biol Direct. 2006 Apr 3;1:10. doi: 10.1186/1745-6150-1-10.
6. Random sheared fosmid library as a new genomic tool to accelerate complete finishing of rice (Oryza sativa spp. Nipponbare) genome sequence: sequencing of gap-specific fosmid clones uncovers new euchromatic portions of the genome.
   Theor Appl Genet. 2005 Nov;111(8):1596-607. doi: 10.1007/s00122-005-0091-3. Epub 2005 Nov 10.
7. Spotted leaf11, a negative regulator of plant cell death and defense, encodes a U-box/armadillo repeat protein endowed with E3 ubiquitin ligase activity.
   Plant Cell. 2004 Oct;16(10):2795-808. doi: 10.1105/tpc.104.025171. Epub 2004 Sep 17.
8. The Ensembl automatic gene annotation system.
   Genome Res. 2004 May;14(5):942-50. doi: 10.1101/gr.1858004.
9. A complexity reduction algorithm for analysis and annotation of large genomic sequences.
   Genome Res. 2003 Feb;13(2):313-22. doi: 10.1101/gr.313703.
10. Reevaluating human gene annotation: a second-generation analysis of chromosome 22.
    Genome Res. 2003 Jan;13(1):27-36. doi: 10.1101/gr.695703.