Automated gene identification in large-scale genomic sequences.

Author information

Xu Y, Uberbacher E C

Affiliation

Computer Science and Mathematics Division, Oak Ridge National Laboratory, Tennessee 37831-6364, USA.

Publication information

J Comput Biol. 1997 Fall;4(3):325-38. doi: 10.1089/cmb.1997.4.325.

Abstract

Computational methods for gene identification in genomic sequences typically have two phases: coding region recognition and gene parsing. While there are a number of effective methods for recognizing coding regions (exons), parsing the recognized exons into proper gene structures remains, to a large extent, an unsolved problem. We have developed a computer program that automatically parses the recognized exons into gene models most consistent with the available Expressed Sequence Tags (ESTs) and a set of empirically derived biological heuristics. The gene modeling algorithm used in this program provides a general framework for applying EST information, so modeling accuracy improves as the amount of available EST information increases. Based on preliminary tests on a number of large DNA sequences using the dbEST database, we have observed that the algorithm can (1) accurately model complicated multiple-gene structures, including embedded genes, (2) identify falsely recognized exons and locate exons missed by the initial exon recognition phase, and (3) make more accurate exon boundary predictions when the necessary EST information is available. We have extended this EST-based gene modeling algorithm to model genes on unfinished DNA contigs at the end of shotgun sequencing. Before the gene modeling phase, this extended version automatically determines the orientations and relative order of the DNA contigs (with gaps between them), using the available ESTs as reference models.
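The abstract does not spell out the gene modeling algorithm itself. As a rough illustration only, the sketch below casts exon parsing as a dynamic program that chains non-overlapping candidate exons to maximize a combined score, where each exon's recognition score is boosted by the number of EST alignments overlapping it. This is a hypothetical simplification, not the authors' method: the `Exon` type, `est_support`, `best_gene_model`, and the `est_weight` and `min_intron` parameters are illustrative names, and reading-frame compatibility, splice signals, strand, and multiple genes are all ignored.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Exon:
    """Hypothetical candidate exon from an upstream recognizer."""
    start: int    # 0-based, inclusive
    end: int      # exclusive
    score: float  # recognition score; assumed scale, may be negative


def est_support(exon: Exon, est_matches: Sequence[Tuple[int, int]]) -> int:
    """Count EST alignment intervals [s, e) that overlap the exon (assumed evidence model)."""
    return sum(1 for s, e in est_matches if s < exon.end and exon.start < e)


def best_gene_model(
    exons: Sequence[Exon],
    est_matches: Sequence[Tuple[int, int]],
    est_weight: float = 1.0,  # hypothetical weight of EST evidence
    min_intron: int = 20,     # hypothetical minimum gap between chained exons
) -> List[Exon]:
    """Return the highest-scoring chain of non-overlapping exons (single gene, one strand)."""
    if not exons:
        return []
    # Sort by end coordinate so every valid predecessor of exon i has index < i.
    order = sorted(exons, key=lambda x: x.end)
    n = len(order)
    best = [0.0] * n  # best[i]: best chain score ending with order[i]
    prev = [-1] * n   # back-pointer to the previous exon in that chain

    for i, ex in enumerate(order):
        # Adjusted score: recognizer score plus weighted EST support.
        gain = ex.score + est_weight * est_support(ex, est_matches)
        best[i] = gain  # chain consisting of this exon alone
        for j in range(i):
            # order[j] may precede ex only if separated by at least min_intron bases.
            if order[j].end + min_intron <= ex.start and best[j] + gain > best[i]:
                best[i] = best[j] + gain
                prev[i] = j

    # Trace back from the best-scoring chain end.
    i = max(range(n), key=lambda k: best[k])
    chain: List[Exon] = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]
```

Because any exon that could follow a low-scoring exon could equally follow that exon's predecessor, candidates whose adjusted score is negative are skipped by the optimal chain unless every candidate is negative. This loosely mirrors the idea that missing EST support can help discard falsely recognized exons; a realistic model would add frame and splice-site constraints on each transition.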

