Gene recognition via spliced sequence alignment.

Author information

Gelfand M S, Mironov A A, Pevzner P A

Affiliation

Institute of Protein Research, Russian Academy of Sciences, Puschino, Moscow, Russia.

Publication information

Proc Natl Acad Sci U S A. 1996 Aug 20;93(17):9061-6. doi: 10.1073/pnas.93.17.9061.

Abstract

Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.
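
The central idea above is to choose, among all chains of candidate exons, the one whose concatenation best fits a related sequence. A small dynamic program can illustrate this. The sketch below is a simplified illustration under stated assumptions and is not the authors' software:

- It aligns nucleotides directly, whereas the abstract describes fitting the exon assembly to a related protein.
- It uses hypothetical toy scores (`sub_score`: +1 match, -1 mismatch; `GAP`: -1).
- It assumes candidate exons are supplied as non-empty half-open genome intervals.
- All names (`spliced_alignment`, `sub_score`, `GAP`) are hypothetical.

```python
from typing import List, Optional, Tuple

GAP = -1  # hypothetical linear gap penalty (toy value, not from the paper)


def sub_score(a: str, b: str) -> int:
    """Hypothetical toy substitution score: +1 for a match, -1 otherwise."""
    return 1 if a == b else -1


def spliced_alignment(
    genome: str, blocks: List[Tuple[int, int]], target: str
) -> Tuple[int, List[Tuple[int, int]]]:
    """Best-scoring chain of non-overlapping blocks aligned to `target`.

    `blocks` are candidate exons given as non-empty half-open intervals
    [start, end) of `genome`. Returns (score, chain in genome order).
    """
    if not blocks:
        raise ValueError("need at least one candidate block")
    order = sorted(blocks, key=lambda b: (b[1], b[0]))  # by end position
    m = len(target)
    tables: List[List[List[int]]] = []      # tables[k][p][j]; p = bases of block k used
    preds: List[List[Optional[int]]] = []   # preds[k][j]: block before k, or None

    for k, (start, end) in enumerate(order):
        # Row p = 0: enter block k having aligned target[:j] either with an
        # empty chain (all gaps) or with the best chain ending in an earlier,
        # non-overlapping block l (end_l <= start_k).
        entry = [j * GAP for j in range(m + 1)]
        pred: List[Optional[int]] = [None] * (m + 1)
        for l in range(k):
            if order[l][1] <= start:
                last_row = tables[l][-1]
                for j in range(m + 1):
                    if last_row[j] > entry[j]:
                        entry[j], pred[j] = last_row[j], l
        table = [entry]
        # Rows p = 1..len(block): ordinary global-alignment recurrence.
        for p in range(1, end - start + 1):
            g = genome[start + p - 1]
            prev = table[-1]
            row = [prev[0] + GAP]
            for j in range(1, m + 1):
                row.append(max(prev[j - 1] + sub_score(g, target[j - 1]),
                               prev[j] + GAP,       # genome base left unmatched
                               row[j - 1] + GAP))   # target symbol left unmatched
            table.append(row)
        tables.append(table)
        preds.append(pred)

    # The chain may end in any block: pick the best score over the full target.
    k = max(range(len(order)), key=lambda i: tables[i][-1][m])
    best = tables[k][-1][m]

    # Trace back: walk each block's table down to row 0 to find the target
    # position where the chain entered it, then jump to its predecessor.
    chain: List[Tuple[int, int]] = []
    j = m
    cur: Optional[int] = k
    while cur is not None:
        start, end = order[cur]
        table = tables[cur]
        p = end - start
        while p > 0:
            g = genome[start + p - 1]
            if j > 0 and table[p][j] == table[p - 1][j - 1] + sub_score(g, target[j - 1]):
                p, j = p - 1, j - 1
            elif table[p][j] == table[p - 1][j] + GAP:
                p -= 1
            else:
                j -= 1
        chain.append((start, end))
        cur = preds[cur][j]
    chain.reverse()
    return best, chain


if __name__ == "__main__":
    # Invented toy genome: candidate exons ATG and GGG separated by an
    # "intron", plus a decoy block overlapping both and a trailing TTT block.
    genome = "CCCATGAAAGGGTTTCCC"
    blocks = [(3, 6), (4, 10), (9, 12), (12, 15)]
    print(*spliced_alignment(genome, blocks, "ATGGGG"))  # prints 6 [(3, 6), (9, 12)]
```

Blocks are processed in order of their end coordinate, so every possible predecessor is finished before it is needed. The number of exon chains can grow exponentially with the number of blocks. The tables, however, only grow with total block length times target length, plus a pairwise predecessor check, so the run time stays polynomial. This sketch makes no attempt at further speedups.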
