Human-mouse gene identification by comparative evidence integration and evolutionary analysis.

Authors

Zhang Lingang, Pavlovic Vladimir, Cantor Charles R, Kasif Simon

Affiliation

Center for Advanced Biotechnology, Boston University, Boston, Massachusetts 02215, USA.

Publication information

Genome Res. 2003 Jun;13(6A):1190-202. doi: 10.1101/gr.703903. Epub 2003 May 12.

DOI:10.1101/gr.703903

PMID:12743024

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC403647/

Abstract

The identification of genes in the human genome remains a challenge, as the actual predictions appear to disagree tremendously and vary dramatically on the basis of the specific gene-finding methodology used. Because the pattern of conservation in coding regions is expected to be different from intronic or intergenic regions, a comparative computational analysis can lead, in principle, to an improved computational identification of genes in the human genome by using a reference, such as mouse genome. However, this comparative methodology critically depends on three important factors: (1) the selection of the most appropriate reference genome. In particular, it is not clear whether the mouse is at the correct evolutionary distance from the human to provide sufficiently distinctive conservation levels in different genomic regions, (2) the selection of comparative features that provide the most benefit to gene recognition, and (3) the selection of evidence integration architecture that effectively interprets the comparative features. We address the first question by a novel evolutionary analysis that allows us to explicitly correlate the performance of the gene recognition system with the evolutionary distance (time) between the two genomes. Our simulation results indicate that there is a wide range of reference genomes at different evolutionary time points that appear to deliver reasonable comparative prediction of human genes. In particular, the evolutionary time between human and mouse generally falls in the region of good performance; however, better accuracy might be achieved with a reference genome further than mouse. To address the second question, we propose several natural comparative measures of conservation for identifying exons and exon boundaries. Finally, we experiment with Bayesian networks for the integration of comparative and compositional evidence.
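As a rough illustration of what integrating comparative and compositional evidence can look like, the sketch below combines a simple conservation measure (fraction of identical columns in a human-mouse alignment window) with a compositional label using naive Bayes, which is the simplest form of Bayesian network. This is not the authors' model: the function names, the identity threshold, and every probability in `LIKELIHOODS` are made-up assumptions chosen only to make the example run.

```python
from math import exp, log


def percent_identity(human: str, mouse: str) -> float:
    """Fraction of aligned columns with the same base in both sequences.

    Columns containing a gap ('-') are counted as mismatches.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    if not human:
        return 0.0
    matches = sum(1 for h, m in zip(human, mouse) if h == m and h != "-")
    return matches / len(human)


# Hypothetical conditional probabilities, NOT taken from the paper:
# feature -> value -> (P(value | coding), P(value | noncoding))
LIKELIHOODS = {
    "conservation": {"high": (0.8, 0.3), "low": (0.2, 0.7)},
    "composition": {"coding_like": (0.7, 0.4), "noncoding_like": (0.3, 0.6)},
}


def posterior_coding(features, likelihoods=LIKELIHOODS, prior_coding=0.5):
    """Naive Bayes posterior P(coding | features).

    Assumes the features are conditionally independent given the class.
    """
    log_coding = log(prior_coding)
    log_noncoding = log(1.0 - prior_coding)
    for name, value in features.items():
        p_coding, p_noncoding = likelihoods[name][value]
        log_coding += log(p_coding)
        log_noncoding += log(p_noncoding)
    # Normalize in log space for numerical stability.
    top = max(log_coding, log_noncoding)
    e_coding = exp(log_coding - top)
    e_noncoding = exp(log_noncoding - top)
    return e_coding / (e_coding + e_noncoding)


def classify_window(human, mouse, composition_label, identity_threshold=0.8):
    """Discretize conservation for one alignment window and return P(coding)."""
    identity = percent_identity(human, mouse)
    conservation = "high" if identity >= identity_threshold else "low"
    features = {"conservation": conservation, "composition": composition_label}
    return posterior_coding(features)
```

In this toy setup, a window that is both highly conserved and compositionally coding-like receives a posterior above the prior, while a poorly conserved, noncoding-like window falls below it. A realistic system would estimate the conditional probabilities from annotated training data and could model dependencies between features rather than assuming independence.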
