Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes.

Author information

Guigo Roderic, Dermitzakis Emmanouil T, Agarwal Pankaj, Ponting Chris P, Parra Genis, Reymond Alexandre, Abril Josep F, Keibler Evan, Lyle Robert, Ucla Catherine, Antonarakis Stylianos E, Brent Michael R

Affiliation

Research Group in Biomedical Informatics, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra/Centre de Regulació Genòmica, E08003 Barcelona, Catalonia, Spain.

Publication information

Proc Natl Acad Sci U S A. 2003 Feb 4;100(3):1140-5. doi: 10.1073/pnas.0337561100. Epub 2003 Jan 27.

DOI:10.1073/pnas.0337561100

PMID:12552088

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC298740/

Abstract

A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but apparently does not code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes.
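The genome-wide figure in the abstract (more than 1,000 verifiable predictions) is an extrapolation from a tested sample. Below is a minimal Python sketch of one way such an extrapolation can be done. It assumes that the predictions are split into classes and that the RT-PCR verification rate observed in each class's tested sample carries over to that class's untested members. The abstract does not describe the authors' actual stratification, so the class names and all counts below are hypothetical and are not the paper's data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PredictionClass:
    """One hypothetical stratum of gene predictions."""
    name: str
    n_total: int     # predictions in this class genome-wide (hypothetical)
    n_tested: int    # predictions sampled for RT-PCR (hypothetical)
    n_verified: int  # sampled predictions confirmed by RT-PCR + sequencing

    @property
    def rate(self) -> float:
        """Observed verification rate in the tested sample."""
        if self.n_tested <= 0:
            raise ValueError(f"class {self.name!r} has no tested predictions")
        return self.n_verified / self.n_tested


def estimate_verifiable(classes: list[PredictionClass]) -> float:
    """Apply each class's observed rate to the class's full size and sum.

    Multiplication is done before division so that exact ratios stay exact.
    """
    return sum(c.n_total * c.n_verified / c.n_tested for c in classes)


# Hypothetical example: two classes with different verification rates.
classes = [
    PredictionClass("high-scoring", n_total=800, n_tested=40, n_verified=30),
    PredictionClass("low-scoring", n_total=400, n_tested=32, n_verified=8),
]
for c in classes:
    print(f"{c.name}: rate {c.rate:.2f}")
print(f"estimated verifiable predictions: {estimate_verifiable(classes):.0f}")
```

Stratifying matters because pooling the two hypothetical samples (38 of 72 verified) and applying that single rate to all 1,200 predictions would give roughly 633 rather than 700. Whenever the tested sample over- or under-represents a class relative to its genome-wide size, a pooled rate skews the estimate.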

Similar articles

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

Gene organization and sequence of the region containing the ribosomal protein genes RPL13A and RPS11 in the human genome and conserved features in the mouse genome.

Gene. 1999 Nov 29;240(2):371-7. doi: 10.1016/s0378-1119(99)00429-1.

Structure and expression of the mouse growth hormone receptor/growth hormone binding protein gene.

J Mol Endocrinol. 1999 Aug;23(1):33-44. doi: 10.1677/jme.0.0230033.

Accurate identification of novel human genes through simultaneous gene prediction in human, mouse, and rat.

Genome Res. 2004 Apr;14(4):661-4. doi: 10.1101/gr.1939804.

Structure of the murine fifth complement component (C5) gene. A large, highly interrupted gene with a variant donor splice site and organizational homology with the third and fourth complement component genes.

J Biol Chem. 1991 Jun 25;266(18):11818-25.

TDPOZ, a family of bipartite animal and plant proteins that contain the TRAF (TD) and POZ/BTB domains.

Gene. 2004 Jan 7;324:117-27. doi: 10.1016/j.gene.2003.09.022.

The genomic structure of two protein kinase CK2alpha genes of Xenopus laevis and features of the putative promoter region.

Mol Cell Biochem. 2001 Nov;227(1-2):175-83.

Genomic organization, 5'flanking region and tissue-specific expression of mouse phosphofructokinase C gene.

Gene. 2000 Dec 30;260(1-2):103-12. doi: 10.1016/s0378-1119(00)00463-7.

Comparative genomic sequence analysis and isolation of human and mouse alternative EGFR transcripts encoding truncated receptor isoforms.

Genomics. 2001 Jan 1;71(1):1-20. doi: 10.1006/geno.2000.6341.

Cited by

uncovers hundreds of novel human (and other) exons though comparative analysis of proteins.

bioRxiv. 2024 May 6:2024.05.05.592595. doi: 10.1101/2024.05.05.592595.

Time-restricted feeding affects colonic nutrient substrates and modulates the diurnal fluctuation of microbiota in pigs.

Front Microbiol. 2023 May 19;14:1162482. doi: 10.3389/fmicb.2023.1162482. eCollection 2023.

Genome-wide Associations Reveal Human-Mouse Genetic Convergence and Modifiers of Myogenesis, CPNE1 and STC2.

Am J Hum Genet. 2019 Dec 5;105(6):1222-1236. doi: 10.1016/j.ajhg.2019.10.014. Epub 2019 Nov 21.

Brain Transcriptome Sequencing of a Natural Model of Alzheimer's Disease.

Front Aging Neurosci. 2017 Mar 20;9:64. doi: 10.3389/fnagi.2017.00064. eCollection 2017.

TWS1, a Novel Small Protein, Regulates Various Aspects of Seed and Plant Development.

Plant Physiol. 2016 Nov;172(3):1732-1745. doi: 10.1104/pp.16.00915. Epub 2016 Sep 9.

Small proteins: untapped area of potential biological importance.

Front Genet. 2013 Dec 16;4:286. doi: 10.3389/fgene.2013.00286.

Molecular characterization of mutant mouse strains generated from the EUCOMM/KOMP-CSD ES cell resource.

Mamm Genome. 2013 Aug;24(7-8):286-94. doi: 10.1007/s00335-013-9467-x. Epub 2013 Aug 4.

Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome.

Genome Res. 2012 Sep;22(9):1698-710. doi: 10.1101/gr.134478.111.

Comparative genomic analysis of eutherian interferon-γ-inducible GTPases.

Funct Integr Genomics. 2012 Nov;12(4):599-607. doi: 10.1007/s10142-012-0291-2. Epub 2012 Aug 15.

MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

BMC Bioinformatics. 2011 Dec 22;12:491. doi: 10.1186/1471-2105-12-491.

References

Comparative gene prediction in human and mouse.

Genome Res. 2003 Jan;13(1):108-17. doi: 10.1101/gr.871403.

Leveraging the mouse genome for gene prediction in human: from whole-genome shotgun reads to a global synteny map.

Genome Res. 2003 Jan;13(1):46-54. doi: 10.1101/gr.830003.

Human chromosome 21 gene expression atlas in the mouse.

Nature. 2002 Dec 5;420(6915):582-6. doi: 10.1038/nature01178.

Initial sequencing and comparative analysis of the mouse genome.

Nature. 2002 Dec 5;420(6915):520-62. doi: 10.1038/nature01262.

Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes.

Science. 2002 Aug 23;297(5585):1301-10. doi: 10.1126/science.1072104. Epub 2002 Jul 25.

A heat-sensitive TRP channel expressed in keratinocytes.

Science. 2002 Jun 14;296(5575):2046-9. doi: 10.1126/science.1073140. Epub 2002 May 16.

Applications of generalized pair hidden Markov models to alignment and gene finding problems.

J Comput Biol. 2002;9(2):389-99. doi: 10.1089/10665270252935520.

The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study.

Genome Res. 2002 Jan;12(1):198-202. doi: 10.1101/gr.200901.

The Ensembl genome database project.

Nucleic Acids Res. 2002 Jan 1;30(1):38-41. doi: 10.1093/nar/30.1.38.

Human relaxin gene 3 (H3) and the equivalent mouse relaxin (M3) gene. Novel members of the relaxin peptide family.

J Biol Chem. 2002 Jan 11;277(2):1148-57. doi: 10.1074/jbc.M107882200. Epub 2001 Oct 31.
