Methods in comparative genomics: genome correspondence, gene identification and regulatory motif discovery.

Author information

Kellis Manolis, Patterson Nick, Birren Bruce, Berger Bonnie, Lander Eric S

Affiliations

Whitehead Institute Center for Genome Research, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.

Publication information

J Comput Biol. 2004;11(2-3):319-55. doi: 10.1089/1066527041410319.

DOI:10.1089/1066527041410319

PMID:15285895

Abstract

In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genomewide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes. (1) We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change. (2) We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes. (3) We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based on genomewide conservation patterns of known motifs, we developed three conservation criteria that we used to discover novel motifs. We used an enumeration approach to select strongly conserved motif cores, which we extended and collapsed into a small number of candidate regulatory motifs. These include most previously known regulatory motifs as well as several noteworthy novel motifs. 
The majority of discovered motifs are enriched in functionally related genes, allowing us to infer a candidate function for novel motifs. Our results demonstrate the power of comparative genomics to further our understanding of any species. Our methods are validated by the extensive experimental knowledge in yeast and will be invaluable in the study of complex genomes like that of the human.
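The abstract does not spell out how the reading-frame test in point (2) is scored, so the following is only a minimal sketch of one way such a score could be computed from a pairwise alignment. The function name `reading_frame_conservation`, the use of `-` as the gap character, and the choice to score a whole alignment at once are assumptions made for illustration; this is not the authors' implementation. The idea it illustrates: gaps whose length is a multiple of three leave the downstream frame intact, while other gaps shift it, so a functional gene should keep most aligned nucleotides in a single frame offset.

```python
from collections import Counter


def reading_frame_conservation(aligned_ref: str, aligned_other: str) -> float:
    """Return the fraction of aligned nucleotide pairs that share the most
    common frame offset between the two sequences.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps
    (an assumption of this sketch).
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")

    ref_pos = 0    # number of reference nucleotides consumed so far
    other_pos = 0  # number of nucleotides consumed in the other species
    offsets = Counter()

    for ref_base, other_base in zip(aligned_ref, aligned_other):
        ref_is_base = ref_base != "-"
        other_is_base = other_base != "-"
        if ref_is_base and other_is_base:
            # Frame offset of this column: difference of positions modulo 3.
            offsets[(other_pos - ref_pos) % 3] += 1
        if ref_is_base:
            ref_pos += 1
        if other_is_base:
            other_pos += 1

    total = sum(offsets.values())
    if total == 0:
        return 0.0
    # Score against the best-supported frame offset.
    return max(offsets.values()) / total


# A 3-base gap keeps every aligned nucleotide in frame.
in_frame = reading_frame_conservation("ATGAAACCCGGG", "ATG---CCCGGG")
# A 1-base gap shifts the frame of everything downstream of it.
shifted = reading_frame_conservation("ATGAAACCCGGG", "ATGA-ACCCGGG")
```

In this toy input the in-frame deletion leaves the score at its maximum, whereas the single-base deletion splits the aligned columns between two frame offsets and lowers it. A practical test would also need a decision threshold and some way to handle local alignment errors, neither of which is specified in the abstract.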

Similar articles

Sequencing and comparison of yeast species to identify genes and regulatory elements.

Nature. 2003 May 15;423(6937):241-54. doi: 10.1038/nature01644.

Eukaryotic regulatory element conservation analysis and identification using comparative genomics.

Genome Res. 2004 Mar;14(3):451-8. doi: 10.1101/gr.1327604.

Computational discovery of transcriptional regulatory modules in fungal ribosome biogenesis genes reveals novel sequence and function patterns.

PLoS One. 2013;8(3):e59851. doi: 10.1371/journal.pone.0059851. Epub 2013 Mar 29.

Systematic discovery of regulatory motifs in Fusarium graminearum by comparing four Fusarium genomes.

BMC Genomics. 2010 Mar 26;11:208. doi: 10.1186/1471-2164-11-208.

A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs.

BMC Bioinformatics. 2012 Nov 27;13:317. doi: 10.1186/1471-2105-13-317.

Contribution of transcription factor binding site motif variants to condition-specific gene expression patterns in budding yeast.

PLoS One. 2012;7(2):e32274. doi: 10.1371/journal.pone.0032274. Epub 2012 Feb 23.

Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

BMC Genomics. 2017 Oct 13;18(1):785. doi: 10.1186/s12864-017-4171-y.

Evolutionary conservation and functional implications of circular code motifs in eukaryotic genomes.

Biosystems. 2019 Jan;175:57-74. doi: 10.1016/j.biosystems.2018.10.014. Epub 2018 Oct 24.

Comparative genomics reveals long, evolutionarily conserved, low-complexity islands in yeast proteins.

J Mol Evol. 2006 Sep;63(3):415-25. doi: 10.1007/s00239-005-0291-0. Epub 2006 Aug 21.

Cited by

Reconstruction of the genome-scale metabolic network model of CCBAU45436 for free-living and symbiotic states.

Front Bioeng Biotechnol. 2024 Mar 25;12:1377334. doi: 10.3389/fbioe.2024.1377334. eCollection 2024.

Lessons from the meiotic recombination landscape of the ZMM deficient budding yeast Lachancea waltii.

PLoS Genet. 2023 Jan 6;19(1):e1010592. doi: 10.1371/journal.pgen.1010592. eCollection 2023 Jan.

Integrated Profiling of Gram-Positive and Gram-Negative Probiotic Genomes, Proteomes and Metabolomes Revealed Small Molecules with Differential Growth Inhibition of Antimicrobial-Resistant Pathogens.

J Diet Suppl. 2023;20(5):788-810. doi: 10.1080/19390211.2022.2120146. Epub 2022 Sep 13.

Genome and Transcriptome Analyses Provide Insight Into the Omega-3 Long-Chain Polyunsaturated Fatty Acids Biosynthesis of SR21.

Front Microbiol. 2020 Apr 16;11:687. doi: 10.3389/fmicb.2020.00687. eCollection 2020.

Metabolic Analyses of Nitrogen Fixation in the Soybean Microsymbiont Sinorhizobium fredii Using Constraint-Based Modeling.

mSystems. 2020 Feb 18;5(1):e00516-19. doi: 10.1128/mSystems.00516-19.

Use of genome-scale models to get new insights into the marine actinomycete genus Salinispora.

BMC Syst Biol. 2019 Jan 21;13(1):11. doi: 10.1186/s12918-019-0683-1.

A Statistical Model for Event Sequence Data.

JMLR Workshop Conf Proc. 2014;33:338-346.

NFAT5-mediated CACNA1C expression is critical for cardiac electrophysiological development and maturation.

J Mol Med (Berl). 2016 Sep;94(9):993-1002. doi: 10.1007/s00109-016-1444-x. Epub 2016 Jul 1.

Inferring Orthologs: Open Questions and Perspectives.

Genomics Insights. 2016 Feb 25;9:17-28. doi: 10.4137/GEI.S37925. eCollection 2016.

Yeast Interspecies Comparative Proteomics Reveals Divergence in Expression Profiles and Provides Insights into Proteome Resource Allocation and Evolutionary Roles of Gene Duplication.

Mol Cell Proteomics. 2016 Jan;15(1):218-35. doi: 10.1074/mcp.M115.051854. Epub 2015 Nov 11.
