Recent segmental and gene duplications in the mouse genome.

Author information

Cheung Joseph, Wilson Michael D, Zhang Junjun, Khaja Razi, MacDonald Jeffrey R, Heng Henry H Q, Koop Ben F, Scherer Stephen W

Affiliation

Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada.

Publication information

Genome Biol. 2003;4(8):R47. doi: 10.1186/gb-2003-4-8-r47. Epub 2003 Jul 9.

Abstract

BACKGROUND

The high-quality mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the Mouse Genome Sequencing Consortium (MGSC) February 2002 and February 2003 assemblies.
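The two thresholds stated above (alignment length of at least 5 kb and sequence identity of at least 90%) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the alignments are already available in the standard 12-column BLAST tabular layout, and the names `Hit`, `parse_tabular`, `candidate_duplications`, and the example rows are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple

# Thresholds taken from the abstract; everything else here is an assumption.
MIN_LENGTH_BP = 5_000
MIN_IDENTITY_PCT = 90.0


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bp
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated rows in the 12-column BLAST tabular layout."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[6]), int(f[7]), int(f[8]), int(f[9]))


def is_self_hit(hit: Hit) -> bool:
    """A sequence aligned to itself at identical coordinates is not a duplication."""
    return (hit.query == hit.subject
            and hit.q_start == hit.s_start
            and hit.q_end == hit.s_end)


def candidate_duplications(hits: Iterable[Hit]) -> List[Hit]:
    """Keep alignments that are long enough, similar enough, and not trivial self-hits."""
    return [h for h in hits
            if h.length >= MIN_LENGTH_BP
            and h.identity >= MIN_IDENTITY_PCT
            and not is_self_hit(h)]


# Made-up rows for illustration only; they are not data from the paper.
EXAMPLE_ROWS = [
    "chr1\tchr1\t100.00\t8000\t0\t0\t1\t8000\t1\t8000\t0.0\t14000",              # self-hit
    "chr1\tchr7\t93.50\t6200\t380\t23\t50001\t56200\t120001\t126200\t0.0\t9800",  # passes
    "chr2\tchr5\t88.00\t9000\t1000\t80\t1\t9000\t20001\t29000\t0.0\t11000",       # identity too low
    "chr3\tchr3\t97.10\t1200\t30\t5\t1001\t2200\t90001\t91200\t0.0\t2000",        # too short
]

if __name__ == "__main__":
    for hit in candidate_duplications(parse_tabular(EXAMPLE_ROWS)):
        print(hit.query, hit.subject, hit.identity, hit.length)
```

A real analysis would also need to merge overlapping alignments and mask common repeats before counting duplicated bases; those steps are outside this sketch.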

RESULTS

We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice.
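The percentages quoted above follow directly from the reported sizes. As a quick check, the short sketch below recomputes them; the variable names are illustrative, and the only inputs are the figures given in the abstract.

```python
# Figures reported in the abstract (in megabases).
ASSEMBLY_MB = 2695.0         # February 2003 assembly
DUPLICATED_MB = 33.6         # sequence in recent segmental duplications
UNMAPPED_DUPLICATED_MB = 8.9 # duplicated sequence on 'unmapped' chromosomes

# Share of the assembly involved in recent segmental duplications.
duplicated_pct = 100.0 * DUPLICATED_MB / ASSEMBLY_MB

# Share of the duplication content that is 'unmapped' chromosome sequence.
unmapped_pct = 100.0 * UNMAPPED_DUPLICATED_MB / DUPLICATED_MB

if __name__ == "__main__":
    print(f"duplicated: {duplicated_pct:.1f}% of the assembly")         # 1.2%
    print(f"unmapped:   {unmapped_pct:.0f}% of the duplication content")  # 26%
```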

CONCLUSION

Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified as potential sequence misassignment errors, will require further mapping and sequencing to be resolved accurately. A Genome Browser database was set up to display the duplication content identified in this work. These data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dcd7/193640/b28542fa9726/gb-2003-4-8-r47-1.jpg
