Identification and analysis of gene families from the duplicated genome of soybean using EST sequences.

Author information

Nelson Rex T, Shoemaker Randy

Affiliation

USDA-ARS CICGR, Iowa State University, Ames, IA, 50011, USA.

Publication information

BMC Genomics. 2006 Aug 9;7:204. doi: 10.1186/1471-2164-7-204.

Abstract

BACKGROUND

Large-scale gene analysis of most organisms is hampered by incomplete genomic sequences. In many organisms, such as soybean, the best available source of sequence information is expressed sequence tag (EST) libraries. Soybean has a large (1115 Mbp) genome that has yet to be fully sequenced. However, it does have the sixth-largest EST collection, comprising ESTs from a variety of soybean genotypes. Many EST libraries were constructed from RNA extracted from various genetic backgrounds; gene identification from these sources is therefore complicated by the existence of both gene and allele sequence differences. We used the ESTminer suite of programs to identify potential soybean gene transcripts from a single genetic background, allowing us to observe functional classifications between gene families as well as structural differences between genes and gene paralogs within families. The identification of potential gene sequences (pHaps) from soybean allows us to begin building a picture of the genomic history of the organism and to observe the evolutionary fates of gene copies in this highly duplicated genome.

RESULTS

We identified approximately 45,000 potential gene sequences (pHaps) from EST sequences of Williams/Williams82, an inbred genotype of soybean (Glycine max L. Merr.), using a redundancy criterion to identify reproducible sequence differences between related genes within gene families. Analysis of these sequences revealed that single-base substitutions and single-base indels are the most frequently observed forms of sequence variation between genes within families in the dataset. Genomic sequencing of selected loci indicates that intron-like intervening sequences are numerous and approximately 220 bp in length. Functional annotation of gene sequences indicates that functional classifications are not randomly distributed among gene families containing few or many genes.
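To make the categories of variation above concrete, the following is a minimal sketch of how differences between two already-aligned gene sequences might be tallied as base substitutions, single-base indels, and longer indels. It is an illustration only and is not the ESTminer implementation; the function name `classify_differences`, the counter keys, and the example sequences are hypothetical, and the input is assumed to be a gapped pairwise alignment that uses `-` for gaps.

```python
from collections import Counter

GAP = "-"  # assumed gap character in the pairwise alignment


def classify_differences(aligned_a, aligned_b):
    """Tally differences between two gapped sequences of equal length.

    Returns a Counter with the (hypothetical) keys 'base_substitution',
    'single_base_indel' and 'multi_base_indel'.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    counts = Counter()
    run_side, run_len = None, 0  # which sequence holds the gap run, and its length

    def close_run():
        # Record the gap run that just ended (if any) as one indel event.
        nonlocal run_side, run_len
        if run_len == 1:
            counts["single_base_indel"] += 1
        elif run_len > 1:
            counts["multi_base_indel"] += 1
        run_side, run_len = None, 0

    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == GAP and b == GAP:
            continue  # column carries no information for this pair
        side = "a" if a == GAP else "b" if b == GAP else None
        if side is None:
            close_run()
            if a != b:
                counts["base_substitution"] += 1  # counted per mismatching column
        elif side == run_side:
            run_len += 1  # the current gap run continues
        else:
            close_run()
            run_side, run_len = side, 1  # a new gap run starts
    close_run()
    return counts


if __name__ == "__main__":
    # Made-up sequences, not data from the study.
    print(classify_differences("ACGTACGT-ACGG", "ACCTAC--TAC-G"))
```

Consecutive gap columns on the same sequence are treated as one indel event, so a two-base gap counts once as a longer indel rather than twice as single-base indels. Mismatches are counted per column, which is a simplification chosen for this sketch.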

CONCLUSION

The predominance of single-nucleotide insertions/deletions and substitution events between genes within families (individual genes and gene paralogs) is consistent with a model of gene amplification followed by single-base random mutational events, as expected under the classical model of duplicated gene evolution. Molecular functions of small and large gene families appear to be non-randomly distributed, possibly indicating a difference in retention of duplicates or local expansion.
