Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering.

Author information

Bohlin Jon, Skjerve Eystein, Ussery David W

Affiliation

Norwegian School of Veterinary Science, Oslo, Norway.

Publication information

BMC Genomics. 2009 Oct 21;10:487. doi: 10.1186/1471-2164-10-487.

Abstract

BACKGROUND

Recently there has been an explosion in the availability of bacterial genomic sequences, making it possible to analyze genomic signatures across more than 800 different bacterial chromosomes from a wide variety of environments. Using genomic signatures, we compared pairwise 867 different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model used the cluster groups as the response variable, with AT content, phylum, growth temperature, selective pressure, habitat, sequence size, oxygen requirement and pathogenicity as predictors.
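
The abstract does not spell out the signature measure, the distance, or the linkage rule. As a rough sketch only, the compare-then-cluster step can be illustrated with Karlin's dinucleotide relative abundance (rho*_XY = f_XY / (f_X f_Y), counted on both strands) as the signature, the mean absolute difference over the 16 dinucleotides (delta*) as the pairwise distance, and average-linkage agglomerative clustering cut at k groups. These choices, the function names signature, delta and average_linkage, and the toy sequences are illustrative assumptions, not the authors' exact method.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def signature(seq: str) -> dict:
    """Assumed signature: Karlin-style dinucleotide relative abundance
    rho*_XY = f_XY / (f_X * f_Y), counted on the sequence and its reverse complement."""
    seq = seq.upper()
    strands = [seq, seq.translate(COMPLEMENT)[::-1]]
    mono = {b: 0 for b in BASES}
    di = {x + y: 0 for x, y in product(BASES, repeat=2)}
    for s in strands:
        for b in s:
            if b in mono:
                mono[b] += 1
        for i in range(len(s) - 1):
            pair = s[i:i + 2]
            if pair in di:  # skips pairs containing N or other ambiguity codes
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for xy, count in di.items():
        fx, fy = mono[xy[0]] / n_mono, mono[xy[1]] / n_mono
        rho[xy] = (count / n_di) / (fx * fy) if fx and fy else 0.0
    return rho


def delta(r1: dict, r2: dict) -> float:
    """Mean absolute difference over the 16 dinucleotides (Karlin's delta*)."""
    return sum(abs(r1[k] - r2[k]) for k in r1) / len(r1)


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage on a full distance matrix.
    Merges the closest pair of clusters until k remain; returns one label per item."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a is unaffected
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy usage: two AT-only and two GC-only repeats (made-up sequences, not real genomes).
seqs = ["ATATTAAT" * 50, "AATTATAT" * 50, "GCGCCGGC" * 50, "GGCCGCGC" * 50]
sigs = [signature(s) for s in seqs]
D = [[delta(a, b) for b in sigs] for a in sigs]
labels = average_linkage(D, k=2)
print(labels)  # → [0, 0, 1, 1]
```

In a real analysis the cluster labels produced this way would become the response variable of the regression; with hundreds of genomes, a library routine for hierarchical clustering would replace the quadratic loops used here for readability.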

RESULTS

Many factors were significantly associated with the genomic signature, most notably AT content. Phylum was also an important factor, although considerably less so than AT content. Sequence size, habitat, growth temperature, selective pressure (measured as oligonucleotide usage variance) and oxygen requirement each yielded small but significant improvements to the regression model.
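
How "small but significant" improvements were assessed is not stated in the abstract. One standard approach, assumed for this sketch, is a likelihood-ratio test between nested multinomial models: fit the cluster labels on AT content alone, refit with one extra predictor, and compare LR = 2(l_big - l_small) against a chi-square distribution whose degrees of freedom equal the number of added coefficients times (number of clusters - 1). The toy data, the binary "flag" predictor, and all helper names are hypothetical; the fit uses plain gradient ascent rather than the Newton-type solvers that statistical packages normally use.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_proba(W, row):
    """Class probabilities for one feature row (an intercept term is added here)."""
    x = [1.0] + list(row)
    return softmax([sum(w * xi for w, xi in zip(Wk, x)) for Wk in W])


def fit_multinomial(X, y, n_classes, lr=0.5, iters=3000):
    """Unpenalised multinomial logistic regression by batch gradient ascent.
    Class 0 is the reference category, so its coefficients stay at zero."""
    p = len(X[0]) + 1
    W = [[0.0] * p for _ in range(n_classes)]
    n = len(X)
    for _ in range(iters):
        grad = [[0.0] * p for _ in range(n_classes)]
        for row, t in zip(X, y):
            probs = predict_proba(W, row)
            x = [1.0] + list(row)
            for k in range(1, n_classes):
                err = (1.0 if t == k else 0.0) - probs[k]
                for j in range(p):
                    grad[k][j] += err * x[j]
        for k in range(1, n_classes):
            for j in range(p):
                W[k][j] += lr * grad[k][j] / n
    return W


def log_likelihood(W, X, y):
    return sum(math.log(predict_proba(W, row)[t]) for row, t in zip(X, y))


def chi2_sf_even_df(x, df):
    """Upper tail P(chi2_df > x); this closed form is valid only for even df."""
    if df % 2:
        raise ValueError("this closed form requires even degrees of freedom")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical toy data: AT content, a made-up binary trait, and 3 cluster labels.
at = [0.30, 0.32, 0.35, 0.38, 0.45, 0.48, 0.50, 0.52, 0.55, 0.62, 0.65, 0.70]
flag = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
cluster = [0, 0, 0, 0, 1, 2, 1, 2, 1, 2, 2, 2]
at_scaled = [(a - 0.5) * 10 for a in at]  # centre and scale so gradient ascent behaves

X_small = [[a] for a in at_scaled]
X_big = [[a, f] for a, f in zip(at_scaled, flag)]
ll_small = log_likelihood(fit_multinomial(X_small, cluster, 3), X_small, cluster)
ll_big = log_likelihood(fit_multinomial(X_big, cluster, 3), X_big, cluster)

stat = 2.0 * (ll_big - ll_small)
df = 1 * (3 - 1)  # one added predictor times (3 clusters - 1 reference class)
p_value = chi2_sf_even_df(max(stat, 0.0), df)
print(f"LR = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

With twelve made-up points the p-value carries no biological meaning; the example only shows the mechanics of comparing nested models. For odd degrees of freedom the closed form above does not apply, and a general chi-square tail function such as scipy.stats.chi2.sf would be needed.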

CONCLUSION

The statistics obtained using hierarchical clustering and multinomial regression analysis indicate that the genomic signature is shaped by many factors, which may explain why prokaryotic organisms can be classified below the genus level with varying success.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0b4/2770534/8cb80dbb8e0b/1471-2164-10-487-1.jpg