Structural characterization of genomes by large scale sequence-structure threading.

Author information

Cherkasov Artem, Jones Steven J M

Affiliations

Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada.

Publication information

BMC Bioinformatics. 2004 Apr 3;5:37. doi: 10.1186/1471-2105-5-37.

Abstract

BACKGROUND

Using sequence-structure threading, we have conducted a structural characterization of the complete proteomes of 37 archaeal, bacterial and eukaryotic organisms (including worm, fly, mouse and human), totaling 167,888 genes.

RESULTS

The reported data represent the first fairly general evaluation of the performance of full sequence-structure threading on multiple genomes, providing an opportunity to assess its general applicability to large-scale studies. According to the estimated results, sequence-structure threading assigned protein folds to more than 60% of eukaryotic, 68% of archaeal and 70% of bacterial proteomes. The repertoires of protein classes, architectures, topologies and homologous superfamilies (according to the CATH 2.4 classification) have been established for distant organisms and superkingdoms. It has been found that the average abundance of CATH classes decreases from "alpha and beta" to "mainly beta", followed by "mainly alpha" and "few secondary structures". The 3-Layer (aba) Sandwich has been characterized as the most abundant protein architecture and the Rossmann fold as the most common topology.

CONCLUSION

The analysis of genomic occurrences of CATH 2.4 protein homologous superfamilies and topologies has revealed the power-law character of their distributions. The corresponding double logarithmic "frequency - genomic occurrence" dependences characteristic of scale-free systems have been established for individual organisms and for three superkingdoms.
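The double logarithmic "frequency - genomic occurrence" dependence mentioned above can be illustrated with a small sketch. The abstract does not say how the fits were performed, so the following is only an assumed, simplified approach: an ordinary least-squares line through the log-log points, applied to toy data constructed for the example (the function name `fit_power_law` and the numbers are hypothetical, not taken from the paper). A straight line with negative slope in this log-log space is what a power-law distribution would look like.

```python
import math
from collections import Counter


def fit_power_law(occurrences):
    """Fit log10(frequency) = intercept + slope * log10(occurrence).

    occurrences: one positive integer per superfamily, giving how many
    times that superfamily occurs in a genome (hypothetical input format).
    Returns (slope, intercept) of the ordinary least-squares line.
    """
    # frequency[k] = number of superfamilies that occur exactly k times
    frequency = Counter(occurrences)
    if len(frequency) < 2:
        raise ValueError("need at least two distinct occurrence values")

    xs = [math.log10(k) for k in frequency]
    ys = [math.log10(n) for n in frequency.values()]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (not from the paper), built so that frequency = 1000 / k**2.
toy = []
for k, count in [(1, 1000), (2, 250), (5, 40), (10, 10)]:
    toy.extend([k] * count)

slope, intercept = fit_power_law(toy)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

For real genomic counts, a least-squares fit on log-log points is only a quick visual check; it is not a rigorous test that a distribution follows a power law.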


