Systematization of the protein sequence diversity in enzymes related to secondary metabolic pathways in plants, in the context of big data biology inspired by the KNApSAcK motorcycle database.

Affiliation

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0192 Japan.

Publication information

Plant Cell Physiol. 2013 May;54(5):711-27. doi: 10.1093/pcp/pct041. Epub 2013 Mar 18.

Abstract

Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.
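The dipeptide-frequency matrix mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the segment length, the non-overlapping segmentation, the normalization to relative frequencies, and the helper names `split_into_segments` and `dipeptide_frequencies` are all assumptions made for illustration, since the abstract does not specify them. Each segment becomes one 400-dimensional row (20 x 20 possible dipeptides), which is the kind of input a BL-SOM would then be trained on.

```python
from itertools import product

# The 20 standard amino acids; every ordered pair gives one of 400 dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dipeptide: i for i, dipeptide in enumerate(DIPEPTIDES)}


def split_into_segments(sequence, segment_length):
    """Cut a sequence into non-overlapping segments (hypothetical scheme).

    A trailing piece shorter than segment_length is dropped.
    """
    last_start = len(sequence) - segment_length
    return [
        sequence[start:start + segment_length]
        for start in range(0, last_start + 1, segment_length)
    ]


def dipeptide_frequencies(segment):
    """Return a 400-element vector of relative dipeptide frequencies.

    Overlapping adjacent pairs are counted; pairs containing a
    non-standard residue (e.g. 'X') are skipped.
    """
    counts = [0] * len(DIPEPTIDES)
    for i in range(len(segment) - 1):
        idx = INDEX.get(segment[i:i + 2])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [count / total for count in counts]


def build_matrix(sequences, segment_length):
    """Stack one frequency row per segment across all input sequences."""
    return [
        dipeptide_frequencies(segment)
        for sequence in sequences
        for segment in split_into_segments(sequence, segment_length)
    ]
```

For example, `build_matrix(["ACDEFGHIKL"], 4)` yields two rows, one for `ACDE` and one for `FGHI`. The self-organizing map step itself is omitted here because the abstract gives no training parameters.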
