
A novel bioinformatics strategy for function prediction of poorly-characterized protein genes obtained from metagenome analyses.

Affiliation

Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Shiga-ken, Japan.

Publication information

DNA Res. 2009 Oct;16(5):287-97. doi: 10.1093/dnares/dsp018. Epub 2009 Oct 3.

DOI:10.1093/dnares/dsp018
PMID:19801558
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2762413/
Abstract

As a result of remarkable progress in DNA sequencing technology, vast quantities of genomic sequences have been decoded. Homology search for amino acid sequences, such as BLAST, has become a basic tool for assigning functions to genes/proteins when genomic sequences are decoded. Although homology search has clearly been a powerful and irreplaceable method, the functions of only 50% or fewer of genes can be predicted when a novel genome is decoded. A prediction method independent of homology search is urgently needed. By analyzing oligonucleotide compositions in genomic sequences, we previously developed a modified Self-Organizing Map 'BLSOM' that clustered genomic fragments according to phylotype with no advance knowledge of phylotype. Using BLSOM for di-, tri- and tetrapeptide compositions, we developed a system to enable separation (self-organization) of proteins by function. Analyzing oligopeptide frequencies in proteins previously classified into COGs (clusters of orthologous groups of proteins), BLSOMs could faithfully reproduce the COG classifications. This indicated that proteins whose functions are unknown because of a lack of significant sequence similarity with function-known proteins can be related to function-known proteins based on similarity in oligopeptide composition. BLSOM was applied to predict functions of vast quantities of proteins derived from mixed genomes in environmental samples.
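
To make the idea of "similarity in oligopeptide composition" concrete, here is a minimal Python sketch. It is not the BLSOM method from the paper: it only computes a di-, tri- or tetrapeptide frequency vector for a protein and assigns a query to the closest labelled protein by Euclidean distance, which stands in for the map-based clustering. The function names, the toy sequences and the labels are all hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def oligopeptide_frequencies(sequence, k=2):
    """Relative frequencies of all 20**k k-peptides, in a fixed order.

    Windows containing non-standard residues are ignored (an assumption
    of this sketch, not something stated in the abstract).
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    total = sum(counts[key] for key in keys)
    if total == 0:
        return [0.0] * len(keys)
    return [counts[key] / total for key in keys]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_labelled(query, labelled, k=2):
    """Label of the (sequence, label) pair closest to `query` in composition.

    This nearest-neighbour step is a simple stand-in for BLSOM clustering.
    """
    q = oligopeptide_frequencies(query, k)
    best = min(
        labelled,
        key=lambda item: euclidean(q, oligopeptide_frequencies(item[0], k)),
    )
    return best[1]


if __name__ == "__main__":
    # Toy data only; real input would be proteins with known COG categories.
    toy = [("AAAAAA", "toy_A"), ("WWWWWW", "toy_W")]
    print(nearest_labelled("AAAAWA", toy))
```

With `k=2` each protein becomes a 400-dimensional vector; `k=3` and `k=4` give 8,000 and 160,000 dimensions, so a real analysis would probably need sparse vectors or residue grouping, which this sketch does not attempt.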

Similar articles

1
A novel bioinformatics strategy for function prediction of poorly-characterized protein genes obtained from metagenome analyses.
DNA Res. 2009 Oct;16(5):287-97. doi: 10.1093/dnares/dsp018. Epub 2009 Oct 3.
2
A strategy for predicting gene functions from genome and metagenome sequences on the basis of oligopeptide frequency distance.
Genes Genet Syst. 2020 Apr 22;95(1):11-19. doi: 10.1266/ggs.19-00041. Epub 2020 Mar 12.
3
A novel bioinformatics strategy for searching industrially useful genome resources from metagenomic sequence libraries.
Genes Genet Syst. 2011;86(1):53-66. doi: 10.1266/ggs.86.53.
4
A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.
Biomed Res Int. 2014;2014:765648. doi: 10.1155/2014/765648. Epub 2014 Apr 3.
5
A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM).
Microorganisms. 2013 Nov 20;1(1):137-157. doi: 10.3390/microorganisms1010137.
6
Development of self-compressing BLSOM for comprehensive analysis of big sequence data.
Biomed Res Int. 2015;2015:506052. doi: 10.1155/2015/506052. Epub 2015 Oct 1.
7
AI for the collective analysis of a massive number of genome sequences: various examples from the small genome of pandemic SARS-CoV-2 to the human genome.
Genes Genet Syst. 2021 Dec 16;96(4):165-176. doi: 10.1266/ggs.21-00025. Epub 2021 Sep 27.
8
Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes.
Biomed Res Int. 2014;2014:985706. doi: 10.1155/2014/985706. Epub 2014 Mar 11.
9
10
Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context.
Genome Res. 2001 Mar;11(3):356-72. doi: 10.1101/gr.gr-1619r.

Cited by

1
Gene Mining for Conserved, Non-Annotated Proteins of Identifies Novel Target Candidates for Controlling Powdery Mildews by Spray-Induced Gene Silencing.
J Fungi (Basel). 2021 Sep 8;7(9):735. doi: 10.3390/jof7090735.
2
A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM).
Microorganisms. 2013 Nov 20;1(1):137-157. doi: 10.3390/microorganisms1010137.
3
Development of self-compressing BLSOM for comprehensive analysis of big sequence data.
Biomed Res Int. 2015;2015:506052. doi: 10.1155/2015/506052. Epub 2015 Oct 1.
4
A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.
Biomed Res Int. 2014;2014:765648. doi: 10.1155/2014/765648. Epub 2014 Apr 3.
5
Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes.
Biomed Res Int. 2014;2014:985706. doi: 10.1155/2014/985706. Epub 2014 Mar 11.
6
Novel bioinformatics strategies for prediction of directional sequence changes in influenza virus genomes and for surveillance of potentially hazardous strains.
BMC Infect Dis. 2013 Aug 21;13:386. doi: 10.1186/1471-2334-13-386.

References

1
The CATH classification revisited--architectures reviewed and new ways to characterize structural divergence in superfamilies.
Nucleic Acids Res. 2009 Jan;37(Database issue):D310-4. doi: 10.1093/nar/gkn877. Epub 2008 Nov 7.
2
Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world.
Nucleic Acids Res. 2008 Dec;36(21):6688-719. doi: 10.1093/nar/gkn668. Epub 2008 Oct 23.
3
Phylogenetic profiles reveal evolutionary relationships within the "twilight zone" of sequence similarity.
Proc Natl Acad Sci U S A. 2008 Sep 9;105(36):13474-9. doi: 10.1073/pnas.0803860105. Epub 2008 Sep 2.
4
The genome of Pelotomaculum thermopropionicum reveals niche-associated evolution in anaerobic microbiota.
Genome Res. 2008 Mar;18(3):442-8. doi: 10.1101/gr.7136508. Epub 2008 Jan 24.
5
Predicting protein function from sequence and structure.
Nat Rev Mol Cell Biol. 2007 Dec;8(12):995-1005. doi: 10.1038/nrm2281.
6
The 20 years of PROSITE.
Nucleic Acids Res. 2008 Jan;36(Database issue):D245-9. doi: 10.1093/nar/gkm977. Epub 2007 Nov 14.
7
Data growth and its impact on the SCOP database: new developments.
Nucleic Acids Res. 2008 Jan;36(Database issue):D419-25. doi: 10.1093/nar/gkm993. Epub 2007 Nov 13.
8
Exploration and grading of possible genes from 183 bacterial strains by a common protocol to identification of new genes: Gene Trek in Prokaryote Space (GTPS).
DNA Res. 2006 Dec 31;13(6):245-54. doi: 10.1093/dnares/dsl014. Epub 2006 Dec 13.
9
Novel phylogenetic studies of genomic sequence fragments derived from uncultured microbe mixtures in environmental and clinical samples.
DNA Res. 2005;12(5):281-90. doi: 10.1093/dnares/dsi015. Epub 2006 Jan 10.
10
Self-Organizing Map (SOM) unveils and visualizes hidden sequence characteristics of a wide range of eukaryote genomes.
Gene. 2006 Jan 3;365:27-34. doi: 10.1016/j.gene.2005.09.040. Epub 2005 Dec 20.