
SECOM: a novel hash seed and community detection based-approach for genome-scale protein domain identification.

Affiliation

Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.

Publication information

PLoS One. 2012;7(6):e39475. doi: 10.1371/journal.pone.0039475. Epub 2012 Jun 28.

DOI: 10.1371/journal.pone.0039475
PMID: 22761802
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3386278/
Abstract

With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
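The indexing and graph-building steps described in the abstract can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from the paper: plain contiguous k-mers stand in for SECOM's hash seed function, whose definition the abstract does not give, and the names `seed_index`, `seed_graph`, the parameter `k`, and the toy sequences are hypothetical. The overlapping community-finding step (backward graph percolation) is not shown.

```python
from collections import defaultdict
from itertools import combinations


def seed_index(proteins, k=3):
    """Map each length-k substring to the set of proteins containing it.

    Assumption: contiguous k-mers are a stand-in for the paper's hash seeds.
    """
    index = defaultdict(set)
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def seed_graph(index):
    """Build edge weights: the number of distinct seeds two proteins share."""
    weights = defaultdict(int)
    for names in index.values():
        # Every pair of proteins sharing this seed gains one unit of weight.
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy sequences (hypothetical): p1 and p2 share the segment "AYIAKQ".
proteins = {
    "p1": "MKTAYIAKQR",
    "p2": "GGSAYIAKQW",
    "p3": "PLVNNDCEFH",
}
graph = seed_graph(seed_index(proteins))
print(graph)
```

Because each seed is looked up in the index once, pairs of proteins with no seed in common are never compared, which is the intuition behind avoiding all-against-all alignment.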


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c60/3386278/46c540a65b94/pone.0039475.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c60/3386278/ebc5afe2c2d6/pone.0039475.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c60/3386278/f15159471214/pone.0039475.g004.jpg


