Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function.

Affiliation

J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.

Publication information

BMC Bioinformatics. 2010 Jan 26;11:52. doi: 10.1186/1471-2105-11-52.

DOI:10.1186/1471-2105-11-52
PMID:20102603
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3098086/
Abstract

BACKGROUND

Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets.
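The scoring idea described above can be illustrated with a minimal sketch. It assumes that the hits of a search are ranked best-first, that each hit carries a TRUE/FALSE label from the partitioned training set, and that the score at a given cutoff depth is the negative log of the binomial upper-tail probability of seeing at least that many TRUE hits by chance. The function names `binomial_tail` and `best_cutoff` are hypothetical and are not taken from the PPP software; the exact statistic used by the authors may differ in detail.

```python
from math import comb, log10


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_cutoff(ranked_labels, p_true):
    """Scan every cutoff depth of a ranked hit list and keep the best one.

    ranked_labels: list of bool, best-scoring hit first (True = TRUE set).
    p_true: background probability that a random protein is in the TRUE set.
    Returns (depth, hits, score), where score = -log10 of the binomial tail.
    """
    best = (0, 0, 0.0)
    hits = 0
    for depth, is_true in enumerate(ranked_labels, start=1):
        hits += is_true
        if hits == 0:
            continue  # no TRUE hits yet, so nothing to score
        # Guard against floating-point underflow before taking the log.
        tail = max(binomial_tail(depth, hits, p_true), 1e-300)
        score = -log10(tail)
        if score > best[2]:
            best = (depth, hits, score)
    return best


# Toy ranked list: four TRUE hits at the top, then mixed labels.
labels = [True, True, True, True, False, True, False, False, False, False]
print(best_cutoff(labels, p_true=0.3))
```

In this toy list the scan settles on the cutoff just before the first FALSE hit, which is the behaviour the abstract describes as optimizing the sequence similarity cutoff.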

RESULTS

Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization.
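The sliding-window scan described in the first example can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_similarity` is a hypothetical stand-in for a real sequence similarity search, the window length of 5 and the toy sequences are invented for the example, and `ppp_score` uses the same assumed binomial-tail scoring as a PPP-style cutoff optimization.

```python
from math import comb, log10


def tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ppp_score(ranked_labels, p_true):
    """Best -log10 binomial tail over all cutoff depths of a ranked list."""
    best, hits = 0.0, 0
    for depth, is_true in enumerate(ranked_labels, start=1):
        hits += is_true
        if hits:
            best = max(best, -log10(max(tail(depth, hits, p_true), 1e-300)))
    return best


def toy_similarity(window, target):
    """Most identities over all ungapped placements of window in target.

    Hypothetical stand-in for a real similarity search score.
    """
    w = len(window)
    if len(target) < w:
        return 0
    return max(
        sum(a == b for a, b in zip(window, target[i:i + w]))
        for i in range(len(target) - w + 1)
    )


def simbal_scan(query, proteins, window=5):
    """Score every window of `query` against labelled proteins.

    proteins: list of (sequence, is_true) pairs.
    Returns a list of (window_start, score) pairs.
    """
    p_true = sum(label for _, label in proteins) / len(proteins)
    profile = []
    for start in range(len(query) - window + 1):
        sub = query[start:start + window]
        # Stable sort: tied proteins keep their input order.
        ranked = sorted(proteins, key=lambda pr: toy_similarity(sub, pr[0]),
                        reverse=True)
        profile.append((start, ppp_score([lab for _, lab in ranked], p_true)))
    return profile


# Invented toy family: TRUE members share the motif KWCDE, FALSE members do not.
query = "GGGGGKWCDEGGGGG"
proteins = [
    ("GGGGGKAAAEGGGGG", False), ("GGGGGKWCDEGGGGG", True),
    ("GGGGGKAAAEGGGGG", False), ("GGGGGKWCDEGGGGG", True),
    ("GGGGGKAAAEGGGGG", False), ("GGGGGKWCDEGGGGG", True),
]
profile = simbal_scan(query, proteins, window=5)
for start, score in profile:
    print(start, round(score, 3))
```

Windows that overlap the distinguishing motif rank all TRUE members ahead of the FALSE ones and therefore score highest, while windows in the shared flanking sequence cannot separate the two sets. With real data the peaks of such a profile correspond to the "hot spots" that the abstract maps onto crystal structures.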

CONCLUSIONS

SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b21f/3098086/7752e193a583/1471-2105-11-52-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b21f/3098086/faa93c5cf3db/1471-2105-11-52-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b21f/3098086/968a62c9177c/1471-2105-11-52-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b21f/3098086/dab84d1d7273/1471-2105-11-52-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b21f/3098086/2e3225a22d7e/1471-2105-11-52-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b21f/3098086/da17277179ed/1471-2105-11-52-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b21f/3098086/e63f8cbc348e/1471-2105-11-52-7.jpg
