Background frequencies for residue variability estimates: BLOSUM revisited.

Authors

Mihalek I, Res I, Lichtarge O

Affiliation

Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.

Publication details

BMC Bioinformatics. 2007 Dec 27;8:488. doi: 10.1186/1471-2105-8-488.

DOI:10.1186/1471-2105-8-488
PMID:18162129
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2267808/
Abstract

BACKGROUND

Shannon entropy applied to columns of multiple sequence alignments as a score of residue conservation has proven one of the most fruitful ideas in bioinformatics. This straightforward and intuitively appealing measure clearly shows the regions of a protein under increased evolutionary pressure, highlighting their functional importance. The inability of the column entropy to differentiate between residue types, however, limits its resolution power.
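The column score described above can be illustrated with a minimal Python sketch. It assumes gaps are simply ignored and entropy is measured in bits; the abstract does not state the paper's gap handling or log base, and `column_entropy` is a hypothetical helper name. The last two calls show the limitation the paragraph mentions: a column split between L and I scores the same as one split between L and W.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy, in bits, of the residues in one alignment column.

    Gap characters are dropped before counting (an assumption of this sketch).
    Lower values mean a more conserved column.
    """
    residues = [c for c in column.upper() if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    # H = sum_a f_a * log2(1 / f_a), where f_a = k / n is the frequency of residue a
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


print(column_entropy("LLLL"))  # 0.0: fully conserved
print(column_entropy("LLII"))  # 1.0: split between two similar aliphatic residues
print(column_entropy("LLWW"))  # 1.0: same score, although L and W are far less interchangeable
```

Because the formula only sees residue counts, any relabeling of the residues leaves the score unchanged, which is exactly why it cannot tell a conservative substitution from a drastic one.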

RESULTS

In this work we suggest generalizing Shannon's expression to a function with similar mathematical properties, that, at the same time, includes observed propensities of residue types to mutate to each other. To do that, we revisit the original construction of BLOSUM matrices, and re-interpret them as mutation probability matrices. These probabilities are then used as background frequencies in the revised residue conservation measure.
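One way to read a BLOSUM-style matrix as mutation probabilities can be sketched as follows, assuming the usual log-odds definition s_ij = (1/λ) log2(q_ij / (p_i p_j)) with BLOSUM62's half-bit scaling (two score units per bit). Inverting the scores gives approximate joint frequencies q_ij, and normalizing each row gives P(j | i). This is not necessarily the authors' procedure: the abstract says they revisited the original BLOSUM construction, which works from block counts rather than from the rounded published scores. The function name `substitution_probabilities` is hypothetical; the scores below are the BLOSUM62 entries for L, I and W, while the three-residue background frequencies are made up for illustration.

```python
def substitution_probabilities(scores, background, units_per_bit=2.0):
    """Convert a log-odds score matrix into row-stochastic P(j | i).

    Assumes s_ij = units_per_bit * log2(q_ij / (p_i * p_j)), i.e. half-bit
    units when units_per_bit = 2 (the BLOSUM62 convention). Published scores
    are rounded integers, so the recovered q_ij are only approximate and each
    row is renormalized to sum to 1.
    """
    n = len(background)
    probs = []
    for i in range(n):
        # Approximate joint frequency q_ij = p_i * p_j * 2^(s_ij / units_per_bit)
        row = [background[i] * background[j] * 2 ** (scores[i][j] / units_per_bit)
               for j in range(n)]
        total = sum(row)
        probs.append([x / total for x in row])  # P(j | i) = q_ij / sum_k q_ik
    return probs


alphabet = "LIW"
scores = [[4, 2, -2],    # BLOSUM62 rows restricted to L, I, W
          [2, 4, -3],
          [-2, -3, 11]]
background = [0.5, 0.35, 0.15]  # hypothetical toy background frequencies
P = substitution_probabilities(scores, background)
print([round(x, 3) for x in P[0]])  # [0.721, 0.252, 0.027]: L mutates to I far more often than to W
```

Rounding the scores to integers loses information, which is one reason to return to the underlying block counts when precision matters.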

CONCLUSION

We show that joint entropy with BLOSUM-proportional probabilities as a reference distribution enables detection of protein functional sites comparable in quality to a time-costly maximum-likelihood evolution simulation method (rate4site), and offers greater resolution than the Shannon entropy alone, in particular in the cases when the available sequences are of narrow evolutionary scope.
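The key ingredient in this conclusion is scoring a column against a reference distribution rather than against a flat one. The abstract does not give the authors' joint-entropy formula, so the sketch below swaps in the familiar relative entropy (Kullback-Leibler divergence) as a stand-in; it is not the paper's measure. The name `relative_entropy` and the toy reference frequencies are hypothetical; per the abstract, the paper's reference is built from BLOSUM-proportional probabilities. Both test columns have Shannon entropy 0, but against an uneven reference the conserved rare residue scores higher, illustrating the kind of extra resolution a reference distribution can add.

```python
import math
from collections import Counter


def relative_entropy(column: str, reference: dict, gap: str = "-") -> float:
    """Kullback-Leibler divergence D(f || b), in bits, of a column's residue
    frequencies f from a reference distribution b (residue -> probability).

    Gaps are skipped (an assumption); every observed residue must appear in
    `reference` with a nonzero probability. Higher values mean the column's
    composition departs further from the reference.
    """
    residues = [c for c in column.upper() if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return sum((k / n) * math.log2((k / n) / reference[a])
               for a, k in Counter(residues).items())


# Hypothetical reference frequencies for a three-letter toy alphabet.
toy_reference = {"L": 0.5, "I": 0.35, "W": 0.15}
print(relative_entropy("LLLL", toy_reference))  # 1.0
print(relative_entropy("WWWW", toy_reference))  # about 2.74, i.e. log2(1 / 0.15)
```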


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f678/2267808/8af2d9e5fc81/1471-2105-8-488-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f678/2267808/b07f6da4409d/1471-2105-8-488-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f678/2267808/e10af61b8d81/1471-2105-8-488-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f678/2267808/b91ead1015f3/1471-2105-8-488-3.jpg

Similar articles

1
Background frequencies for residue variability estimates: BLOSUM revisited.
BMC Bioinformatics. 2007 Dec 27;8:488. doi: 10.1186/1471-2105-8-488.
2
Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix.
BMC Bioinformatics. 2015 Aug 14;16:255. doi: 10.1186/s12859-015-0688-8.
3
The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions.
Bioinformatics. 2005 Apr 1;21(7):902-11. doi: 10.1093/bioinformatics/bti070. Epub 2004 Oct 27.
4
Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions.
Bioinformatics. 2006 Feb 15;22(4):413-22. doi: 10.1093/bioinformatics/bti828. Epub 2005 Dec 13.
5
Robustness of the residue conservation score reflecting both frequencies and physicochemistries.
Amino Acids. 2008 May;34(4):643-52. doi: 10.1007/s00726-007-0017-2. Epub 2008 Jan 4.
6
The contrasting properties of conservation and correlated phylogeny in protein functional residue prediction.
BMC Bioinformatics. 2008 Jan 25;9:51. doi: 10.1186/1471-2105-9-51.
7
H2rs: deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments.
BMC Bioinformatics. 2014 Apr 27;15:118. doi: 10.1186/1471-2105-15-118.
8
Scoring alignments by embedding vector similarity.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae178.
9
On the significance of sequence alignments when using multiple scoring matrices.
Bioinformatics. 2004 Apr 12;20(6):881-7. doi: 10.1093/bioinformatics/btg498. Epub 2004 Jan 29.
10
PROMALS web server for accurate multiple protein sequence alignments.
Nucleic Acids Res. 2007 Jul;35(Web Server issue):W649-52. doi: 10.1093/nar/gkm227. Epub 2007 Apr 22.

Cited by

1
Characterizing and predicting ccRCC-causing missense mutations in Von Hippel-Lindau disease.
Hum Mol Genet. 2024 Jan 20;33(3):224-232. doi: 10.1093/hmg/ddad181.
2
Identification of evolutionarily stable functional and immunogenic sites across the SARS-CoV-2 proteome and the greater coronavirus family.
Res Sq. 2021 Feb 15:rs.3.rs-95030. doi: 10.21203/rs.3.rs-95030/v3.
3
Towards a gamete matching platform: using immunogenetics and artificial intelligence to predict recurrent miscarriage.
NPJ Digit Med. 2019 Mar 7;2:12. doi: 10.1038/s41746-019-0089-x. eCollection 2019.
4
Determinants, discriminants, conserved residues--a heuristic approach to detection of functional divergence in protein families.
PLoS One. 2011;6(9):e24382. doi: 10.1371/journal.pone.0024382. Epub 2011 Sep 12.
5
A comparative study of conservation and variation scores.
BMC Bioinformatics. 2010 Jul 21;11:388. doi: 10.1186/1471-2105-11-388.