Discriminative modelling of context-specific amino acid substitution probabilities.

Affiliation

Gene Center Munich and Department of Biochemistry, Ludwig-Maximilians-Universität München, 81377 Munich, Germany.

Publication information

Bioinformatics. 2012 Dec 15;28(24):3240-7. doi: 10.1093/bioinformatics/bts622. Epub 2012 Oct 17.

DOI:10.1093/bioinformatics/bts622

PMID:23080114

Abstract

MOTIVATION

Protein sequence searching and alignment are fundamental tools of modern biology. Alignments are assessed using their similarity scores, essentially the sum of substitution matrix scores over all pairs of aligned amino acids. We previously proposed a generative probabilistic method that yields scores that take the sequence context around each aligned residue into account. This method showed drastically improved sensitivity and alignment quality compared with standard substitution matrix-based alignment.
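The similarity score described above, a sum of substitution matrix scores over aligned residue pairs, can be sketched as follows. This is a minimal illustration only: the `TOY_SCORES` table, the default mismatch score, and the function names are hypothetical and are not taken from the paper or from any published substitution matrix, and gaps are not handled.

```python
from typing import Dict, Tuple

# Hypothetical toy scores for illustration only; NOT values from BLOSUM or the paper.
TOY_SCORES: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4,
    ("L", "L"): 4,
    ("L", "I"): 2,
    ("K", "K"): 5,
    ("K", "E"): 1,
}


def pair_score(a: str, b: str, scores: Dict[Tuple[str, str], int], default: int = -1) -> int:
    """Look up a symmetric substitution score; unknown pairs get an assumed default."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores.get((b, a), default)


def alignment_score(seq_a: str, seq_b: str, scores: Dict[Tuple[str, str], int] = TOY_SCORES) -> int:
    """Sum substitution scores over all aligned residue pairs of a gapless alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(a, b, scores) for a, b in zip(seq_a, seq_b))


# Aligned pairs: A-A (4), L-I (2), K-E (1)
example_score = alignment_score("ALK", "AIE")
```

A context-specific method would replace the fixed table lookup in `pair_score` with a score that also depends on the residues surrounding each aligned position.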

RESULTS

Here, we develop an alternative discriminative approach to predict sequence context-specific substitution scores. We applied our approach to compute context-specific sequence profiles for Basic Local Alignment Search Tool (BLAST) and compared the new tool (CS-BLASTdis) to BLAST and the previous context-specific version (CS-BLASTgen). On a dataset filtered to 20% maximum sequence identity, CS-BLASTdis is 51% more sensitive than BLAST and 17% more sensitive than CS-BLASTgen in detecting remote homologues at 10% false discovery rate. At 30% maximum sequence identity, its alignments contain 21% and 12% more correct residue pairs than those of BLAST and CS-BLASTgen, respectively. Clear improvements are also seen when the approach is combined with PSI-BLAST and HHblits. We believe the context-specific approach should replace substitution matrices wherever sensitivity and alignment quality are critical.
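One simple way to read "sensitivity at 10% false discovery rate" is to rank all hits by score and count how many true homologues are recovered while the fraction of false hits stays at or below 10%. The sketch below assumes that reading; the function name, the `(score, is_homologue)` input format, and the example data are hypothetical, and the benchmark protocol used in the paper may differ (for example, in how hits are weighted).

```python
from typing import Iterable, Tuple


def true_positives_at_fdr(hits: Iterable[Tuple[float, bool]], max_fdr: float = 0.10) -> int:
    """Count true homologues in the longest ranked prefix whose FDR is <= max_fdr.

    hits: (score, is_homologue) pairs; a higher score means a more confident hit.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    tp = fp = best = 0
    for _, is_homologue in ranked:
        if is_homologue:
            tp += 1
        else:
            fp += 1
        # FDR of the prefix seen so far: false hits / all hits
        if fp / (tp + fp) <= max_fdr:
            best = tp
    return best


# Made-up data: ten true hits, one false hit, one more true hit, one more false hit.
example_hits = [(20.0 - i, True) for i in range(10)]
example_hits += [(10.0, False), (9.0, True), (8.0, False)]
detected = true_positives_at_fdr(example_hits)
```

Comparing this count between two search tools run on the same benchmark gives a relative sensitivity figure of the kind quoted above.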

Similar articles

Sequence context-specific profiles for homology searching.

Proc Natl Acad Sci U S A. 2009 Mar 10;106(10):3770-5. doi: 10.1073/pnas.0810767106. Epub 2009 Feb 20.

Fold-specific sequence scoring improves protein sequence matching.

BMC Bioinformatics. 2016 Aug 30;17(1):328. doi: 10.1186/s12859-016-1198-z.

CTX-BLAST: context sensitive version of protein BLAST.

Bioinformatics. 2007 Jul 1;23(13):1686-8. doi: 10.1093/bioinformatics/btm136. Epub 2007 Apr 25.

Pairwise statistical significance of local sequence alignment using sequence-specific and position-specific substitution matrices.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):194-205. doi: 10.1109/TCBB.2009.69.

Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words.

PLoS One. 2011;6(12):e27872. doi: 10.1371/journal.pone.0027872. Epub 2011 Dec 2.

Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix.

BMC Bioinformatics. 2015 Aug 14;16:255. doi: 10.1186/s12859-015-0688-8.

PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids.

BMC Res Notes. 2015 May 7;8:187. doi: 10.1186/s13104-015-1152-6.

A performance enhanced PSI-BLAST based on hybrid alignment.

Bioinformatics. 2011 Jan 1;27(1):31-7. doi: 10.1093/bioinformatics/btq621. Epub 2010 Nov 24.

Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions.

Bioinformatics. 2006 Feb 15;22(4):413-22. doi: 10.1093/bioinformatics/bti828. Epub 2005 Dec 13.

Cited by

Using evolutionary data to make sense of macromolecules with a "face-lifted" ConSurf.

Protein Sci. 2023 Mar;32(3):e4582. doi: 10.1002/pro.4582.

Structural characterization of two solute-binding proteins for N,N'-diacetylchitobiose/N,N',N''-triacetylchitotoriose of the gram-positive bacterium, Paenibacillus sp. str. FPU-7.

J Struct Biol X. 2021 Jun 10;5:100049. doi: 10.1016/j.yjsbx.2021.100049. eCollection 2021.

HH-suite3 for fast remote homology detection and deep protein annotation.

BMC Bioinformatics. 2019 Sep 14;20(1):473. doi: 10.1186/s12859-019-3019-7.

De novo profile generation based on sequence context specificity with the long short-term memory network.

BMC Bioinformatics. 2018 Jul 18;19(1):272. doi: 10.1186/s12859-018-2284-1.

Ensembles generated from crystal structures of single distant homologues solve challenging molecular-replacement cases in AMPLE.

Acta Crystallogr D Struct Biol. 2018 Mar 1;74(Pt 3):183-193. doi: 10.1107/S2059798318002310. Epub 2018 Mar 2.

Derivative-free neural network for optimizing the scoring functions associated with dynamic programming of pairwise-profile alignment.

Algorithms Mol Biol. 2018 Feb 15;13:5. doi: 10.1186/s13015-018-0123-6. eCollection 2018.

In silico interaction analysis of cannabinoid receptor interacting protein 1b (CRIP1b) - CB1 cannabinoid receptor.

J Mol Graph Model. 2017 Oct;77:311-321. doi: 10.1016/j.jmgm.2017.09.006. Epub 2017 Sep 6.

Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance.

BMC Bioinformatics. 2017 Jun 2;18(1):288. doi: 10.1186/s12859-017-1686-9.

In silico analyses of deleterious missense SNPs of human apolipoprotein E3.

Sci Rep. 2017 May 30;7(1):2509. doi: 10.1038/s41598-017-01737-w.

A dynamic hydrophobic core orchestrates allostery in protein kinases.

Sci Adv. 2017 Apr 7;3(4):e1600663. doi: 10.1126/sciadv.1600663. eCollection 2017 Apr.
