Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia.
Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i255-i263. doi: 10.1093/bioinformatics/btac247.
Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn from them? Using techniques not previously applied to these questions, we weight every possible sequence alignment by its posterior probability to derive a formal mathematical expectation of the distance between alternative alignments, and we develop an efficient algorithm to compute it. This allows quantitative comparison of sequence-based alignments with the corresponding reference structure alignments.
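As a rough illustration of a posterior-weighted expectation, the sketch below averages a distance to a reference alignment over a small, explicitly listed set of candidate alignments. It is not the paper's algorithm: the abstract does not specify the distance measure or the efficient computation. The names `pair_distance` and `expected_distance`, the symmetric-difference distance, and the toy probabilities are all assumptions made for this example.

```python
from typing import FrozenSet, Iterable, Tuple

Pair = Tuple[int, int]        # (residue index in sequence S, residue index in sequence T)
Alignment = FrozenSet[Pair]   # an alignment represented by its set of matched residue pairs


def pair_distance(a: Alignment, b: Alignment) -> int:
    """Assumed distance: number of matched pairs found in exactly one of the two alignments."""
    return len(a ^ b)


def expected_distance(
    weighted: Iterable[Tuple[Alignment, float]], reference: Alignment
) -> float:
    """Posterior-weighted mean distance from the candidate alignments to the reference.

    Brute-force enumeration for illustration only; weights are renormalised
    in case they do not sum exactly to 1.
    """
    items = list(weighted)
    total = sum(p for _, p in items)
    if total <= 0:
        raise ValueError("posterior weights must sum to a positive value")
    return sum(p * pair_distance(a, reference) for a, p in items) / total


# Toy data (hypothetical): a reference alignment and three candidates with made-up posteriors.
reference = frozenset({(0, 0), (1, 1), (2, 2)})
candidates = [
    (frozenset({(0, 0), (1, 1), (2, 2)}), 0.5),   # identical to the reference, distance 0
    (frozenset({(0, 0), (1, 2)}), 0.25),          # distance 3
    (frozenset({(0, 1), (1, 2)}), 0.25),          # distance 5
]
result = expected_distance(candidates, reference)
print(result)  # 0.5*0 + 0.25*3 + 0.25*5 = 2.0
```

Listing alignments explicitly is feasible only for toy inputs, since the number of possible alignments grows very rapidly with sequence length; that is why an efficient algorithm is needed in practice.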
By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone.
Supplementary data are available at Bioinformatics online.