On the reliability and the limits of inference of amino acid sequence alignments.

Author information

Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia.

Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i255-i263. doi: 10.1093/bioinformatics/btac247.

DOI:10.1093/bioinformatics/btac247

PMID:35758808

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9235515/

Abstract

MOTIVATION

Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, by weighting every possible sequence alignment by its posterior probability we derive a formal mathematical expectation, and develop an efficient algorithm for computation of the distance between alternative alignments allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments.
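As a toy illustration of weighting every alignment by its posterior probability, the following Python sketch enumerates all global alignments of two very short sequences by brute force and computes the expected distance to a reference alignment. The weights `MATCH`, `MISMATCH` and `GAP`, the function names, and the distance itself (the size of the symmetric difference between the sets of matched residue pairs) are assumptions chosen for this example. They are not the statistical model, the distance measure or the efficient algorithm of the paper, and brute-force enumeration is only feasible for sequences a few residues long.

```python
# Hypothetical, unnormalised step weights for a toy alignment model.
MATCH = 4.0      # aligned pair of identical residues
MISMATCH = 0.5   # aligned pair of different residues
GAP = 0.2        # a residue aligned against a gap


def enumerate_alignments(a, b):
    """Yield (matched_pairs, weight) for every global alignment path of a and b."""

    def rec(i, j, pairs, weight):
        if i == len(a) and j == len(b):
            yield frozenset(pairs), weight
            return
        if i < len(a) and j < len(b):
            step = MATCH if a[i] == b[j] else MISMATCH
            yield from rec(i + 1, j + 1, pairs + [(i, j)], weight * step)
        if i < len(a):
            # Residue a[i] is aligned against a gap.
            yield from rec(i + 1, j, pairs, weight * GAP)
        if j < len(b):
            # Residue b[j] is aligned against a gap.
            yield from rec(i, j + 1, pairs, weight * GAP)

    yield from rec(0, 0, [], 1.0)


def expected_distance(a, b, reference):
    """Posterior-weighted mean distance between all alignments and a reference.

    `reference` is an iterable of (i, j) index pairs, e.g. taken from a
    structure-based alignment. The posterior of an alignment is its weight
    divided by the sum of the weights of all alignments.
    """
    ref = frozenset(reference)
    total = 0.0
    weighted = 0.0
    for pairs, weight in enumerate_alignments(a, b):
        total += weight
        weighted += weight * len(pairs ^ ref)  # symmetric-difference distance
    return weighted / total


if __name__ == "__main__":
    # Reference pairs residue 0 with 0 and residue 2 with 1 (made-up example).
    print(expected_distance("ACD", "AD", [(0, 0), (2, 1)]))
```

A small expected distance means that most of the posterior mass lies on alignments that agree with the reference; a large one means the sequence signal alone does not pin down the residue-residue correspondences.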

RESULTS

By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7045/9235515/d45af159da20/btac247f1.jpg

Similar articles

On the reliability and the limits of inference of amino acid sequence alignments.

Bioinformatics. 2022 Jun 24;38(Suppl 1):i255-i263. doi: 10.1093/bioinformatics/btac247.

Protein structure prediction improves the quality of amino-acid sequence alignment.

Proteins. 2022 Dec;90(12):2144-2147. doi: 10.1002/prot.26392. Epub 2022 Jul 15.

A reliable sequence alignment method based on probabilities of residue correspondences.

Protein Eng. 1995 Oct;8(10):999-1009. doi: 10.1093/protein/8.10.999.

Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions.

Bioinformatics. 2006 Feb 15;22(4):413-22. doi: 10.1093/bioinformatics/bti828. Epub 2005 Dec 13.

PROMALS: towards accurate multiple sequence alignments of distantly related proteins.

Bioinformatics. 2007 Apr 1;23(7):802-8. doi: 10.1093/bioinformatics/btm017. Epub 2007 Jan 31.

Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

BMC Bioinformatics. 2006 Sep 14;7:410. doi: 10.1186/1471-2105-7-410.

Large-scale comparison of protein sequence alignment algorithms with structure alignments.

Proteins. 2000 Jul 1;40(1):6-22. doi: 10.1002/(sici)1097-0134(20000701)40:1<6::aid-prot30>3.0.co;2-7.

Statistical inference of protein structural alignments using information and compression.

Bioinformatics. 2017 Apr 1;33(7):1005-1013. doi: 10.1093/bioinformatics/btw757.

Predicting reliable regions in protein alignments from sequence profiles.

J Mol Biol. 2003 Jul 18;330(4):705-18. doi: 10.1016/s0022-2836(03)00622-3.

Accuracy of structure-based sequence alignment of automatic methods.

BMC Bioinformatics. 2007 Sep 20;8:355. doi: 10.1186/1471-2105-8-355.

Cited by

Impact of local unfolding fluctuations on the evolution of regional sequence preferences in proteins.

Protein Sci. 2025 Mar;34(3):e70015. doi: 10.1002/pro.70015.

Gene-level alignment of single-cell trajectories.

Nat Methods. 2025 Jan;22(1):68-81. doi: 10.1038/s41592-024-02378-4. Epub 2024 Sep 19.

Comparative Analyses of Bacteriophage Genomes.

Methods Mol Biol. 2024;2802:427-453. doi: 10.1007/978-1-0716-3838-5_14.

Odor-evoked transcriptomics of Aedes aegypti mosquitoes.

PLoS One. 2023 Oct 24;18(10):e0293018. doi: 10.1371/journal.pone.0293018. eCollection 2023.

Bridging the gaps in statistical models of protein alignment.

Bioinformatics. 2022 Jun 24;38(Suppl 1):i229-i237. doi: 10.1093/bioinformatics/btac246.

References

Bridging the gaps in statistical models of protein alignment.

Bioinformatics. 2022 Jun 24;38(Suppl 1):i229-i237. doi: 10.1093/bioinformatics/btac246.

Statistical compression of protein sequences and inference of marginal probability landscapes over competing alignments using finite state models and Dirichlet priors.

Bioinformatics. 2019 Jul 15;35(14):i360-i369. doi: 10.1093/bioinformatics/btz368.

Statistical inference of protein structural alignments using information and compression.

Bioinformatics. 2017 Apr 1;33(7):1005-1013. doi: 10.1093/bioinformatics/btw757.

Parameterizing sequence alignment with an explicit evolutionary model.

BMC Bioinformatics. 2015 Dec 10;16:406. doi: 10.1186/s12859-015-0832-5.

ECOD: an evolutionary classification of protein domains.

PLoS Comput Biol. 2014 Dec 4;10(12):e1003926. doi: 10.1371/journal.pcbi.1003926. eCollection 2014 Dec.

Bioinformatics. 2015 Mar 1;31(5):674-81. doi: 10.1093/bioinformatics/btu697. Epub 2014 Oct 22.

Methods of remote homology detection can be combined to increase coverage by 10% in the midnight zone.

Bioinformatics. 2007 Sep 15;23(18):2353-60. doi: 10.1093/bioinformatics/btm355. Epub 2007 Aug 20.

MUSTANG: a multiple structural alignment algorithm.

Proteins. 2006 Aug 15;64(3):559-74. doi: 10.1002/prot.20921.

Optimal sequence alignments.

Proc Natl Acad Sci U S A. 1983 Mar;80(5):1382-6. doi: 10.1073/pnas.80.5.1382.

TM-align: a protein structure alignment algorithm based on the TM-score.

Nucleic Acids Res. 2005 Apr 22;33(7):2302-9. doi: 10.1093/nar/gki524. Print 2005.
