PFASUM: a substitution matrix from Pfam structural alignments.

Authors

Keul Frank, Hess Martin, Goesele Michael, Hamacher Kay

Affiliations

Computational Biology and Simulation, Department of Biology, Technische Universität Darmstadt, Schnittspahnstraße 2, Darmstadt, 64287, Germany.

Graphics, Capture and Massively Parallel Computing, Department of Computer Science, Technische Universität Darmstadt, Rundeturmstraße 12, Darmstadt, 64283, Germany.

Publication information

BMC Bioinformatics. 2017 Jun 5;18(1):293. doi: 10.1186/s12859-017-1703-z.

DOI:10.1186/s12859-017-1703-z
PMID:28583067
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5460430/
Abstract

BACKGROUND

Detecting homologous protein sequences and computing multiple sequence alignments (MSA) are fundamental tasks in molecular bioinformatics. Both tasks usually require a substitution matrix, derived from a set of aligned sequences, that models evolutionary substitution events. In recent years, the known sequence space has grown drastically, and several publications have demonstrated that this growth can yield significantly better-performing matrices. Nevertheless, matrices based on dated sequence datasets remain the de facto standard for both tasks, even though their data basis may limit their capabilities. We address this by presenting a new substitution matrix series called PFASUM. These matrices are derived from Pfam seed MSAs using a novel algorithm and thus build upon expert ground truth data covering a large and diverse sequence space.
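
To illustrate what deriving a substitution matrix from aligned sequences means in general, the following minimal sketch counts aligned residue pairs column by column and converts them to log-odds scores in bits. It is a generic BLOSUM-style illustration and not the PFASUM algorithm, which the abstract does not describe. The function name `log_odds_matrix` and the toy alignment are hypothetical, and the sketch applies no sequence weighting, clustering, or pseudocounts.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(msa, gap="-"):
    """Return (scores, q): toy log-odds scores in bits and observed pair frequencies.

    Hypothetical helper for illustration only; it applies no sequence
    weighting, clustering, or pseudocounts.
    """
    pair_counts = Counter()
    for column in zip(*msa):  # iterate over alignment columns
        residues = [r for r in column if r != gap]
        for a, b in combinations(residues, 2):
            # Count each aligned pair in both orders so the result is symmetric.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    total = sum(pair_counts.values())
    if total == 0:
        raise ValueError("alignment contains no residue pairs")

    # q[(a, b)]: observed frequency of the ordered pair (a, b).
    q = {pair: n / total for pair, n in pair_counts.items()}

    # p[a]: background frequency of residue a, the marginal of q.
    p = Counter()
    for (a, _b), freq in q.items():
        p[a] += freq

    # Log-odds score: observed pair frequency versus chance expectation.
    scores = {
        (a, b): math.log2(q_ab / (p[a] * p[b]))
        for (a, b), q_ab in q.items()
    }
    return scores, q


# Toy alignment (an assumption for illustration, not real Pfam data).
toy_msa = ["AAK", "AAR", "AGK"]
scores, q = log_odds_matrix(toy_msa)
print(round(scores[("A", "A")], 3))
```

Pairs that never occur in the alignment are simply absent from the result. Practical matrices avoid such gaps by drawing on very large alignment collections.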

RESULTS

We show results for two use cases. First, we tested the homology search performance of PFASUM matrices on up-to-date ASTRAL databases with varying sequence similarity. Our study shows that using PFASUM matrices can lead to significantly better homology search results than conventional matrices. PFASUM matrices with relative entropies comparable to those of the commonly used substitution matrices BLOSUM50, BLOSUM62, PAM250, VTML160 and VTML200 outperformed their corresponding counterparts in 93% of all test cases. A general assessment that also compared matrices with different relative entropies showed that PFASUM matrices delivered the best homology search performance in the test set. Second, our results demonstrate that using PFASUM matrices for MSA construction improves MSA quality compared to conventional matrices. On up-to-date MSA benchmarks, at least 60% of all MSAs were reconstructed with equal or higher quality when using MUSCLE with PFASUM31, PFASUM43 and PFASUM60 matrices instead of conventional matrices. This rate increases to at least 76% for MSAs containing similar sequences.
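
The comparison above pairs matrices by relative entropy. As a hedged sketch of that quantity, the code below computes the standard relative entropy H = Σ q_ij · log2(q_ij / (p_i · p_j)) in bits from pair frequencies q and background frequencies p, then selects the candidate whose entropy lies closest to a reference value. The function names, the two-letter alphabet, and all numeric values are illustrative assumptions, not figures from the paper.

```python
import math


def relative_entropy(pair_freqs, background):
    """Relative entropy in bits: sum of q_ij * log2(q_ij / (p_i * p_j))."""
    h = 0.0
    for (a, b), q_ab in pair_freqs.items():
        if q_ab > 0:  # zero-frequency pairs contribute nothing
            h += q_ab * math.log2(q_ab / (background[a] * background[b]))
    return h


def closest_by_entropy(candidate_entropies, reference_entropy):
    """Return the candidate name whose entropy is nearest to the reference."""
    return min(
        candidate_entropies,
        key=lambda name: abs(candidate_entropies[name] - reference_entropy),
    )


# Toy two-letter alphabet (hypothetical values for illustration).
background = {"A": 0.5, "B": 0.5}
pair_freqs = {
    ("A", "A"): 0.4, ("A", "B"): 0.1,
    ("B", "A"): 0.1, ("B", "B"): 0.4,
}
h = relative_entropy(pair_freqs, background)

# Hypothetical candidate matrices labeled only by their relative entropy.
candidates = {"matrix_low": 0.1, "matrix_mid": 0.3, "matrix_high": 0.9}
print(closest_by_entropy(candidates, h))
```

A relative entropy of zero means the pair frequencies equal the product of the background frequencies, so the matrix carries no information for distinguishing related from unrelated sequences.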

CONCLUSIONS

We present the novel PFASUM substitution matrices, derived from manually curated MSA ground truth data covering the currently known sequence space. Our results indicate that PFASUM matrices improve homology search performance as well as MSA quality in many cases when compared to conventional substitution matrices. Hence, we encourage the use of PFASUM matrices, and especially PFASUM60, for these specific tasks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e686/5460430/9a2defe72371/12859_2017_1703_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e686/5460430/483b95703bbe/12859_2017_1703_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e686/5460430/56e55947a5b7/12859_2017_1703_Fig2_HTML.jpg
