Similar articles

1. Quantification of the variation in percentage identity for protein sequence alignments.
BMC Bioinformatics. 2006 Sep 19;7:415. doi: 10.1186/1471-2105-7-415.
2. OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy.
BMC Bioinformatics. 2003 Oct 10;4:47. doi: 10.1186/1471-2105-4-47.
3. Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score.
BMC Bioinformatics. 2008 Dec 12;9:531. doi: 10.1186/1471-2105-9-531.
4. Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words.
PLoS One. 2011;6(12):e27872. doi: 10.1371/journal.pone.0027872. Epub 2011 Dec 2.
5. Detailed protein sequence alignment based on Spectral Similarity Score (SSS).
BMC Bioinformatics. 2005 Apr 23;6:105. doi: 10.1186/1471-2105-6-105.
6. AL2CO: calculation of positional conservation in a protein sequence alignment.
Bioinformatics. 2001 Aug;17(8):700-12. doi: 10.1093/bioinformatics/17.8.700.
7. From analysis of protein structural alignments toward a novel approach to align protein sequences.
Proteins. 2004 Feb 15;54(3):569-82. doi: 10.1002/prot.10503.
8. Improving the quality of twilight-zone alignments.
Protein Sci. 2000 Aug;9(8):1487-96. doi: 10.1110/ps.9.8.1487.
9. Adjusting scoring matrices to correct overextended alignments.
Bioinformatics. 2013 Dec 1;29(23):3007-13. doi: 10.1093/bioinformatics/btt517. Epub 2013 Aug 31.
10. NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities.
Proteins. 2005 Feb 15;58(3):628-37. doi: 10.1002/prot.20359.

Cited by

1. Understanding species-specific and conserved RNA-protein interactions in vivo and in vitro.
Nat Commun. 2024 Sep 27;15(1):8400. doi: 10.1038/s41467-024-52231-7.
2. uncovers hundreds of novel human (and other) exons though comparative analysis of proteins.
bioRxiv. 2024 May 6:2024.05.05.592595. doi: 10.1101/2024.05.05.592595.
3. Understanding species-specific and conserved RNA-protein interactions in vivo and in vitro.
bioRxiv. 2024 Jan 30:2024.01.29.577729. doi: 10.1101/2024.01.29.577729.
4. Accurate and fast graph-based pangenome annotation and clustering with ggCaller.
Genome Res. 2023 Sep;33(9):1622-1637. doi: 10.1101/gr.277733.123. Epub 2023 Aug 24.
5. A Novel CCK Receptor GPR173 Mediates Potentiation of GABAergic Inhibition.
J Neurosci. 2023 Mar 29;43(13):2305-2325. doi: 10.1523/JNEUROSCI.2035-22.2023. Epub 2023 Feb 22.
6. Multi-task learning with a natural metric for quantitative structure activity relationship learning.
J Cheminform. 2019 Nov 12;11(1):68. doi: 10.1186/s13321-019-0392-1.
7. A Division of Labor in the Recruitment and Topological Organization of a Bacterial Morphogenic Complex.
Curr Biol. 2020 Oct 19;30(20):3908-3922.e4. doi: 10.1016/j.cub.2020.07.063. Epub 2020 Aug 13.
8. Segmentation and Comparative Modeling in an 8.6-Å Cryo-EM Map of the Singapore Grouper Iridovirus.
Structure. 2019 Oct 1;27(10):1561-1569.e4. doi: 10.1016/j.str.2019.08.002. Epub 2019 Aug 22.
9. SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment.
PeerJ. 2017 Jun 27;5:e3492. doi: 10.7717/peerj.3492. eCollection 2017.
10. Using intron position conservation for homology-based gene prediction.
Nucleic Acids Res. 2016 May 19;44(9):e89. doi: 10.1093/nar/gkw092. Epub 2016 Feb 17.

Quantification of the variation in percentage identity for protein sequence alignments.

Authors

Raghava G P S, Barton Geoffrey J

Affiliation

School of Life Sciences Research, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK.

Publication

BMC Bioinformatics. 2006 Sep 19;7:415. doi: 10.1186/1471-2105-7-415.

PMID: 16984632
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1592310/
Abstract

BACKGROUND

Percentage identity (PID) is frequently quoted in discussions of sequence alignments because it appears simple and easy to understand. However, there are several different ways to calculate percentage identity, each of which may give a different result for the same alignment, and the method of calculation is rarely reported. Quantifying the variation in PID caused by the different calculations should therefore help in interpreting PID values reported in the literature. In this study, the variation in PID was quantified systematically on a reference set of 1028 alignments generated by comparing protein three-dimensional structures. Since the alignment algorithm may also affect the range of PID, this study also considered the effect of the algorithm alone and of the combination of algorithm and PID method.
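
The abstract does not spell out the calculations. As a hedged illustration, the sketch below computes four commonly used PID variants for one pairwise alignment; they share the same numerator (identical residue pairs) and differ only in the denominator: aligned positions plus internal gap positions (PID1), aligned residue pairs only (PID2), the length of the shorter sequence (PID3), and the mean length of the two sequences (PID4). The labels PID1 to PID4, the function name `pid_variants`, the `-` gap character and the toy alignment are illustrative assumptions, not definitions quoted from the paper.

```python
def pid_variants(aln_a: str, aln_b: str, gap: str = "-") -> dict[str, float]:
    """Percentage identity of one pairwise alignment under four assumed denominators."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")

    def pct(num: float, den: float) -> float:
        # Guard against empty alignments or zero-length sequences.
        return 100.0 * num / den if den else 0.0

    pairs = list(zip(aln_a, aln_b))
    # Columns where both sequences have a residue (no gap in either).
    aligned_cols = [i for i, (x, y) in enumerate(pairs) if x != gap and y != gap]
    identical = sum(1 for i in aligned_cols if pairs[i][0] == pairs[i][1])

    # Internal gap columns: exactly one sequence gapped, lying between the first
    # and last aligned column, so terminal overhangs are not counted.
    internal_gaps = 0
    if aligned_cols:
        first, last = aligned_cols[0], aligned_cols[-1]
        internal_gaps = sum(
            1 for x, y in pairs[first:last + 1] if (x == gap) != (y == gap)
        )

    len_a = sum(1 for x in aln_a if x != gap)  # ungapped sequence lengths
    len_b = sum(1 for y in aln_b if y != gap)

    return {
        "PID1": pct(identical, len(aligned_cols) + internal_gaps),
        "PID2": pct(identical, len(aligned_cols)),
        "PID3": pct(identical, min(len_a, len_b)),
        "PID4": pct(identical, (len_a + len_b) / 2),
    }


if __name__ == "__main__":
    scores = pid_variants("AC-DEFGH", "ACWDQF--")
    for name, value in scores.items():
        print(f"{name}: {value:.1f}%")
    print(f"spread: {max(scores.values()) - min(scores.values()):.1f} percentage points")
```

For the toy alignment above, all four variants count the same four identical pairs, yet the values range from about 61.5% (PID4) to 80% (PID2), which is the kind of calculation-dependent spread the study sets out to quantify.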

RESULTS

The maximum variation in PID due to the calculation method was 11.5%, while the effect of the alignment algorithm on PID was up to 14.6% across three popular alignment methods. The combined effect of alignment algorithm and PID calculation gave a variation of up to 22% on the test data, with an average of 5.3% ± 2.8% for sequence pairs with < 30% identity. To determine which PID method correlated most highly with structural similarity, four different PID calculations were compared with similarity scores (Sc) from the comparison of the corresponding protein three-dimensional structures. The highest correlation coefficient for a PID calculation was 0.80. In contrast, the more sophisticated Z-score, calculated by reference to randomized sequences, gave a correlation coefficient of 0.84.
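
The Z-score mentioned above compares the real alignment score with a background distribution of scores for randomized sequences: Z = (S_real − mean of shuffled scores) / standard deviation of shuffled scores. The abstract does not give the exact procedure, so the sketch below is only an assumed illustration of the general idea: it uses a plain Needleman-Wunsch global score with toy match/mismatch/gap values (real tools typically use a substitution matrix such as BLOSUM62 and affine gap penalties) and shuffles only the second sequence. The names `nw_score` and `shuffle_z_score`, the scoring values, the shuffle count and the example sequences are hypothetical.

```python
import random
import statistics


def nw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def shuffle_z_score(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of the real score against scores for shuffled copies of b."""
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a standard deviation")
    rng = random.Random(seed)  # fixed seed for reproducibility
    real = nw_score(a, b)
    residues = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps the length and composition of b
        background.append(nw_score(a, "".join(residues)))
    sd = statistics.stdev(background)
    return (real - statistics.mean(background)) / sd if sd > 0 else 0.0


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQ"
    print(f"self:      Z = {shuffle_z_score(query, query):.1f}")
    print(f"unrelated: Z = {shuffle_z_score(query, 'GGSWPLEDNCGGHWPLEDNCYY'):.1f}")
```

Shuffling preserves sequence length and amino-acid composition, so the background reflects what the scoring scheme gives to unrelated sequences of the same makeup; that is the usual motivation for preferring such a Z-score to a raw identity count. More shuffles give a more stable estimate at the cost of run time.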

CONCLUSION

Although expert sequence analysts know well that PID is a poor score for discriminating between protein sequences, its apparent simplicity encourages its widespread use in establishing cutoffs for structural similarity. This paper shows not only that PID is a poorer measure of sequence similarity than the Z-score, but also that reported PID values carry a large uncertainty. Since better alternatives to PID exist for quantifying sequence similarity, these should be quoted in preference to PID where possible. The findings presented here should help those new to sequence analysis and serve as a warning to those who seek to interpret PID values reported in the literature.
