Estimating statistical significance of local protein profile-profile alignments.

Affiliation

Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio al. 7, Vilnius, 10257, Lithuania.

Publication information

BMC Bioinformatics. 2019 Aug 13;20(1):419. doi: 10.1186/s12859-019-2913-3.

DOI:10.1186/s12859-019-2913-3
PMID:31409275
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6693267/
Abstract

BACKGROUND

Aligning sequence families described by profiles is a sensitive way to establish homology between proteins and is important in evolutionary, structural, and functional studies of proteins. As the amount of sequence data grows steadily, estimating the statistical significance of alignments, including profile-profile alignments, plays a key role in alignment-based homology search algorithms. Still, it remains an open question whether a single type of distribution governs profile-profile alignment scores and, if so, which one, especially when profile-profile substitution scores include terms such as secondary structure predictions.
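For context, scores of local alignments between unrelated sequences are classically modeled with an extreme value (Gumbel) distribution, and the question raised above is whether a comparable model holds for profile-profile scores. The sketch below shows how a score would convert to an E-value and a P-value under that classical assumption. The function names and the values of `lam`, `K`, `m`, and `n` are illustrative placeholders, not parameters taken from the paper or from COMER.

```python
import math


def evalue(score, m, n, lam, K):
    """Expected number of chance alignments scoring >= score.

    Uses the Karlin-Altschul form E = K * m * n * exp(-lam * score),
    where m and n are the lengths of the two compared objects.
    """
    return K * m * n * math.exp(-lam * score)


def pvalue(score, m, n, lam, K):
    """Probability of at least one chance alignment scoring >= score.

    P = 1 - exp(-E); expm1 keeps precision when E is very small.
    """
    return -math.expm1(-evalue(score, m, n, lam, K))


# Hypothetical lengths and parameters, chosen only for illustration.
e = evalue(score=50.0, m=200, n=300, lam=0.3, K=0.1)
p = pvalue(score=50.0, m=200, n=300, lam=0.3, K=0.1)
```

For small E the two quantities nearly coincide, which is why search tools often report only one of them.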

RESULTS

This study presents a methodology for estimating the statistical significance of this type of alignment. The methodology rests on a new algorithm for generating random profiles whose alignment scores are distributed similarly to those obtained for real unrelated profiles. We show that statistical accuracy, sensitivity, and the rate of high-quality alignments improve when alignments are characterized statistically by establishing how the statistical parameters depend on various measures of both individual and pairwise profile characteristics. Implemented in the COMER software, the proposed methodology increased the number of true positives by up to 34.2% and the number of high-quality alignments by up to 61.8% relative to the previous version of the COMER method.
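The general idea of calibrating significance against a null score distribution can be sketched as follows. This is not the algorithm from the paper: the abstract does not describe how random profiles are generated, so synthetic Gumbel draws stand in for the alignment scores of random profile pairs, and the Gumbel form itself is an assumption. The helper names and all numeric values are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) by the method of moments."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.fmean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_pvalue(score, mu, beta):
    """P(S >= score) under a Gumbel(mu, beta) null distribution."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


# Stand-in for scores of random profile pairs: inverse-CDF Gumbel sampling
# with assumed location 20 and scale 4. The uniform bounds avoid log(0).
rng = random.Random(0)
null_scores = [
    20.0 - 4.0 * math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
    for _ in range(20000)
]

mu, beta = fit_gumbel_moments(null_scores)
p = gumbel_pvalue(45.0, mu, beta)  # significance of a hypothetical score of 45
```

In a real setting `null_scores` would come from aligning random or known-unrelated profiles, and the fitted parameters could be made to depend on profile characteristics, which is the kind of dependence the abstract describes.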

CONCLUSIONS

The more accurate estimation of statistical significance is implemented in the COMER method, which is now more sensitive and provides an increased rate of high-quality profile-profile alignments. The results of the present study also suggest directions for future research.
