Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches.

Author information

Yu Yi-Kuo, Gertz E Michael, Agarwala Richa, Schäffer Alejandro A, Altschul Stephen F

Affiliation

National Center for Biotechnology Information, National Library of Medicine, NIH, DHHS, Bethesda, MD 20894, USA.

Publication information

Nucleic Acids Res. 2006;34(20):5966-73. doi: 10.1093/nar/gkl731. Epub 2006 Oct 26.

DOI:10.1093/nar/gkl731
PMID:17068079
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1635310/
Abstract

Protein sequence database search programs may be evaluated both for their retrieval accuracy--the ability to separate meaningful from chance similarities--and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.
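
The abstract's central idea is to fold two essentially independent measures, alignment similarity and compositional similarity, into one unified score. A standard statistical way to merge independent evidence is Fisher's method for combining p-values. The sketch below assumes that each measure has already been expressed as a p-value. The function names `evalue_to_pvalue` and `combine_independent_pvalues` are hypothetical, and the modified BLAST program described here may use a different combination rule. For two p-values, Fisher's method has the closed form q(1 - ln q), where q = p1 * p2.

```python
import math
from typing import Sequence


def evalue_to_pvalue(evalue: float) -> float:
    """Convert an E-value to a p-value, assuming chance hits are Poisson.

    With E expected chance hits, P(at least one chance hit) = 1 - exp(-E).
    math.expm1 keeps precision for the tiny E-values typical of true homologs.
    """
    if evalue < 0:
        raise ValueError("E-values are non-negative")
    return -math.expm1(-evalue)


def combine_independent_pvalues(pvalues: Sequence[float]) -> float:
    """Combine k independent p-values with Fisher's method (closed form).

    Under the null hypothesis, -2 * sum(ln p_i) is chi-square distributed
    with 2k degrees of freedom. For even degrees of freedom the upper tail
    is q * sum_{i=0}^{k-1} L**i / i!, where q = prod(p_i) and L = -ln(q).
    For k = 2 this reduces to q * (1 - ln q).
    """
    if len(pvalues) == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    log_q = sum(math.log(p) for p in pvalues)  # ln of the product, always <= 0
    neg_log_q = -log_q
    # Accumulate L**i / i! incrementally so no factorial is ever formed.
    term, series = 1.0, 0.0
    for i in range(len(pvalues)):
        series += term
        term *= neg_log_q / (i + 1)
    # Guard against rounding pushing the tail probability just above 1.
    return min(1.0, math.exp(log_q) * series)


if __name__ == "__main__":
    # Hypothetical inputs: an alignment p-value and a composition p-value.
    print(combine_independent_pvalues([0.01, 0.5]))
    print(combine_independent_pvalues([0.01, 0.01]))
```

Combining an alignment p-value of 0.01 with an uninformative composition p-value of 0.5 gives about 0.0315, which is weaker than the alignment alone. Combining it with a composition p-value of 0.01 gives about 0.00102. Agreement between the two measures therefore strengthens the evidence.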

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e79/1694031/61fef4478b68/gkl731f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e79/1694031/55f30ae72522/gkl731f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e79/1694031/58ee54727859/gkl731f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e79/1694031/f3d30919cab3/gkl731f4.jpg

Similar articles

1
Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches.
Nucleic Acids Res. 2006;34(20):5966-73. doi: 10.1093/nar/gkl731. Epub 2006 Oct 26.
2
Accuracy of structure-based sequence alignment of automatic methods.
BMC Bioinformatics. 2007 Sep 20;8:355. doi: 10.1186/1471-2105-8-355.
3
A high level interface to SCOP and ASTRAL implemented in python.
BMC Bioinformatics. 2006 Jan 10;7:10. doi: 10.1186/1471-2105-7-10.
4
Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST.
BMC Biol. 2006 Dec 7;4:41. doi: 10.1186/1741-7007-4-41.
5
SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters.
BMC Bioinformatics. 2004 Oct 28;5:171. doi: 10.1186/1471-2105-5-171.
6
DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.
BMC Bioinformatics. 2005 Mar 22;6:66. doi: 10.1186/1471-2105-6-66.
7
Recent Hits Acquired by BLAST (ReHAB): a tool to identify new hits in sequence similarity searches.
BMC Bioinformatics. 2005 Feb 8;6:23. doi: 10.1186/1471-2105-6-23.
8
PSIBLAST_PairwiseStatSig: reordering PSI-BLAST hits using pairwise statistical significance.
Bioinformatics. 2009 Apr 15;25(8):1082-3. doi: 10.1093/bioinformatics/btp089. Epub 2009 Feb 27.
9
S4: structure-based sequence alignments of SCOP superfamilies.
Nucleic Acids Res. 2005 Jan 1;33(Database issue):D219-22. doi: 10.1093/nar/gki043.
10
OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy.
BMC Bioinformatics. 2003 Oct 10;4:47. doi: 10.1186/1471-2105-4-47.

Cited by

1
Robust Accurate Identification and Biomass Estimates of Microorganisms via Tandem Mass Spectrometry.
J Am Soc Mass Spectrom. 2020 Jan 2;31(1):85-102. doi: 10.1021/jasms.9b00035. Epub 2019 Nov 20.
2
Estimating statistical significance of local protein profile-profile alignments.
BMC Bioinformatics. 2019 Aug 13;20(1):419. doi: 10.1186/s12859-019-2913-3.
3
MultiDomainBenchmark: a multi-domain query and subject database suite.
BMC Bioinformatics. 2019 Feb 14;20(1):77. doi: 10.1186/s12859-019-2660-5.
4
Statistical investigations of protein residue direct couplings.
PLoS Comput Biol. 2018 Dec 31;14(12):e1006237. doi: 10.1371/journal.pcbi.1006237. eCollection 2018 Dec.
5
Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.
J Am Soc Mass Spectrom. 2018 Aug;29(8):1721-1737. doi: 10.1007/s13361-018-1986-y. Epub 2018 Jun 5.
6
Query-seeded iterative sequence similarity searching improves selectivity 5-20-fold.
Nucleic Acids Res. 2017 Apr 20;45(7):e46. doi: 10.1093/nar/gkw1207.
7
Graphlet-based Characterization of Directed Networks.
Sci Rep. 2016 Oct 13;6:35098. doi: 10.1038/srep35098.
8
Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution.
Bioinformatics. 2016 Sep 1;32(17):2642-9. doi: 10.1093/bioinformatics/btw225. Epub 2016 Apr 29.
9
Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance.
J Am Soc Mass Spectrom. 2016 Feb;27(2):194-210. doi: 10.1007/s13361-015-1271-2. Epub 2015 Oct 28.
10
Mass spectrometry-based protein identification with accurate statistical significance assignment.
Bioinformatics. 2015 Mar 1;31(5):699-706. doi: 10.1093/bioinformatics/btu717. Epub 2014 Oct 31.