
Detecting protein sequence conservation via metric embeddings.

Authors

Halperin E, Buhler J, Karp R, Krauthgamer R, Westover B

Affiliation

International Computer Science Institute and Computer Science Division, University of California, Berkeley, CA 94720, USA.

Publication details

Bioinformatics. 2003;19 Suppl 1:i122-9. doi: 10.1093/bioinformatics/btg1016.

DOI: 10.1093/bioinformatics/btg1016
PMID: 12855448

Abstract

MOTIVATION

Comparing two protein databases is a fundamental task in biosequence annotation. Given two databases, one must find all pairs of proteins that align with high score under a biologically meaningful substitution score matrix, such as a BLOSUM matrix (Henikoff and Henikoff, 1992). Distance-based approaches to this problem map each peptide in the database to a point in a metric space, such that peptides aligning with higher scores are mapped to closer points. Many techniques exist to discover close pairs of points in a metric space efficiently, but the challenge in applying this work to proteomic comparison is to find a distance mapping that accurately encodes all the distinctions among residue pairs made by a proteomic score matrix. Buhler (2002) proposed one such mapping but found that it led to a relatively inefficient algorithm for protein-protein comparison.
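
As a rough illustration of what a distance derived from a score matrix can look like, the sketch below turns a similarity score s into a per-residue distance d(a, b) = s(a, a) + s(b, b) - 2 s(a, b) and sums it over aligned positions. This is a generic construction chosen for illustration, not necessarily the mapping of Buhler (2002) or of this paper. The three-letter score table and all function names are hypothetical, and the values are not real BLOSUM entries.

```python
# Hypothetical toy scores for a 3-letter alphabet; NOT real BLOSUM values.
TOY_SCORES = {
    ("A", "A"): 4, ("A", "S"): 1, ("A", "W"): -3,
    ("S", "S"): 4, ("S", "W"): -3,
    ("W", "W"): 11,
}


def score(a, b):
    """Symmetric lookup into the toy score table."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def residue_distance(a, b):
    """Assumed construction: d(a, b) = s(a, a) + s(b, b) - 2 * s(a, b)."""
    return score(a, a) + score(b, b) - 2 * score(a, b)


def peptide_distance(p, q):
    """Sum of per-residue distances over two ungapped, equal-length peptides."""
    if len(p) != len(q):
        raise ValueError("peptides must have the same length")
    return sum(residue_distance(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    # A conservative substitution (A -> S) costs less than a drastic one (A -> W).
    print(peptide_distance("AAS", "AAA"))
    print(peptide_distance("AAW", "AAA"))
```

In this toy table a lower-scoring substitution produces a larger distance, which is the property the distance-based approach relies on. Whether such a function satisfies the triangle inequality depends on the score matrix and would need to be checked for any real matrix.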

RESULTS

This work proposes a new distance mapping for peptides under the BLOSUM matrices that permits more efficient similarity search. We first propose a new distance function on peptides derived from a given score matrix. We then show how to map peptides to bit vectors such that the distance between any two peptides is closely approximated by the Hamming distance (i.e. number of mismatches) between their corresponding bit vectors. We combine these two results with the LSH-ALL-PAIRS-SIM algorithm of Buhler (2002) to produce an improved distance-based algorithm for proteomic comparison. An initial implementation of the improved algorithm exhibits sensitivity within 5% of that of the original LSH-ALL-PAIRS-SIM, while running up to eight times faster.
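
The idea of approximating a distance by the Hamming distance between bit vectors, and then searching with locality-sensitive hashing, can be sketched under simplifying assumptions. The example below is not the authors' embedding or the LSH-ALL-PAIRS-SIM algorithm. It assumes each item is already described by small non-negative integer coordinates, encodes each coordinate in unary so that Hamming distance equals the L1 distance between coordinate vectors, and then uses the standard bit-sampling hash for Hamming space to collect candidate pairs. All names and parameters (`embed`, `lsh_candidate_pairs`, `k`, `rounds`) are hypothetical.

```python
import random
from collections import defaultdict


def unary(x, width):
    """Thermometer code: x ones followed by (width - x) zeros."""
    if not 0 <= x <= width:
        raise ValueError("x must lie in [0, width]")
    return [1] * x + [0] * (width - x)


def hamming(u, v):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def embed(coords, width):
    """Concatenate unary codes; Hamming distance then equals L1 distance."""
    bits = []
    for x in coords:
        bits.extend(unary(x, width))
    return bits


def lsh_candidate_pairs(vectors, k, rounds, seed=0):
    """Bit-sampling LSH: in each round, bucket vectors by k random bit positions.

    Vectors at small Hamming distance are more likely to share a bucket.
    Returns the set of name pairs that collided in at least one round.
    """
    rng = random.Random(seed)
    n_bits = len(next(iter(vectors.values())))
    candidates = set()
    for _ in range(rounds):
        positions = rng.sample(range(n_bits), k)
        buckets = defaultdict(list)
        for name, bits in vectors.items():
            buckets[tuple(bits[i] for i in positions)].append(name)
        for names in buckets.values():
            for i in range(len(names)):
                for j in range(i + 1, len(names)):
                    candidates.add(tuple(sorted((names[i], names[j]))))
    return candidates


if __name__ == "__main__":
    vecs = {"p": embed([2, 5], 8), "q": embed([2, 5], 8), "r": embed([8, 0], 8)}
    print(hamming(embed([2, 5], 8), embed([4, 1], 8)))
    print(lsh_candidate_pairs(vecs, k=4, rounds=3))
```

Candidate pairs returned by the hashing step would still need to be verified against the true score, since distant vectors can occasionally collide and close ones can occasionally be missed. Raising `rounds` lowers the miss rate at the cost of more work, which mirrors the sensitivity versus speed trade-off reported in the abstract.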
