Ultra-fast global homology detection with Discrete Cosine Transform and Dynamic Time Warping.

Affiliations

Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium.

Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium.

Publication information

Bioinformatics. 2018 Sep 15;34(18):3118-3125. doi: 10.1093/bioinformatics/bty309.

DOI: 10.1093/bioinformatics/bty309
PMID: 29684140

Abstract

MOTIVATION

Evolutionary information is crucial for the annotation of proteins in bioinformatics. The number of retrieved homologs often correlates with the quality of predicted protein annotations related to structure or function. With a growing number of sequences available, fast and reliable methods for homology detection are essential, as they have a direct impact on predicted protein annotations.

RESULTS

We developed a discriminative, alignment-free algorithm for homology detection with quasi-linear complexity, which in theory enables much faster homology searches. To this end, we convert the protein sequence into numeric biophysical representations, which are shrunk to a fixed length using a novel vector quantization method based on Discrete Cosine Transform compression. For each compressed representation, we then compute similarity scores between proteins with the Dynamic Time Warping algorithm and feed them into a Random Forest. The performance of WARP is comparable to that of state-of-the-art methods.
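
The steps described above can be illustrated with a minimal sketch. It is not the authors' WARP implementation and rests on several assumptions that are not in the abstract: the Kyte-Doolittle hydropathy scale stands in for the unspecified biophysical representations, the compression simply keeps the lowest DCT coefficients and inverts them at the target length, and the helper names (`encode`, `dct`, `idct`, `compress`, `dtw`) and the target length of 8 are hypothetical. The Random Forest step is left out.

```python
import math

# Kyte-Doolittle hydropathy scale, used here only as a stand-in for the
# biophysical representations (the abstract does not say which ones WARP uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map an amino-acid string to a numeric signal."""
    return [KYTE_DOOLITTLE[aa] for aa in seq]


def dct(signal):
    """Orthonormal DCT-II. Naive O(n^2) for clarity; FFT-based versions are O(n log n)."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += coeffs[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def compress(signal, size):
    """Shrink a signal to `size` points by keeping its lowest-frequency DCT coefficients."""
    coeffs = dct(signal)[:size]
    coeffs += [0.0] * (size - len(coeffs))   # zero-pad signals shorter than `size`
    factor = math.sqrt(size / len(signal))   # keeps amplitudes comparable across lengths
    return idct([c * factor for c in coeffs])


def dtw(a, b):
    """Dynamic Time Warping distance with absolute-difference local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Two made-up sequences of different lengths, both shrunk to 8 points.
    a = compress(encode("MKTAYIAKQRQISFVKSHFSRQ"), 8)
    b = compress(encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"), 8)
    print(dtw(a, b))
```

Because every compressed signal has the same fixed length, each pairwise DTW comparison costs a constant amount of work regardless of the original sequence lengths. In a pipeline like the one described, such distances (one per biophysical representation) would serve as input features for a classifier.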

AVAILABILITY AND IMPLEMENTATION

The method is available at http://ibsquare.be/warp.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. Ultra-fast global homology detection with Discrete Cosine Transform and Dynamic Time Warping.
   Bioinformatics. 2018 Sep 15;34(18):3118-3125. doi: 10.1093/bioinformatics/bty309.
2. HFSP: high speed homology-driven function annotation of proteins.
   Bioinformatics. 2018 Jul 1;34(13):i304-i312. doi: 10.1093/bioinformatics/bty262.
3. smallWig: parallel compression of RNA-seq WIG files.
   Bioinformatics. 2016 Jan 15;32(2):173-80. doi: 10.1093/bioinformatics/btv561. Epub 2015 Sep 30.
4. Fast model-based protein homology detection without alignment.
   Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
5. High efficiency referential genome compression algorithm.
   Bioinformatics. 2019 Jun 1;35(12):2058-2065. doi: 10.1093/bioinformatics/bty934.
6. PROMALS: towards accurate multiple sequence alignments of distantly related proteins.
   Bioinformatics. 2007 Apr 1;23(7):802-8. doi: 10.1093/bioinformatics/btm017. Epub 2007 Jan 31.
7. An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing.
   Bioinformatics. 2018 Sep 1;34(17):i722-i731. doi: 10.1093/bioinformatics/bty555.
8. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.
   BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.
9. Functional classification of CATH superfamilies: a domain-based approach for protein function annotation.
   Bioinformatics. 2015 Nov 1;31(21):3460-7. doi: 10.1093/bioinformatics/btv398. Epub 2015 Jul 2.
10. Statistical inference of protein structural alignments using information and compression.
    Bioinformatics. 2017 Apr 1;33(7):1005-1013. doi: 10.1093/bioinformatics/btw757.

Cited by

1. Biological Sequence Classification: A Review on Data and General Methods.
   Research (Wash D C). 2022 Dec 19;2022:0011. doi: 10.34133/research.0011. eCollection 2022.
2. Protein domain embeddings for fast and accurate similarity search.
   Genome Res. 2024 Oct 11;34(9):1434-1444. doi: 10.1101/gr.279127.124.
3. Improved global protein homolog detection with major gains in function identification.
   Proc Natl Acad Sci U S A. 2023 Feb 28;120(9):e2211823120. doi: 10.1073/pnas.2211823120. Epub 2023 Feb 24.
4. ShiftCrypt: a web server to understand and biophysically align proteins through their NMR chemical shift values.
   Nucleic Acids Res. 2020 Jul 2;48(W1):W36-W40. doi: 10.1093/nar/gkaa391.
5. Insight into the protein solubility driving forces with neural attention.
   PLoS Comput Biol. 2020 Apr 30;16(4):e1007722. doi: 10.1371/journal.pcbi.1007722. eCollection 2020 Apr.
6. COMER2: GPU-accelerated sensitive and specific homology searches.
   Bioinformatics. 2020 Jun 1;36(11):3570-3572. doi: 10.1093/bioinformatics/btaa185.
7. Exploring the limitations of biophysical propensity scales coupled with machine learning for protein sequence analysis.
   Sci Rep. 2019 Nov 15;9(1):16932. doi: 10.1038/s41598-019-53324-w.