Remote homology detection based on oligomer distances.

Authors

Lingner Thomas, Meinicke Peter

Affiliation

Abteilung Bioinformatik, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany.

Publication information

Bioinformatics. 2006 Sep 15;22(18):2224-31. doi: 10.1093/bioinformatics/btl376. Epub 2006 Jul 12.

DOI: 10.1093/bioinformatics/btl376
PMID: 16837522

Abstract

MOTIVATION

Remote homology detection is among the most intensively researched problems in bioinformatics. Currently discriminative approaches, especially kernel-based methods, provide the most accurate results. However, kernel methods also show several drawbacks: in many cases prediction of new sequences is computationally expensive, often kernels lack an interpretable model for analysis of characteristic sequence features, and finally most approaches make use of so-called hyperparameters which complicate the application of methods across different datasets.

RESULTS

We introduce a feature vector representation for protein sequences based on distances between short oligomers. The corresponding feature space arises from distance histograms for any possible pair of K-mers. Our distance-based approach shows important advantages in terms of computational speed while on common test data the prediction performance is highly competitive with state-of-the-art methods for protein remote homology detection. Furthermore the learnt model can easily be analyzed in terms of discriminative features and in contrast to other methods our representation does not require any tuning of kernel hyperparameters.
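The distance-histogram idea can be illustrated with a minimal Python sketch. This is not the authors' Matlab code: the function names, the choice to count only ordered pairs with distance of at least 1, the optional `max_dist` cutoff, and the cosine-style normalization are all assumptions made here for illustration, since the abstract does not specify these details.

```python
from collections import Counter
from math import sqrt


def distance_histogram(seq, k=1, max_dist=None):
    """Sparse counts of (kmer_a, kmer_b, distance) triples in one sequence.

    Hypothetical helper: the abstract does not fix k, the distance
    definition, or a maximum distance, so these are illustrative choices.
    """
    n = len(seq) - k + 1  # number of k-mer start positions
    counts = Counter()
    for i in range(n):
        a = seq[i:i + k]
        for j in range(i + 1, n):
            d = j - i  # distance between start positions (assumed definition)
            if max_dist is not None and d > max_dist:
                break
            counts[(a, seq[j:j + k], d)] += 1
    return counts


def normalized_kernel(x, y):
    """Cosine-normalized dot product of two sparse count vectors (assumed normalization)."""
    dot = sum(v * y[key] for key, v in x.items() if key in y)
    norm = sqrt(sum(v * v for v in x.values()) * sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


h = distance_histogram("ABAB")
print(h[("A", "B", 1)])  # how often "A" is followed by "B" one position later
```

Each key of the returned counter corresponds to one dimension of the feature space, so a large weight in a linear model can be traced back to a specific pair of K-mers at a specific distance. The sketch has no kernel parameter to tune; `k` and `max_dist` only define the representation.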

AVAILABILITY

Normalized kernel matrices for the experimental setup can be downloaded at www.gobics.de/thomas. Matlab code for computing the kernel matrices is available upon request.

CONTACT

thomas@gobics.de, peter@gobics.de.

Similar articles

1. Remote homology detection based on oligomer distances.
   Bioinformatics. 2006 Sep 15;22(18):2224-31. doi: 10.1093/bioinformatics/btl376. Epub 2006 Jul 12.
2. SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection.
   Bioinformatics. 2008 Mar 15;24(6):783-90. doi: 10.1093/bioinformatics/btn028. Epub 2008 Feb 1.
3. Profile-based direct kernels for remote homology detection and fold recognition.
   Bioinformatics. 2005 Dec 1;21(23):4239-47. doi: 10.1093/bioinformatics/bti687. Epub 2005 Sep 27.
4. Support vector machines with profile-based kernels for remote protein homology detection.
   Genome Inform. 2004;15(2):191-200.
5. A structural alignment kernel for protein structures.
   Bioinformatics. 2007 May 1;23(9):1090-8. doi: 10.1093/bioinformatics/btl642. Epub 2007 Jan 18.
6. Semi-supervised protein classification using cluster kernels.
   Bioinformatics. 2005 Aug 1;21(15):3241-7. doi: 10.1093/bioinformatics/bti497. Epub 2005 May 19.
7. Mismatch string kernels for discriminative protein classification.
   Bioinformatics. 2004 Mar 1;20(4):467-76. doi: 10.1093/bioinformatics/btg431. Epub 2004 Jan 22.
8. Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection.
   Bioinformatics. 2008 May 15;24(10):1264-70. doi: 10.1093/bioinformatics/btn112. Epub 2008 Mar 31.
9. Application of latent semantic analysis to protein remote homology detection.
   Bioinformatics. 2006 Feb 1;22(3):285-90. doi: 10.1093/bioinformatics/bti801. Epub 2005 Nov 29.
10. Predicting protein structure classes from function predictions.
    Bioinformatics. 2004 Mar 22;20(5):770-6. doi: 10.1093/bioinformatics/btg483. Epub 2004 Jan 29.

Cited by

1. A Comprehensive Review on Machine Learning Techniques for Protein Family Prediction.
   Protein J. 2024 Apr;43(2):171-186. doi: 10.1007/s10930-024-10181-5. Epub 2024 Mar 1.
2. EnsembleFam: towards more accurate protein family prediction in the twilight zone.
   BMC Bioinformatics. 2022 Mar 14;23(1):90. doi: 10.1186/s12859-022-04626-w.
3. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors.
   Biomolecules. 2019 Dec 23;10(1):26. doi: 10.3390/biom10010026.
4. Alignment-free method for DNA sequence clustering using Fuzzy integral similarity.
   Sci Rep. 2019 Mar 6;9(1):3753. doi: 10.1038/s41598-019-40452-6.
5. DeepFam: deep learning based alignment-free method for protein family modeling and prediction.
   Bioinformatics. 2018 Jul 1;34(13):i254-i262. doi: 10.1093/bioinformatics/bty275.
6. Protein remote homology detection based on bidirectional long short-term memory.
   BMC Bioinformatics. 2017 Oct 10;18(1):443. doi: 10.1186/s12859-017-1842-2.
7. Genetic sequence-based prediction of long-range chromatin interactions suggests a potential role of short tandem repeat sequences in genome organization.
   BMC Bioinformatics. 2017 Apr 18;18(1):218. doi: 10.1186/s12859-017-1624-x.
8. Fast and accurate phylogeny reconstruction using filtered spaced-word matches.
   Bioinformatics. 2017 Apr 1;33(7):971-979. doi: 10.1093/bioinformatics/btw776.
9. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison.
   PLoS Comput Biol. 2016 Oct 19;12(10):e1005107. doi: 10.1371/journal.pcbi.1005107. eCollection 2016 Oct.
10. Protein Remote Homology Detection Based on an Ensemble Learning Approach.
    Biomed Res Int. 2016;2016:5813645. doi: 10.1155/2016/5813645. Epub 2016 May 8.