Using distances between Top-n-gram and residue pairs for protein remote homology detection.

Publication information

BMC Bioinformatics. 2014;15 Suppl 2(Suppl 2):S3. doi: 10.1186/1471-2105-15-S2-S3. Epub 2014 Jan 24.

DOI:10.1186/1471-2105-15-S2-S3
PMID:24564580
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4015815/
Abstract

BACKGROUND

Protein remote homology detection is one of the central problems in bioinformatics and is important for both basic research and practical applications. Currently, discriminative methods based on Support Vector Machines (SVMs) achieve state-of-the-art performance. Exploring feature vectors that incorporate the position information of amino acids or other protein building blocks is a key step in improving the performance of SVM-based methods.

RESULTS

Two new methods for protein remote homology detection were proposed, called SVM-DR and SVM-DT. SVM-DR is a sequence-based method in which the feature vector representation of a protein is based on the distances between residue pairs. SVM-DT is a profile-based method that considers the distances between Top-n-gram pairs. A Top-n-gram can be viewed as a profile-based building block of proteins, calculated from the frequency profiles. Both methods are position-dependent approaches that incorporate the sequence-order information of protein sequences. Various experiments were conducted on a benchmark dataset containing 54 families and 23 superfamilies. Experimental results showed that these two new methods are very promising: compared with the position-independent methods, the performance improvement is obvious. Furthermore, the proposed methods can also provide useful insights for studying the features of protein families.
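
To make the idea of a distance-based residue-pair representation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a feature counts how often an ordered residue pair (a, b) occurs exactly d positions apart, for d from 1 up to a hypothetical max_dist parameter. The function names, the max_dist default, the lack of normalization, and the restriction to Top-1-grams (the single most frequent residue in each profile column) are all illustrative assumptions; the abstract does not specify these details.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def distance_pair_features(seq, max_dist=3):
    """Count ordered residue pairs (a, b) found exactly d positions apart.

    Returns a dict keyed by (a, b, d) for d = 1..max_dist, so the feature
    space is explicit: 20 * 20 * max_dist named features.
    (Assumed scheme for illustration; raw counts, no normalization.)
    """
    counts = {
        (a, b, d): 0
        for d in range(1, max_dist + 1)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }
    for d in range(1, max_dist + 1):
        for i in range(len(seq) - d):
            key = (seq[i], seq[i + d], d)
            if key in counts:  # skip non-standard letters such as 'X'
                counts[key] += 1
    return counts


def top_1_grams(profile):
    """Reduce a frequency profile to its most frequent residue per column.

    `profile` is a list with one dict per sequence position, mapping
    residue letter -> frequency (hypothetical input format).
    """
    return "".join(
        max(AMINO_ACIDS, key=lambda a: col.get(a, 0.0)) for col in profile
    )


# Sequence-based features (in the spirit of SVM-DR).
seq_feats = distance_pair_features("ACAC", max_dist=2)

# Profile-based features (in the spirit of SVM-DT, with n = 1 only):
# the same counting applied to the Top-1-gram string.
profile = [{"A": 0.7, "C": 0.3}, {"G": 0.9, "A": 0.1}, {"A": 0.6, "G": 0.4}]
prof_feats = distance_pair_features(top_1_grams(profile), max_dist=2)
```

Because every feature is keyed by an explicit (a, b, d) triple, the weights a linear classifier learns over such a vector can be traced back to specific residue pairs and distances, which is the kind of interpretability the abstract attributes to an explicit feature space.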

CONCLUSION

The better performance of the proposed methods demonstrates that position-dependent approaches are effective for protein remote homology detection. Another advantage of our methods arises from the explicit feature space representation, which can be used to analyze the characteristic features of protein families. The source code of SVM-DT and SVM-DR is available at http://bioinformatics.hitsz.edu.cn/DistanceSVM/index.jsp.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/443f/4015815/2fa3d827cf47/1471-2105-15-S2-S3-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/443f/4015815/dba311018884/1471-2105-15-S2-S3-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/443f/4015815/d9a947cdbe9b/1471-2105-15-S2-S3-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/443f/4015815/d241b70fdb51/1471-2105-15-S2-S3-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/443f/4015815/f5c2d8be1059/1471-2105-15-S2-S3-5.jpg
