Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection.

Affiliations

School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 200433, China; Gordon Life Science Institute, Belmont, MA 02478, USA; School of Computer, Shenyang Aerospace University, Shenyang, Liaoning, China; School of Computer Science, Fudan University, Shanghai 200433, China; and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia.

Publication information

Bioinformatics. 2014 Feb 15;30(4):472-9. doi: 10.1093/bioinformatics/btt709. Epub 2013 Dec 5.

DOI: 10.1093/bioinformatics/btt709
PMID: 24318998
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7537947/
Abstract

MOTIVATION

Protein remote homology detection is important for both basic research (such as molecular evolution and protein attribute prediction) and practical applications (such as timely modeling of the 3D structures of proteins targeted for drug development), and it has attracted a great deal of interest. Profile-based approaches are promising in this regard. A key step toward further improving protein remote homology detection is finding an effective way to extract evolutionary information into the profiles.

RESULTS

Here, we propose a novel approach, the profile-based protein representation, which extracts evolutionary information via frequency profiles. The frequency profiles are calculated from the multiple sequence alignments generated by PSI-BLAST. Three top-performing sequence-based kernels (SVM-Ngram, SVM-pairwise and SVM-LA) were combined with the profile-based protein representation. Tests were conducted on a SCOP benchmark dataset that contains 54 families and 23 superfamilies. The results showed that the new approach improves the performance of all three kernels. The approach can also provide useful insights for studying the features of proteins in various families, and it can be easily combined with other existing sequence-based methods to improve their performance as well.
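To make the idea of a frequency profile concrete, the following is a minimal sketch, not the authors' implementation. It assumes the multiple sequence alignment is already available as equal-length strings whose columns correspond to the query positions (for example, parsed from PSI-BLAST output), and it assumes one simple conversion rule: each query residue is replaced by the most frequent amino acid in its column. The function names `frequency_profile`, `profile_based_sequence` and `ngram_counts` are hypothetical, and the n-gram counter only illustrates the kind of sequence-based feature a kernel such as SVM-Ngram could consume.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(msa):
    """Return, for each alignment column, the relative frequency of each amino acid.

    Gaps and non-standard symbols are ignored when counting.
    """
    if not msa:
        raise ValueError("alignment is empty")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


def profile_based_sequence(profile, query):
    """Assumed rule: replace each query residue by its column's most frequent residue.

    If a column holds no standard residues, the query residue is kept.
    """
    converted = []
    for column, residue in zip(profile, query):
        best = max(AMINO_ACIDS, key=lambda aa: column[aa])
        converted.append(best if column[best] > 0 else residue)
    return "".join(converted)


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams, a simple sequence-based feature vector."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


if __name__ == "__main__":
    # Toy alignment; "-" marks a gap. Real input would come from PSI-BLAST.
    msa = ["MKVLA", "MRVLA", "MKILG", "MKV-A"]
    profile = frequency_profile(msa)
    converted = profile_based_sequence(profile, "MRVLA")
    print(converted)
    print(ngram_counts(converted, n=2))
```

Because the converted sequence is still an ordinary amino acid string, any existing sequence-based method can be applied to it unchanged, which is the property the abstract relies on when combining the representation with the three kernels.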

AVAILABILITY AND IMPLEMENTATION

For users' convenience, the source code for generating the profile-based proteins and for the multiple kernel learning is provided at http://bioinformatics.hitsz.edu.cn/main/~binliu/remote/
