
H-DROP: an SVM based helical domain linker predictor trained with features optimized by combining random forest and stepwise selection.

Author information

Ebina Teppei, Suzuki Ryosuke, Tsuji Ryotaro, Kuroda Yutaka

Affiliation

Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan.

Publication information

J Comput Aided Mol Des. 2014 Aug;28(8):831-9. doi: 10.1007/s10822-014-9763-x. Epub 2014 Jun 26.

DOI: 10.1007/s10822-014-9763-x
PMID: 24965847

Abstract

Domain linker prediction is attracting much interest because it can help identify novel domains suitable for high-throughput proteomics analysis. Here, we report H-DROP, an SVM-based Helical Domain linker pRediction using OPtimal features. H-DROP is, to the best of our knowledge, the first predictor that specifically and effectively identifies helical linkers. This was made possible first because a large training dataset became available from IS-Dom, and second because we selected a small number of optimal features from a very large pool of candidates. The training dataset, which included 261 helical linkers, was constructed by detecting helical residues at the boundary regions of two independent structural domains listed in our previously reported IS-Dom dataset. Random forest selected 45 optimal feature candidates from 3,000 features, and stepwise selection further reduced these to 26 optimal features. The prediction sensitivity and precision of H-DROP were 35.2% and 38.8%, respectively. These values were over 10.7% higher than those of control methods, including our previously developed DROP, which is a coil linker predictor, and PPRODO, which is trained with undifferentiated domain boundary sequences. Overall, these results indicate that helical linkers can be predicted from sequence information alone by using a strictly curated training dataset for helical linkers and a carefully selected set of optimal features. H-DROP is available at http://domserv.lab.tuat.ac.jp.
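
The two-stage feature reduction and the reported metrics can be illustrated with a small, hedged sketch. It is not the authors' implementation. The abstract does not say which stepwise variant or scoring criterion was used, so greedy forward selection is an assumption here. The feature names, importance values and `toy_score` are hypothetical stand-ins for random forest importances and for the performance of an SVM trained on a feature subset. The abstract also does not say whether hits are counted per residue or per linker; the metric function simply takes one binary label per item.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def top_k_by_importance(importances: Dict[str, float], k: int) -> List[str]:
    """Stage 1: keep the k features with the highest importance score."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


def forward_stepwise(
    candidates: Sequence[str], score: Callable[[List[str]], float]
) -> List[str]:
    """Stage 2: greedily add the feature that most improves the score.

    Stops as soon as no remaining candidate improves on the best score so far.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda pair: pair[0])
        if top_score <= best:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


def sensitivity_precision(
    truth: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, precision) for binary labels (1 = helical linker)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


# Hypothetical importances standing in for random forest output.
importances = {"f1": 0.30, "f2": 0.05, "f3": 0.25, "f4": 0.20, "f5": 0.01}


def toy_score(subset: List[str]) -> float:
    """Hypothetical score: rewards informative features, penalizes subset size."""
    informative = {"f1", "f4"}
    return len(set(subset) & informative) - 0.1 * len(subset)


candidates = top_k_by_importance(importances, k=3)
final_features = forward_stepwise(candidates, toy_score)
sens, prec = sensitivity_precision([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 0, 1, 0, 0, 0, 1])
print(candidates, final_features, sens, prec)
```

In a real setting, `score` would train and cross-validate a classifier on the given feature subset, which makes each stepwise iteration expensive. That cost is one reason to prune the candidate pool with a cheaper ranking first.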

