DROP: an SVM domain linker predictor trained with optimal features selected by random forest.

Affiliation

Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, Koganei-shi, Tokyo 184-8588, Japan.

Publication information

Bioinformatics. 2011 Feb 15;27(4):487-94. doi: 10.1093/bioinformatics/btq700. Epub 2010 Dec 17.

DOI: 10.1093/bioinformatics/btq700
PMID: 21169376
Abstract

MOTIVATION

Biologically important proteins are often large, multidomain proteins, which are difficult to characterize by high-throughput experimental methods. Efficient domain/boundary predictions are thus increasingly required in diverse areas of proteomics research for computationally dissecting proteins into readily analyzable domains.

RESULTS

We constructed a support vector machine (SVM)-based domain linker predictor, DROP (Domain linker pRediction using OPtimal features), which was trained with 25 optimal features. The optimal combination of features was identified from a set of 3000 features using a random forest algorithm complemented with stepwise feature selection. DROP demonstrated a prediction sensitivity and precision of 41.3% and 49.4%, respectively. These values were over 19.9% higher than those of control SVM predictors trained with non-optimized features, strongly suggesting the efficiency of our feature selection method. In addition, the mean NDO-Score of DROP for predicting novel domains in seven CASP8 FM multidomain proteins was 0.760, which was higher than that of any of the 12 published CASP8 DP servers. Overall, these results indicate that the SVM prediction of domain linkers can be improved by identifying optimal features that best distinguish linker from non-linker regions.
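The two ideas in the results paragraph, scoring a predictor by sensitivity and precision and choosing a feature subset stepwise from an importance-ranked list, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how predictions were counted or how the stepwise procedure scored candidate subsets. The sketch assumes per-residue binary labels (1 = linker, 0 = non-linker), a feature list already ranked by some importance measure (for example from a random forest), and a caller-supplied `score` function. The names `sensitivity_precision`, `stepwise_select`, and `toy_score` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def sensitivity_precision(
    true_labels: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, precision) for binary labels.

    Assumption: labels are per residue, 1 = linker, 0 = non-linker.
    """
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # linkers found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # linkers missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


def stepwise_select(
    ranked_features: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int = 25,
) -> List[str]:
    """Greedy forward selection over an importance-ranked feature list.

    A feature is kept only if adding it strictly improves `score`.
    `score` is a hypothetical callback, e.g. cross-validated accuracy
    of a classifier trained on the candidate subset.
    """
    selected: List[str] = []
    best = float("-inf")
    for feature in ranked_features:
        if len(selected) >= max_features:
            break
        candidate_score = score(selected + [feature])
        if candidate_score > best:
            selected.append(feature)
            best = candidate_score
    return selected


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    truth = [0, 1, 1, 1, 0, 0, 1, 0]
    guess = [0, 1, 1, 0, 1, 0, 0, 0]
    print(sensitivity_precision(truth, guess))

    useful = {"a", "c"}

    def toy_score(subset: List[str]) -> float:
        # Reward useful features, lightly penalize the rest.
        good = sum(1 for f in subset if f in useful)
        return good - 0.1 * (len(subset) - good)

    print(stepwise_select(["a", "b", "c", "d"], toy_score))
```

In a real pipeline the `score` callback would train and validate a classifier on each candidate subset, which is the expensive part. Walking the features in ranked order keeps the number of such evaluations linear in the number of features.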
