Protein Secondary Structure Prediction Based on Data Partition and Semi-Random Subspace Method.

Affiliation

College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China.

Publication information

Sci Rep. 2018 Jun 29;8(1):9856. doi: 10.1038/s41598-018-28084-8.

DOI: 10.1038/s41598-018-28084-8
PMID: 29959372
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6026213/
Abstract

Protein secondary structure prediction is one of the most important and challenging problems in bioinformatics. Machine learning techniques have been applied to the problem with substantial success. However, there is still room for improvement toward the theoretical limit. In this paper, we present a novel method for protein secondary structure prediction based on data partition and a semi-random subspace method (PSRSM). Data partitioning is a key strategy in our method. First, the protein training dataset was partitioned into several subsets based on protein sequence length. Then, on each subset, we trained base classifiers on the subspace data generated by the semi-random subspace method and combined them by majority vote into an ensemble classifier. This yielded one ensemble classifier per subset, and each protein's secondary structure was predicted by the classifier matching its sequence length. Experiments were performed on the 25PDB, CB513, CASP10, CASP11, CASP12, and T100 datasets, achieving prediction accuracies of 86.38%, 84.53%, 85.51%, 85.89%, 85.55%, and 85.09%, respectively. The experimental results showed that our method outperforms other state-of-the-art methods.
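The pipeline described in the abstract (partition by sequence length, train base classifiers on feature subspaces, combine them by majority vote, route each query by its length) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the semi-random subspace is built or which base classifier is used, so the sketch assumes a hypothetical scheme that draws a fixed number of random features from each contiguous feature block, and it substitutes a simple nearest-centroid classifier. The length boundary of 100, the six-dimensional toy features, and all function and class names are likewise illustrative.

```python
import bisect
import random
from collections import Counter


def semi_random_subspace(n_features, n_blocks, per_block, rng):
    """Assumed scheme: draw `per_block` random indices from each contiguous block."""
    size = n_features // n_blocks
    chosen = []
    for b in range(n_blocks):
        start = b * size
        stop = n_features if b == n_blocks - 1 else start + size
        chosen.extend(sorted(rng.sample(range(start, stop), per_block)))
    return chosen


class NearestCentroid:
    """Stand-in base classifier; the abstract does not name the one used."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, row):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


class SubspaceEnsemble:
    """Base classifiers trained on different subspaces, combined by majority vote."""

    def __init__(self, n_members=5, n_blocks=2, per_block=2, seed=0):
        self.n_members, self.n_blocks = n_members, n_blocks
        self.per_block, self.seed = per_block, seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.members = []
        for _ in range(self.n_members):
            idx = semi_random_subspace(len(X[0]), self.n_blocks, self.per_block, rng)
            clf = NearestCentroid().fit([[row[j] for j in idx] for row in X], y)
            self.members.append((idx, clf))
        return self

    def predict_one(self, row):
        votes = Counter(clf.predict_one([row[j] for j in idx])
                        for idx, clf in self.members)
        return votes.most_common(1)[0][0]


def train_partitioned(samples, boundaries, **ensemble_args):
    """Train one ensemble per length bin; `samples` holds (seq_len, features, label)."""
    bins = [([], []) for _ in range(len(boundaries) + 1)]
    for seq_len, features, label in samples:
        X, y = bins[bisect.bisect_right(boundaries, seq_len)]
        X.append(features)
        y.append(label)
    return [SubspaceEnsemble(**ensemble_args).fit(X, y) for X, y in bins]


def predict(models, boundaries, seq_len, features):
    """Route the query to the ensemble trained on proteins of similar length."""
    return models[bisect.bisect_right(boundaries, seq_len)].predict_one(features)


# Toy data: three-state labels (H, E, C) with made-up prototype features.
PROTOTYPES = {"H": [1, 0, 0, 1, 0, 0], "E": [0, 1, 0, 0, 1, 0], "C": [0, 0, 1, 0, 0, 1]}
BOUNDARIES = [100]  # hypothetical split: lengths below 100 vs. 100 and above
noise = random.Random(42)
samples = [(seq_len, [v + noise.gauss(0, 0.05) for v in proto], label)
           for seq_len in (50, 200)
           for label, proto in PROTOTYPES.items()
           for _ in range(5)]
models = train_partitioned(samples, BOUNDARIES, n_members=5)
print(predict(models, BOUNDARIES, 50, PROTOTYPES["H"]))
```

Drawing from every block is one way to make the subspace only partly random, since each feature group is guaranteed some representation in every base classifier. Fully random sampling would give no such guarantee.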

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe78/6026213/0cb0f1d8c248/41598_2018_28084_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe78/6026213/03bc320f2a9e/41598_2018_28084_Fig2_HTML.jpg