An Improved Protein Structural Classes Prediction Method by Incorporating Both Sequence and Structure Information.

Publication information

IEEE Trans Nanobioscience. 2015 Jun;14(4):339-349. doi: 10.1109/TNB.2014.2352454. Epub 2014 Sep 15.

DOI:10.1109/TNB.2014.2352454
PMID:25248192
Abstract

Protein structural classes information is beneficial for secondary and tertiary structure prediction, protein folds prediction, and protein function analysis. Thus, predicting protein structural classes is of vital importance. In recent years, several computational methods have been developed for low-sequence-similarity (25%-40%) protein structural classes prediction. However, the reported prediction accuracies are actually not satisfactory. Aiming to further improve the prediction accuracies, we propose three different feature extraction methods and construct a comprehensive feature set that captures both sequence and structure information. By applying a random forest (RF) classifier to the feature set, we further develop a novel method for structural classes prediction. We test the proposed method on three benchmark datasets (25PDB, 640, and 1189) with low sequence similarity, and obtain the overall prediction accuracies of 93.5%, 92.6%, and 93.4%, respectively. Compared with six competing methods, the accuracies we achieved are 3.4%, 6.2%, and 8.7% higher than those achieved by the best-performing methods, showing the superiority of our method. Moreover, due to the limitation of the size of the three benchmark datasets, we further test the proposed method on three updated large-scale datasets with different sequence similarities (40%, 30%, and 25%). The proposed method achieves above 90% accuracies for all the three datasets, consistent with the accuracies on the above three benchmark datasets. Experimental results suggest our method as an effective and promising tool for structural classes prediction. Currently, a webserver that implements the proposed method is available on http://121.192.180.204:8080/RF_PSCP/Index.html.
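
The abstract does not spell out its three feature extraction methods, so the following is only a minimal sketch of the general idea of combining sequence and structure information in one feature vector. It assumes two generic stand-in features that are not taken from the paper: amino acid composition of the sequence and the content of a three-state secondary structure string (H, E, C). The function names `composition`, `combined_features`, and `overall_accuracy` are hypothetical. A classifier such as a random forest would then be trained on vectors like these, and overall accuracy is the fraction of proteins assigned to the correct class.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_STATES = "HEC"  # assumed 3-state alphabet: helix, strand, coil


def composition(text, alphabet):
    """Return the fraction of each alphabet symbol among the symbols of text."""
    counts = Counter(text)
    total = sum(counts[s] for s in alphabet)
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[s] / total for s in alphabet]


def combined_features(sequence, secondary_structure):
    """Concatenate sequence-based and structure-based features (20 + 3 values).

    Illustrative stand-in only; these are not the paper's actual features.
    """
    return composition(sequence, AMINO_ACIDS) + composition(secondary_structure, SS_STATES)


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of proteins whose predicted class matches the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

For example, `combined_features("ACAC", "HHEC")` yields a 23-element vector whose first two entries describe the share of A and C residues and whose last three entries describe the share of helix, strand, and coil positions.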

Similar articles

1. An Improved Protein Structural Classes Prediction Method by Incorporating Both Sequence and Structure Information.
   IEEE Trans Nanobioscience. 2015 Jun;14(4):339-349. doi: 10.1109/TNB.2014.2352454. Epub 2014 Sep 15.
2. Enhanced Protein Fold Prediction Method Through a Novel Feature Extraction Technique.
   IEEE Trans Nanobioscience. 2015 Sep;14(6):649-59. doi: 10.1109/TNB.2015.2450233.
3. Structural class prediction of protein using novel feature extraction method from chaos game representation of predicted secondary structure.
   J Theor Biol. 2016 Jul 7;400:1-10. doi: 10.1016/j.jtbi.2016.04.011. Epub 2016 Apr 12.
4. Prediction of protein structural class using novel evolutionary collocation-based sequence representation.
   J Comput Chem. 2008 Jul 30;29(10):1596-604. doi: 10.1002/jcc.20918.
5. Accurate prediction of protein structural classes by incorporating predicted secondary structure information into the general form of Chou's pseudo amino acid composition.
   J Theor Biol. 2014 Mar 7;344:12-8. doi: 10.1016/j.jtbi.2013.11.021. Epub 2013 Dec 6.
6. Prediction of protein structural classes for low-homology sequences based on predicted secondary structure.
   BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S9. doi: 10.1186/1471-2105-11-S1-S9.
7. A high-accuracy protein structural class prediction algorithm using predicted secondary structural information.
   J Theor Biol. 2010 Dec 7;267(3):272-5. doi: 10.1016/j.jtbi.2010.09.007. Epub 2010 Sep 8.
8. Improving the prediction accuracy of protein structural class: approached with alternating word frequency and normalized Lempel-Ziv complexity.
   J Theor Biol. 2014 Jan 21;341:71-7. doi: 10.1016/j.jtbi.2013.10.002. Epub 2013 Oct 17.
9. Novel structure-driven features for accurate prediction of protein structural class.
   Genomics. 2014 Apr;103(4):292-7. doi: 10.1016/j.ygeno.2014.04.002. Epub 2014 Apr 18.
10. A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination.
    Comput Biol Chem. 2015 Dec;59 Pt A:95-100. doi: 10.1016/j.compbiolchem.2015.08.012. Epub 2015 Sep 2.

Cited by

1. Prediction of antibody-antigen interaction based on backbone aware with invariant point attention.
   BMC Bioinformatics. 2024 Nov 6;25(1):348. doi: 10.1186/s12859-024-05961-w.
2. PreAcrs: a machine learning framework for identifying anti-CRISPR proteins.
   BMC Bioinformatics. 2022 Oct 25;23(1):444. doi: 10.1186/s12859-022-04986-3.
3. IBPred: A sequence-based predictor for identifying ion binding protein in phage.
   Comput Struct Biotechnol J. 2022 Aug 28;20:4942-4951. doi: 10.1016/j.csbj.2022.08.053. eCollection 2022.
4. Research on DNA-Binding Protein Identification Method Based on LSTM-CNN Feature Fusion.
   Comput Math Methods Med. 2022 Jun 2;2022:9705275. doi: 10.1155/2022/9705275. eCollection 2022.
5. Prediction of G Protein-Coupled Receptors With CTDC Extraction and MRMD2.0 Dimension-Reduction Methods.
   Front Bioeng Biotechnol. 2020 Jun 25;8:635. doi: 10.3389/fbioe.2020.00635. eCollection 2020.
6. CirRNAPL: A web server for the identification of circRNA based on extreme learning machine.
   Comput Struct Biotechnol J. 2020 Apr 2;18:834-842. doi: 10.1016/j.csbj.2020.03.028. eCollection 2020.
7. Identifying protein-protein interface via a novel multi-scale local sequence and structural representation.
   BMC Bioinformatics. 2019 Dec 24;20(Suppl 15):483. doi: 10.1186/s12859-019-3048-2.
8. MADOKA: an ultra-fast approach for large-scale protein structure similarity searching.
   BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):662. doi: 10.1186/s12859-019-3235-1.
9. Identification of Phage Viral Proteins With Hybrid Sequence Features.
   Front Microbiol. 2019 Mar 26;10:507. doi: 10.3389/fmicb.2019.00507. eCollection 2019.
10. IDP-CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields.
    Int J Mol Sci. 2018 Aug 22;19(9):2483. doi: 10.3390/ijms19092483.