Protein Crystallization Identification via Fuzzy Model on Linear Neighborhood Representation.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1986-1995. doi: 10.1109/TCBB.2019.2954826. Epub 2021 Oct 7.

DOI: 10.1109/TCBB.2019.2954826
PMID: 31751248
Abstract

X-ray crystallography is the most popular approach for analyzing protein 3D structure. However, the success rate of protein crystallization is very low (2-10 percent). To reduce the cost in time and resources, many computational methods have been developed to predict whether a protein will crystallize. Improving the accuracy of these predictions is important for determining protein structure by X-ray crystallography, and many machine learning methods are currently used for the task. In this article, we propose a Fuzzy Support Vector Machine based on Linear Neighborhood Representation (FSVM-LNR) to predict the crystallization propensity of proteins. Proteins are represented by three types of features (PsePSSM, PSSM-DWT, MMI-PS), which are serially combined and fed into FSVM-LNR. FSVM-LNR can filter outliers by a membership score, which is calculated from the reconstruction residuals of the k nearest samples. To evaluate the performance of our predictive model, we test FSVM-LNR on the TRAIN3587, TEST3585, and TEST500 datasets. Our method achieves a better Matthews correlation coefficient (MCC) on TRAIN3587 (MCC: 0.56) and TEST3585 (MCC: 0.58). Although its independent-test performance on TEST500 is not the best, FSVM-LNR still shows predictive ability in identifying protein crystallization (MCC: 0.70). The good performance on these datasets demonstrates the effectiveness of our method, and the better performance on the large datasets further demonstrates its stability and superiority.
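
The abstract gives no formulas, so the following is only a minimal sketch of the two quantities it names: a membership score derived from the residual of reconstructing each sample from its k nearest samples, and the MCC used for evaluation. Everything beyond those two ideas is an assumption for illustration and not the authors' implementation: the function names (`lnr_residuals`, `membership_scores`, `mcc`) are hypothetical, the reconstruction weights use a simple sum-to-one least-squares fit with a small ridge term (no non-negativity constraint), and the exponential mapping from residual to membership is just one plausible choice.

```python
import math
import numpy as np


def lnr_residuals(X, k=2, reg=1e-3):
    """Squared error of rebuilding each row of X from its k nearest rows."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    residuals = np.empty(n)
    for i in range(n):
        order = np.argsort(d2[i])
        nbrs = order[order != i][:k]       # k nearest samples, excluding i itself
        Z = X[nbrs] - X[i]                 # neighbours centred on sample i
        G = Z @ Z.T                        # local Gram matrix
        # Assumed ridge term: keeps the solve stable when G is singular.
        G_reg = G + (reg * np.trace(G) + 1e-12) * np.eye(len(nbrs))
        w = np.linalg.solve(G_reg, np.ones(len(nbrs)))
        w /= w.sum()                       # reconstruction weights sum to one
        # With sum(w) == 1 this equals ||x_i - sum_j w_j x_j||^2.
        residuals[i] = np.sum((w @ Z) ** 2)
    return residuals


def membership_scores(residuals, eps=1e-12):
    """Assumed mapping: a larger residual gives a lower membership in (0, 1]."""
    residuals = np.asarray(residuals, dtype=float)
    return np.exp(-residuals / (residuals.mean() + eps))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data: six collinear points plus one point far from the line.
X_demo = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0], [2.5, 10.0]]
print(np.round(membership_scores(lnr_residuals(X_demo, k=2)), 3))
```

In this toy data the isolated point cannot be rebuilt well from its neighbours, so it receives the lowest membership. In a fuzzy SVM such scores typically scale each sample's misclassification penalty; one way to try this, assuming scikit-learn is available, is to pass them as `sample_weight` to `SVC.fit`.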

Similar articles

1. Protein Crystallization Identification via Fuzzy Model on Linear Neighborhood Representation.
   IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1986-1995. doi: 10.1109/TCBB.2019.2954826. Epub 2021 Oct 7.
2. CrystalM: A Multi-View Fusion Approach for Protein Crystallization Prediction.
   IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):325-335. doi: 10.1109/TCBB.2019.2912173. Epub 2021 Feb 3.
3. Protein secondary structure prediction based on the fuzzy support vector machine with the hyperplane optimization.
   Gene. 2018 Feb 5;642:74-83. doi: 10.1016/j.gene.2017.11.005. Epub 2017 Nov 14.
4. Fuzzy support vector machine with joint optimization of genetic algorithm and fuzzy c-means.
   Technol Health Care. 2021;29(5):921-937. doi: 10.3233/THC-202619.
5. Prediction of B-cell epitopes using evolutionary information and propensity scales.
   BMC Bioinformatics. 2013;14 Suppl 2(Suppl 2):S10. doi: 10.1186/1471-2105-14-s2-s10.
6. FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation.
   Interdiscip Sci. 2022 Jun;14(2):372-384. doi: 10.1007/s12539-021-00489-6. Epub 2021 Nov 6.
7. A comparative study of surface EMG classification by fuzzy relevance vector machine and fuzzy support vector machine.
   Physiol Meas. 2015 Feb;36(2):191-206. doi: 10.1088/0967-3334/36/2/191. Epub 2015 Jan 9.
8. New fuzzy support vector machine for the class imbalance problem in medical datasets classification.
   ScientificWorldJournal. 2014 Mar 23;2014:536434. doi: 10.1155/2014/536434. eCollection 2014.
9. Fuzzy support vector machine for classification of EEG signals using wavelet-based features.
   Med Eng Phys. 2009 Sep;31(7):858-65. doi: 10.1016/j.medengphy.2009.04.005. Epub 2009 May 31.
10. Protein crystallization prediction with AdaBoost.
    Int J Data Min Bioinform. 2013;7(2):214-27. doi: 10.1504/ijdmb.2013.053197.

Cited by

1. Identification of Vesicle Transport Proteins via Hypergraph Regularized K-Local Hyperplane Distance Nearest Neighbour Model.
   Front Genet. 2022 Jul 13;13:960388. doi: 10.3389/fgene.2022.960388. eCollection 2022.
2. Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm.
   Front Genet. 2022 Jan 28;12:821996. doi: 10.3389/fgene.2021.821996. eCollection 2021.
3. Immunoglobulin Classification Based on FC* and GC* Features.
   Front Genet. 2022 Jan 24;12:827161. doi: 10.3389/fgene.2021.827161. eCollection 2021.
4. AOPM: Application of Antioxidant Protein Classification Model in Predicting the Composition of Antioxidant Drugs.
   Front Pharmacol. 2022 Jan 18;12:818115. doi: 10.3389/fphar.2021.818115. eCollection 2021.
5. Predicting subcellular location of protein with evolution information and sequence-based deep learning.
   BMC Bioinformatics. 2021 Oct 22;22(Suppl 10):515. doi: 10.1186/s12859-021-04404-0.
6. Identifying potential association on gene-disease network via dual hypergraph regularized least squares.
   BMC Genomics. 2021 Aug 9;22(1):605. doi: 10.1186/s12864-021-07864-z.
7. A Self-Representation-Based Fuzzy SVM Model for Predicting Vascular Calcification of Hemodialysis Patients.
   Comput Math Methods Med. 2021 Jul 27;2021:2464821. doi: 10.1155/2021/2464821. eCollection 2021.
8. A sequence-based multiple kernel model for identifying DNA-binding proteins.
   BMC Bioinformatics. 2021 May 31;22(Suppl 3):291. doi: 10.1186/s12859-020-03875-x.
9. 4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism.
   Front Cell Dev Biol. 2021 May 10;9:664669. doi: 10.3389/fcell.2021.664669. eCollection 2021.
10. iDNA-MT: Identification DNA Modification Sites in Multiple Species by Using Multi-Task Learning Based a Neural Network Tool.
    Front Genet. 2021 Mar 31;12:663572. doi: 10.3389/fgene.2021.663572. eCollection 2021.