Sequence-Based Prediction of Transmembrane Protein Crystallization Propensity.

Affiliations

School of Information Engineering, Huangshan University, Huangshan, 245041, China.

Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, China.

Publication information

Interdiscip Sci. 2021 Dec;13(4):693-702. doi: 10.1007/s12539-021-00448-1. Epub 2021 Jun 18.

DOI:10.1007/s12539-021-00448-1

PMID:34143353

Abstract

Transmembrane proteins play a vital role in cellular processes. Several techniques can determine transmembrane protein structures, and X-ray crystallography is the primary one. However, owing to the special properties of transmembrane proteins, determining their structures by X-ray crystallography remains difficult. To reduce experimental cost and improve experimental efficiency, computational methods for predicting the crystallization propensity of transmembrane proteins are of great value. In this work, we propose a sequence-based machine learning method, Prediction of TransMembrane protein Crystallization propensity (PTMC), for this task. First, we extracted several general sequence features together with specifically encoded features of relative solvent accessibility and hydrophobicity. Second, feature selection was used to filter out redundant and irrelevant features; the optimal feature subset consists of hydrophobicity, amino acid composition and relative solvent accessibility. Finally, we chose extreme gradient boosting after comparing it with several other machine learning methods. Results on the independent test set indicate that PTMC outperforms state-of-the-art sequence-based methods in terms of sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic curve (AUC). Compared with two competitors, BCrystal and TMCrys, PTMC improves sensitivity by 0.132 and 0.179, specificity by 0.014 and 0.127, accuracy by 0.037 and 0.192, MCC by 0.128 and 0.362, and AUC by 0.027 and 0.125, respectively.
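As a rough illustration of two ingredients the abstract names, the sketch below computes an amino acid composition feature vector and the threshold-based evaluation metrics (sensitivity, specificity, accuracy, MCC) from confusion-matrix counts. It is not the PTMC implementation: the function names, the residue ordering, and the example counts are hypothetical choices made only for this sketch, and the hydrophobicity and relative solvent accessibility encodings, the feature selection step, and the gradient boosting model are not reproduced here.

```python
import math
from collections import Counter

# The 20 standard amino acids; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the frequency of each standard residue, in AMINO_ACIDS order."""
    counts = Counter(sequence.upper())
    # Non-standard symbols (e.g. 'X') are ignored when normalizing.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the paper.
    print(amino_acid_composition("MKTLLVAGG"))
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

AUC is omitted because it depends on ranked prediction scores rather than a single confusion matrix.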

Similar articles

BCrystal: an interpretable sequence-based protein crystallization predictor.

Bioinformatics. 2020 Mar 1;36(5):1429-1438. doi: 10.1093/bioinformatics/btz762.

Accurate multistage prediction of protein crystallization propensity using deep-cascade forest with sequence-based features.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa076.

PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

PLoS One. 2014 Aug 22;9(8):e105902. doi: 10.1371/journal.pone.0105902. eCollection 2014.

GCmapCrys: Integrating graph attention network with predicted contact map for multi-stage protein crystallization propensity prediction.

Anal Biochem. 2023 Feb 15;663:115020. doi: 10.1016/j.ab.2022.115020. Epub 2022 Dec 12.

Lipid exposure prediction enhances the inference of rotational angles of transmembrane helices.

BMC Bioinformatics. 2013 Oct 11;14:304. doi: 10.1186/1471-2105-14-304.

CRYSpred: accurate sequence-based protein crystallization propensity prediction using sequence-derived structural characteristics.

Protein Pept Lett. 2012 Jan;19(1):40-9. doi: 10.2174/092986612798472910.

Prediction of RNA-binding residues in proteins from primary sequence using an enriched random forest model with a novel hybrid feature.

Proteins. 2011 Apr;79(4):1230-9. doi: 10.1002/prot.22958. Epub 2011 Jan 25.

Sequence-based prediction of protein crystallization, purification and production propensity.

Bioinformatics. 2011 Jul 1;27(13):i24-33. doi: 10.1093/bioinformatics/btr229.

DeepCrystal: a deep learning framework for sequence-based protein crystallization prediction.

Bioinformatics. 2019 Jul 1;35(13):2216-2225. doi: 10.1093/bioinformatics/bty953.

References cited in this article

A 10-year meta-analysis of membrane protein structural biology: Detergents, membrane mimetics, and structure determination techniques.

Biochim Biophys Acta Biomembr. 2021 Mar 1;1863(3):183533. doi: 10.1016/j.bbamem.2020.183533. Epub 2020 Dec 17.

Prediction of Anticancer Peptides Using a Low-Dimensional Feature Model.

Front Bioeng Biotechnol. 2020 Aug 12;8:892. doi: 10.3389/fbioe.2020.00892. eCollection 2020.

Protein-ligand binding residue prediction enhancement through hybrid deep heterogeneous learning of sequence and structure data.

Bioinformatics. 2020 May 1;36(10):3018-3027. doi: 10.1093/bioinformatics/btaa110.

Advances in protein structure prediction and design.

Nat Rev Mol Cell Biol. 2019 Nov;20(11):681-697. doi: 10.1038/s41580-019-0163-x. Epub 2019 Aug 15.

Comparison and integration of computational methods for deleterious synonymous mutation prediction.

Brief Bioinform. 2020 May 21;21(3):970-981. doi: 10.1093/bib/bbz047.

TMEM Proteins in Cancer: A Review.

Front Pharmacol. 2018 Dec 6;9:1345. doi: 10.3389/fphar.2018.01345. eCollection 2018.

DeepCrystal: a deep learning framework for sequence-based protein crystallization prediction.

Bioinformatics. 2019 Jul 1;35(13):2216-2225. doi: 10.1093/bioinformatics/bty953.

TMCrys: predict propensity of success for transmembrane protein crystallization.

Bioinformatics. 2018 Sep 15;34(18):3126-3130. doi: 10.1093/bioinformatics/bty342.

Blood-brain barrier breakdown in Alzheimer disease and other neurodegenerative disorders.

Nat Rev Neurol. 2018 Mar;14(3):133-150. doi: 10.1038/nrneurol.2017.188. Epub 2018 Jan 29.

DeepSF: deep convolutional neural network for mapping protein sequences to folds.

Bioinformatics. 2018 Apr 15;34(8):1295-1303. doi: 10.1093/bioinformatics/btx780.
