The Characterization of Structure and Prediction for Aquaporin in Tumour Progression by Machine Learning.

Author information

Chen Zheng, Jiao Shihu, Zhao Da, Zou Quan, Xu Lei, Zhang Lijun, Su Xi

Affiliations

School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.

Publication information

Front Cell Dev Biol. 2022 Feb 1;10:845622. doi: 10.3389/fcell.2022.845622. eCollection 2022.

Abstract

Recurrence and new cases of cancer constitute a challenging human health problem. Aquaporins (AQPs) are expressed in many types of tumours, including those of the brain, breast, pancreas, colon, skin, ovaries, and lungs, and the histological grade of cancer is positively correlated with AQP expression. The identification of aquaporins is therefore an area worth exploring, and computational tools play an important role in it. In this study, we propose iAQPs-RF, a reliable, accurate and automated sequence predictor for identifying AQPs. Features were extracted with the 188D method (global protein sequence descriptor, GPSD). Six common classifiers, namely random forest (RF), naive Bayes (NB), support vector machine (SVM), XGBoost, logistic regression (LR) and decision tree (DT), were used for AQP classification. The classification results show that RF was the most suitable machine learning algorithm, with an accuracy of 97.689%. Analysis of variance (ANOVA) was used to analyse the features: they were ranked by ANOVA, and an incremental feature selection (IFS) strategy was applied to search for the optimal features. The results suggest that the 26th feature (neutral/hydrophobic) and the 21st feature (hydrophobic) are the two most informative features for distinguishing AQPs from non-AQPs. Previous studies reported that plasma membrane proteins have hydrophobic characteristics. Subcellular localization prediction showed that all aquaporins were plasma membrane proteins with highly conserved transmembrane structures, and the 3D structure of aquaporins was consistent with the localization results. Taken together, these results confirm that aquaporins possess hydrophobic properties. Although aquaporins have highly conserved transmembrane structures, the phylogenetic tree shows their diversity during evolution. Principal component analysis (PCA) showed that positive and negative samples were well separated by the 54D features, indicating that the 54D features can effectively classify aquaporins. The online prediction server is accessible at http://lab.malab.cn/~acy/iAQP.
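
The ANOVA-based feature ranking and IFS search mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: the function names (`anova_f`, `rank_features`, `centroid_accuracy`, `incremental_feature_selection`, `make_toy_data`) are hypothetical, the data are synthetic Gaussian values rather than 188D descriptors, and a simple nearest-centroid classifier stands in for the random forest so that the example needs only the Python standard library.

```python
import random
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across class labels."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    grand = mean(values)
    ss_between = ss_within = 0.0
    for g in groups.values():
        m = mean(g)
        ss_between += len(g) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def rank_features(X, y):
    """Return feature indices sorted by decreasing F-statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X_train, y_train, X_test, y_test, cols):
    """Accuracy of a nearest-centroid classifier (stand-in for RF) on `cols`."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [mean([r[j] for r in rows]) for j in cols]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )

    return sum(predict(x) == t for x, t in zip(X_test, y_test)) / len(y_test)


def incremental_feature_selection(X_train, y_train, X_test, y_test):
    """Add ranked features one at a time; keep the best-scoring prefix."""
    ranking = rank_features(X_train, y_train)
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranking) + 1):
        acc = centroid_accuracy(X_train, y_train, X_test, y_test, ranking[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranking, best_k, best_acc


def make_toy_data(n=200, n_features=6, informative=(0, 1), seed=0):
    """Synthetic two-class data; only `informative` columns differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j in informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


X, y = make_toy_data()
ranking, best_k, best_acc = incremental_feature_selection(
    X[:140], y[:140], X[140:], y[140:])
print("ranking:", ranking)
print("best prefix length:", best_k, "accuracy:", round(best_acc, 3))
```

In this sketch the prefix length is chosen on a single held-out split, which gives an optimistic estimate. In practice the score at each step would more commonly come from cross-validation, and the stand-in classifier would be replaced by the model actually being evaluated.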


