Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection.

Affiliations

Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan.

AIBioMed Research Group, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan.

Publication information

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad319.

Abstract

Protein crystallization is crucial for biology, but the steps involved are complex and place heavy demands on both external factors and the protein's internal structure. To save experimental cost and time, the tendency of a protein to crystallize can first be estimated and screened by modeling. To that end, this study created a new pipeline that uses protein sequence to predict crystallization propensity at three stages: protein material production, purification, and crystal production. The pipeline proposes a new feature selection method that combines the chi-square ($\chi^{2}$) test and recursive feature elimination together with the 12 selected features. This is followed by linear discriminant analysis for dimensionality reduction. Finally, a support vector machine with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline was tested on three different datasets, and its accuracy was higher than that of existing pipelines. In conclusion, our model provides a new solution for predicting multistage protein crystallization propensity, which is a major challenge in computational biology.
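The abstract names the stages of the pipeline but gives no implementation details, so the following is only a minimal sketch of how such a chain could be assembled with scikit-learn. Everything specific in it is an assumption for illustration: the synthetic data from `make_classification` stands in for real sequence-derived features, the intermediate count `n_chi2=30`, the reading of "12 selected features" as the final number kept by recursive feature elimination, the linear SVM used as the elimination estimator, the min-max scaling step (added because `chi2` requires non-negative inputs), and the hyperparameter grid. It is not the authors' code.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def build_pipeline(n_chi2=30, n_final=12):
    """Chi-square filter -> RFE -> LDA -> SVM (all sizes are assumptions)."""
    return Pipeline([
        # chi2 only accepts non-negative features, so rescale to [0, 1].
        ("scale", MinMaxScaler()),
        # Level 1: keep the n_chi2 features with the highest chi-square score.
        ("chi2", SelectKBest(chi2, k=n_chi2)),
        # Level 2: recursively drop the weakest features of a linear SVM.
        ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=n_final)),
        # Binary labels allow at most one discriminant component.
        ("lda", LinearDiscriminantAnalysis(n_components=1)),
        # Final classifier; C and gamma are tuned by the grid search below.
        ("svm", SVC(kernel="rbf")),
    ])


# Synthetic stand-in for a table of sequence-derived features (hypothetical).
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=8, random_state=0
)

# Hyperparameter tuning scored by stratified 10-fold cross-validation.
search = GridSearchCV(
    build_pipeline(),
    param_grid={"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 0.1]},
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean 10-fold CV accuracy:", round(search.best_score_, 3))
```

Keeping both selection steps inside the `Pipeline` means they are refit on the training portion of every fold, so the cross-validated accuracy is not inflated by features chosen with knowledge of the held-out samples.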
