

Protein attributes contribute to halo-stability, bioinformatics approach.

Author information

Ebrahimie Esmaeil, Ebrahimi Mansour, Sarvestani Narjes Rahpayma, Ebrahimi Mahdi

Affiliation

Bioinformatics Research Group, Green Research Center, Qom University, Qom, Iran.

Publication information

Saline Syst. 2011 May 18;7(1):1. doi: 10.1186/1746-1448-7-1.

Abstract

Halophilic proteins tolerate high salt concentrations. Understanding the features that underlie halophilicity is a first step toward engineering halostable crops. To this end, we examined protein features that contribute to the halo-tolerance of halophilic organisms. We compared more than 850 features of halophilic and non-halophilic proteins using various screening, clustering, decision tree, and generalized rule induction models to search for patterns that code for halo-tolerance. Up to 251 protein attributes were selected by the attribute weighting algorithms as important contributors to halo-stability. Of these, 14 attributes were selected by 90% of the models, and the count of hydrogen received the highest weight (1.0) in 70% of the attribute weighting models, which shows the importance of this attribute in feature selection modeling. Most of the other attributes were dipeptide frequencies. The number of groups did not change when K-Means and TwoStep clustering were run on datasets with or without feature selection filtering. Although the induced trees were shallow, their accuracies exceeded 94%, and the frequency of hydrophobic residues was identified as the most important feature for building the trees. The performance evaluations of the decision tree models had the same values, and the best percentage of correctness was recorded with the Exhaustive CHAID and CHAID models. We found no significant difference in percentage of correctness, performance evaluation, or mean correctness among decision tree models with or without feature selection. This is the first analysis of the performance of different screening, clustering, and decision tree algorithms for discriminating halophilic from non-halophilic proteins, and the results show that amino acid composition can be used to discriminate between halo-tolerant and halo-sensitive proteins.
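As a rough illustration of the composition-based attributes mentioned in the abstract (amino acid frequencies, dipeptide frequencies, and the frequency of hydrophobic residues), the sketch below computes them for a single sequence. It is not the authors' pipeline: the function name `composition_features`, the feature naming scheme, and the hydrophobic residue set are assumptions made for this example, and the paper's own definitions may differ.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: one commonly used hydrophobic set; the paper may define it differently.
HYDROPHOBIC = set("AVILMFW")


def composition_features(seq: str) -> dict[str, float]:
    """Return amino acid, dipeptide, and hydrophobic-residue frequencies for one sequence."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two residues")

    features: dict[str, float] = {}

    # Single-residue frequencies (20 features).
    aa_counts = Counter(seq)
    for aa in AMINO_ACIDS:
        features[f"freq_{aa}"] = aa_counts[aa] / n

    # Overlapping dipeptide frequencies (400 features), normalised by the n - 1 windows.
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    for a, b in product(AMINO_ACIDS, repeat=2):
        features[f"dipep_{a}{b}"] = pair_counts[a + b] / (n - 1)

    # Fraction of residues that fall in the assumed hydrophobic set.
    features["freq_hydrophobic"] = sum(aa_counts[aa] for aa in HYDROPHOBIC) / n
    return features


if __name__ == "__main__":
    feats = composition_features("MKVLAAGDDE")  # hypothetical toy sequence
    print(len(feats), feats["freq_D"], feats["freq_hydrophobic"])
```

Feature vectors built this way for a labelled set of halophilic and non-halophilic proteins could then be passed to an attribute weighting step or a decision tree learner of the reader's choice.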


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0519/3117752/8a681c19044a/1746-1448-7-1-1.jpg
