Recognizing and Predicting Thioether Bridges Formed by Lanthionine and β-Methyllanthionine in Lantibiotics Using a Random Forest Approach with Feature Selection.

Author information

Wang ShaoPeng, Zhang Yu-Hang, Zhang Ning, Chen Lei, Huang Tao, Cai Yu-Dong

Affiliations

School of Life Sciences, Shanghai University, Shanghai 200444, China.

Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.

Publication information

Comb Chem High Throughput Screen. 2017;20(7):582-593. doi: 10.2174/1386207320666170310115754.

Abstract

BACKGROUND

Lantibiotics, which are usually produced by Gram-positive bacteria, are regarded as a special class of bacteriocins. They contain the unusual amino acid residues lanthionine (Lan) and β-methyllanthionine (MeLan), which form thioether-bridged ring structures in the peptide. Lan and MeLan arise when dehydrated serine and threonine residues, respectively, are linked to cysteine thiols, and they are essential for the ability of lantibiotics to inhibit the growth of closely related bacterial strains.
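Because Lan and MeLan residues derive from serine and threonine, a residue-level predictor of this kind presumably starts from the Ser/Thr positions in each sequence. The sketch below enumerates those candidate sites and cuts a fixed-length window around each one. The function name `candidate_windows`, the window half-width, and the `X` padding character are hypothetical choices for illustration; the abstract does not state how the authors encoded sites.

```python
def candidate_windows(sequence: str, half_width: int = 5, pad: str = "X"):
    """Return (position, residue, window) for every Ser (S) or Thr (T).

    Hypothetical helper: positions are 0-based, and windows are padded with
    `pad` at the termini so every window has length 2 * half_width + 1.
    `half_width` and `pad` are illustrative, not values from the paper.
    """
    padded = pad * half_width + sequence + pad * half_width
    sites = []
    for i, residue in enumerate(sequence):
        if residue in ("S", "T"):
            # Residue i of `sequence` sits at index i + half_width of `padded`,
            # so this slice is centred on the candidate residue.
            window = padded[i : i + 2 * half_width + 1]
            sites.append((i, residue, window))
    return sites
```

For example, on the toy sequence `"GASTKCSLT"` with `half_width=2`, the serine at position 2 gets the window `"GASTK"` and the terminal threonine at position 8 gets `"SLTXX"`.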

METHOD

In this pioneering work, we proposed the first machine learning method for recognizing and predicting Lan and MeLan residues in the protein sequences of lantibiotics. We adopted maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS) to select optimal features, and random forest (RF) to build classifiers that identify Lan and MeLan residues. A 10-fold cross-validation test was performed to evaluate the predictive performance of the classifiers.
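The selection pipeline described above can be sketched in a few functions: an mRMR ranking of all features, followed by IFS, which scores the top-1, top-2, ... subsets and keeps the best one. This is a minimal illustration under stated assumptions, not the authors' implementation: features are assumed to be discretised, mutual information is estimated from empirical counts, the difference form of mRMR (relevance minus mean redundancy) is used, and the random-forest evaluation is abstracted as a caller-supplied `score_fn` (hypothetical), which in the paper's setting would train an RF and return its 10-fold cross-validated MCC. `k_fold_indices` is a plain, unstratified fold splitter included only to show how such folds could be formed.

```python
from __future__ import annotations

import math
import random
from collections import Counter
from typing import Callable, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_rank(columns: list[list], labels: list) -> list[int]:
    """Order all feature indices greedily by relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining = set(range(len(columns)))
    ranked: list[int] = []
    while remaining:
        def criterion(j: int) -> float:
            if not ranked:
                return relevance[j]  # first pick: the most relevant feature
            redundancy = sum(mutual_information(columns[j], columns[k]) for k in ranked)
            return relevance[j] - redundancy / len(ranked)

        # max() keeps the first maximum, so ties go to the smallest index.
        best = max(sorted(remaining), key=criterion)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(
    ranked: list[int], score_fn: Callable[[list[int]], float]
) -> tuple[list[int], float]:
    """Score the top-1, top-2, ... feature subsets and keep the best-scoring one."""
    best_subset: list[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        score = score_fn(subset)
        if score > best_score:  # strict: the smallest subset wins a tie
            best_subset, best_score = subset, score
    return best_subset, best_score


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle sample indices and deal them into k roughly equal test folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]
```

A `score_fn` could, for instance, loop over the folds from `k_fold_indices`, train a classifier on the remaining samples restricted to the given feature indices, and return the MCC of the pooled predictions. Because mRMR penalises redundancy, an exact duplicate of an already-selected feature tends to fall to the end of the ranking even when it is highly relevant.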

RESULTS

The Matthews correlation coefficient (MCC) values for predicting Lan and MeLan residues were 0.813 and 0.769, respectively, showing that the constructed RF classifiers can reliably recognize Lan and MeLan residues in lantibiotic sequences. For comparison, three other methods, Dagging, the nearest neighbor algorithm (NNA), and sequential minimal optimization (SMO), were also used to build classifiers for predicting Lan and MeLan residues. Finally, the optimal features were analyzed, and their biological significance was discussed.
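The MCC values reported above summarise a binary confusion matrix in one number between -1 and 1: MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). A minimal sketch of the computation follows. It assumes label 1 marks a Lan (or MeLan) site and 0 any other candidate residue, and it returns 0.0 when a marginal count is zero (a common convention, assumed here). The abstract does not say whether the paper's MCC was computed on predictions pooled across the ten folds or averaged per fold; the helpers work for either.

```python
from __future__ import annotations

import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 marks a positive site."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, `confusion_counts([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])` gives TP = 1, TN = 2, FP = 1, FN = 1, so `mcc(1, 2, 1, 1)` is 1/6 ≈ 0.167. Unlike accuracy, MCC stays informative when non-modified Ser/Thr sites greatly outnumber true Lan/MeLan sites.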

CONCLUSION

The optimal features selected and analyzed in this work should contribute to a better understanding of the sequence and structural features around Lan and MeLan residues. They may also provide useful information and practical suggestions for experimental and computational studies exploring the biological features of these special residues in lantibiotics.
