Multistage Combination Classifier Augmented Model for Protein Secondary Structure Prediction.

Author information

Zhang Xu, Liu Yiwei, Wang Yaming, Zhang Liang, Feng Lin, Jin Bo, Zhang Hongzhe

Affiliations

College of Mechanical Engineering, Dalian University of Technology, Dalian, China.

School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China.

Publication information

Front Genet. 2022 May 23;13:769828. doi: 10.3389/fgene.2022.769828. eCollection 2022.

Abstract

Knowledge of protein secondary structure is important in bioinformatics for studying diseases and finding new treatments. Because determining secondary structure by physical experiment is time-consuming and expensive, pattern recognition and machine learning methods have been proposed to predict it. However, most of these methods achieve quite similar performance, which suggests that they have reached a model capacity bottleneck. Both model design and the learning process affect a model's learning capacity; we focus on the learning process. To this end, we propose a framework called the Multistage Combination Classifier Augmented Model (MCCM) for protein secondary structure prediction. First, a feature extraction module extracts features with different levels of learning difficulty. Second, multistage combination classifiers learn separate decision boundaries for easy and hard samples; the classifier for hard samples penalizes their loss values, which improves prediction performance on those samples. Third, a sample difficulty discrimination module, based on the Dirichlet distribution and an information entropy measurement, assigns samples of different learning difficulty to these classifiers. Experimental results on the publicly available CB513 benchmark dataset show that our method outperforms most state-of-the-art models.
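The abstract does not give the formulas behind the sample difficulty discrimination module, so the following is only a minimal sketch of how a Dirichlet-based entropy score could route samples to an easy or a hard classifier. Everything specific here is an assumption for illustration: the function names, the convention that the Dirichlet concentration is the non-negative evidence plus one, the three-class setting (helix, strand, coil), and the threshold of 0.5. None of it is taken from the paper.

```python
import math

def expected_probs(evidence):
    """Mean of a Dirichlet with concentration alpha_k = evidence_k + 1 (assumed convention)."""
    alpha = [e + 1.0 for e in evidence]
    total = sum(alpha)
    return [a / total for a in alpha]

def normalized_entropy(probs):
    """Shannon entropy of probs divided by log(K), so the score lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def route(evidence, threshold=0.5):
    """Label a sample 'hard' when its normalized entropy exceeds the (hypothetical) threshold."""
    score = normalized_entropy(expected_probs(evidence))
    return "hard" if score > threshold else "easy"

# Hypothetical per-residue evidence for three classes: helix, strand, coil.
confident = [40.0, 1.0, 1.0]   # evidence concentrated on one class
ambiguous = [2.0, 2.0, 2.0]    # evidence spread evenly across classes
print(route(confident), route(ambiguous))
```

In this sketch a residue whose evidence is concentrated on one class gets a low entropy score and is treated as easy, while evenly spread evidence gives the maximum score and is treated as hard. How MCCM actually combines the Dirichlet distribution with entropy, and how it sets any threshold, would need to be checked against the full paper.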


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cab5/9170271/0d41a2fd6abd/fgene-13-769828-g001.jpg
