Unsupervised encoding selection through ensemble pruning for biomedical classification.

Author information

Spänig Sebastian, Michel Alexander, Heider Dominik

Affiliation

Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany.

Publication information

BioData Min. 2023 Mar 16;16(1):10. doi: 10.1186/s13040-022-00317-7.

Abstract

BACKGROUND

Owing to the rising levels of multi-resistant pathogens, antimicrobial peptides, an alternative strategy to classic antibiotics, have received increasing attention. A crucial but costly step is their identification and validation. With the ever-growing number of annotated peptides, researchers leverage artificial intelligence to circumvent cumbersome wet-lab identification and to automate the detection of promising candidates. However, predicting a peptide's function is not limited to antimicrobial efficiency: to date, multiple studies have successfully classified additional properties, e.g., antiviral or cell-penetrating effects. In this light, ensemble classifiers are employed to further improve prediction. Although we recently presented a workflow that significantly narrows down the initial choice of encodings, a fully unsupervised encoding selection that considers various machine learning models is still lacking.

RESULTS

We developed a workflow that automatically selects encodings and generates classifier ensembles by employing sophisticated pruning methods. We observed that Pareto frontier pruning is a good method for creating encoding ensembles for the datasets at hand. In addition, encodings combined with the Decision Tree classifier as the base model are often superior. However, our results also demonstrate that no single ensemble-building technique is outstanding across all datasets.
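The abstract names Pareto frontier pruning but does not define it. The following is a minimal, generic sketch of the underlying idea and should not be read as the workflow's implementation. It assumes that each candidate (an encoding paired with a base model) has been scored on two objectives, both of which should be minimized: here an error rate and a hypothetical diversity penalty. Pruning then keeps only the candidates that no other candidate dominates. Which objectives the actual workflow uses is not stated in this abstract, and the candidate names below are hypothetical.

```python
from typing import Dict, List, Tuple


def pareto_prune(candidates: Dict[str, Tuple[float, float]]) -> List[str]:
    """Keep candidates that are not dominated on two minimized objectives.

    Assumption: each value is (error_rate, diversity_penalty); lower is better for both.
    """

    def dominates(p: Tuple[float, float], q: Tuple[float, float]) -> bool:
        # p dominates q if it is no worse in both objectives and strictly better in one
        return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])

    # A candidate survives if no other candidate dominates it
    return [
        name
        for name, score in candidates.items()
        if not any(dominates(other, score) for o, other in candidates.items() if o != name)
    ]


if __name__ == "__main__":
    # Hypothetical (encoding/base-model) candidates with made-up scores
    candidates = {
        "enc_A/DecisionTree": (0.20, 0.30),
        "enc_B/DecisionTree": (0.15, 0.50),
        "enc_C/SVM": (0.25, 0.20),
        "enc_D/SVM": (0.22, 0.40),  # dominated by enc_A/DecisionTree
        "enc_E/RF": (0.30, 0.60),   # dominated by several candidates
    }
    print(pareto_prune(candidates))
    # ['enc_A/DecisionTree', 'enc_B/DecisionTree', 'enc_C/SVM']
```

The surviving candidates would then form the ensemble. How their predictions are combined (for example, by majority voting) is a further design choice that this abstract does not specify.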

CONCLUSION

The workflow applies multiple pruning methods to evaluate ensemble classifiers composed of a wide range of peptide encodings and base models. Consequently, researchers can use the workflow for unsupervised encoding selection and ensemble creation. Ultimately, the extensible workflow can be used as a plugin for the PEPTIDE REACToR, further establishing the PEPTIDE REACToR as a versatile tool in the domain.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/10018861/f3271731989c/13040_2022_317_Fig1_HTML.jpg