iProps: A Comprehensive Software Tool for Protein Classification and Analysis With Automatic Machine Learning Capabilities and Model Interpretation Capabilities.

Author information

Feng Changli, Wei Haiyan, Xu Chugui, Feng Bin, Zhu Xiaorong, Liu Jing, Zou Quan

Publication information

IEEE J Biomed Health Inform. 2024 Oct;28(10):6237-6247. doi: 10.1109/JBHI.2024.3425716. Epub 2024 Oct 3.

Abstract

Protein classification is a crucial field in bioinformatics. A comprehensive tool that can perform feature evaluation, visualization, automated machine learning, and model interpretation would significantly advance research in protein classification, yet few existing tools integrate all of these functions. This paper presents iProps, a Python-based software package designed to meet these requirements. iProps covers feature extraction, feature evaluation, automated machine learning, and interpretation of classification models. First, iProps uses evolutionary information and reduced amino acid alphabet information to propose or extend several numerical protein features that are independent of sequence length, including SC-PSSM, ORDip, TRC, CTDC-E, and CKSAAGP-E; it also implements the calculation of 17 other numerical features. In addition, iProps provides feature combination operations that generate hybrid features from those listed above, along with data balancing (sampling) and built-in classifier settings. It can therefore identify the most effective feature for protein class recognition from a large set of candidates, using three automated machine learning algorithms to find the best classifiers and parameter settings. Furthermore, iProps generates a detailed explanatory report that includes 23 informative graphs derived from three interpretable models. To assess the performance of iProps, a series of numerical experiments was conducted on two well-established datasets. The results showed that the software achieved superior recognition performance in every case.
Beyond its contributions to bioinformatics, the automated machine learning and model interpretation capabilities of iProps make it applicable to data analysis in other disciplines. As an open-source platform, iProps is readily accessible and has an intuitive user interface, so it can be used even by people without a programming background.
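
To illustrate what a sequence-length-independent feature built from a reduced amino acid alphabet can look like, here is a minimal Python sketch. It is not the iProps implementation and does not reproduce SC-PSSM, ORDip, TRC, CTDC-E, or CKSAAGP-E; the three-group alphabet and the function names (`reduce_sequence`, `composition`, `dipeptide_composition`, `hybrid_feature`) are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import product

# Hypothetical three-group reduced alphabet (hydrophobic / polar / charged).
# This grouping is an assumption for illustration, not the one used by iProps.
REDUCED_GROUPS = {
    "H": "AVLIMFWC",
    "P": "GSTYNQP",
    "C": "DEKRH",
}
AA_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}
GROUP_ORDER = sorted(REDUCED_GROUPS)  # ['C', 'H', 'P']


def reduce_sequence(seq):
    """Map each standard residue to its group letter; skip unknown symbols."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def composition(seq):
    """Fraction of residues in each group: always 3 values."""
    reduced = reduce_sequence(seq)
    counts = Counter(reduced)
    total = len(reduced) or 1  # avoid division by zero on empty input
    return [counts[g] / total for g in GROUP_ORDER]


def dipeptide_composition(seq):
    """Fraction of each adjacent group pair: always 3 * 3 = 9 values."""
    reduced = reduce_sequence(seq)
    pairs = [reduced[i:i + 2] for i in range(len(reduced) - 1)]
    counts = Counter(pairs)
    total = len(pairs) or 1
    keys = ["".join(p) for p in product(GROUP_ORDER, repeat=2)]
    return [counts[k] / total for k in keys]


def hybrid_feature(seq):
    """Combine two feature vectors by concatenation (12 values in total)."""
    return composition(seq) + dipeptide_composition(seq)
```

Because every value is a frequency over a fixed set of groups or group pairs, `hybrid_feature` returns a 12-element vector for a sequence of any length, which is the property that lets sequences of different lengths be passed to the same classifier.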
