iProps: A Comprehensive Software Tool for Protein Classification and Analysis With Automatic Machine Learning Capabilities and Model Interpretation Capabilities.

Authors

Feng Changli, Wei Haiyan, Xu Chugui, Feng Bin, Zhu Xiaorong, Liu Jing, Zou Quan

Publication information

IEEE J Biomed Health Inform. 2024 Oct;28(10):6237-6247. doi: 10.1109/JBHI.2024.3425716. Epub 2024 Oct 3.

DOI: 10.1109/JBHI.2024.3425716
PMID: 39008396

Abstract

Protein classification is a crucial field in bioinformatics. A comprehensive tool that can perform feature evaluation, visualization, automated machine learning, and model interpretation would significantly advance research in protein classification. However, tools that integrate all of these essential functions remain largely absent from the literature. This paper presents iProps, a novel Python-based software package designed to meet these requirements. iProps supports feature extraction, feature evaluation, automated machine learning, and interpretation of classification models. First, iProps uses evolutionary information and amino acid reduction information to propose or extend several numerical protein features that are independent of sequence length, including SC-PSSM, ORDip, TRC, CTDC-E, and CKSAAGP-E, among others; it also implements the calculation of 17 other numerical features. iProps further provides feature combination operations that generate hybrid features from those above, along with data-balancing sampling and built-in classifier settings, among other functions. It can thus identify the most effective feature for protein class recognition from a large set of candidates, using three automated machine learning algorithms to find the best classifiers and parameter settings. Furthermore, iProps generates a detailed explanatory report that includes 23 informative graphs derived from three interpretable models. To assess the performance of iProps, a series of numerical experiments was conducted on two well-established datasets. The results showed that the software achieved superior recognition performance in every case.
Beyond its contributions to bioinformatics, the automated machine learning and model interpretation capabilities of iProps make it a general data analysis tool that is useful across other disciplines. As an open-source platform, iProps is readily accessible and features an intuitive user interface, so that it can be used even by people without a programming background.
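
To make the idea of a sequence-length-independent feature built from amino acid reduction concrete, the sketch below computes a composition vector over a reduced alphabet. It is an illustration only and is not one of the iProps features named in the abstract (SC-PSSM, ORDip, TRC, CTDC-E, CKSAAGP-E). The six-group alphabet and the names `REDUCED_GROUPS` and `reduced_composition` are assumptions introduced here, not part of iProps.

```python
from collections import Counter

# Hypothetical reduced alphabet: the 20 standard amino acids collapsed into
# six physicochemical groups. This grouping is an assumption for illustration.
REDUCED_GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}

# Map each residue letter to its group name.
AA_TO_GROUP = {aa: group for group, members in REDUCED_GROUPS.items() for aa in members}
GROUP_ORDER = list(REDUCED_GROUPS)


def reduced_composition(sequence):
    """Return the fraction of residues in each reduced-alphabet group.

    The output always has len(GROUP_ORDER) entries, so sequences of any
    length map to vectors of the same dimension. Non-standard letters
    (for example 'X') are skipped.
    """
    groups = [AA_TO_GROUP[aa] for aa in sequence.upper() if aa in AA_TO_GROUP]
    if not groups:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(groups)
    return [counts[group] / len(groups) for group in GROUP_ORDER]


if __name__ == "__main__":
    short_seq = "MKVLA"
    long_seq = "MKVLAGDEFWSTNQPRHYCI"
    # Both vectors have six entries despite the different sequence lengths.
    print(len(reduced_composition(short_seq)), len(reduced_composition(long_seq)))
```

Because every sequence maps to a vector of the same dimension, features of this kind can be passed directly to standard classifiers without padding or truncation.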
