StackedEnC-AOP: prediction of antioxidant proteins using transform evolutionary and sequential features based multi-scale vector with stacked ensemble learning.

Affiliations

Department of Zoology, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China.

Publication information

BMC Bioinformatics. 2024 Aug 4;25(1):256. doi: 10.1186/s12859-024-05884-6.

Abstract

BACKGROUND

Antioxidant proteins are involved in several biological processes and can protect DNA and cells from free-radical damage. These proteins regulate the body's oxidative stress and play a significant role in many antioxidant-based drugs. Current in vitro approaches are costly, time-consuming, and unable to efficiently screen for and identify the targeted motifs of antioxidant proteins.

METHODS

In this work, we propose an accurate prediction method, named StackedEnC-AOP, to discriminate antioxidant proteins. The training sequences are encoded by incorporating a discrete wavelet transform (DWT) into the evolutionary matrix: the PSSM-based images are decomposed via two levels of DWT to form a pseudo position-specific scoring matrix (PsePSSM-DWT) based embedded vector. Additionally, the Evolutionary difference formula and composite physicochemical properties methods are employed to collect structural and sequential descriptors. The sequential features, evolutionary descriptors, and physicochemical properties are then combined into a single vector to compensate for the weaknesses of the individual encoding schemes. To reduce the computational cost of the combined feature vector, the optimal features are selected using minimum redundancy maximum relevance (mRMR). The optimal feature vector is then used to train a stacking-based ensemble meta-model.
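The two-level DWT encoding step can be illustrated with a minimal sketch. The abstract does not state the wavelet family or how the coefficients are condensed into a fixed-length vector, so the Haar wavelet, the four summary statistics per sub-band, and the function names `haar_dwt` and `dwt_features` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar DWT; returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        # Pad odd-length signals by repeating the last value (assumed padding rule).
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def dwt_features(pssm, levels=2):
    """Map an (L x 20) PSSM-like matrix to a fixed-length vector.

    Each column is decomposed with `levels` Haar DWT steps. The detail band of
    every level and the final approximation band are each summarised by
    mean, std, max and min (a hypothetical choice of statistics).
    """
    pssm = np.asarray(pssm, dtype=float)
    feats = []
    for column in pssm.T:
        approx = column
        bands = []
        for _ in range(levels):
            approx, detail = haar_dwt(approx)
            bands.append(detail)
        bands.append(approx)
        for band in bands:
            feats.extend([band.mean(), band.std(), band.max(), band.min()])
    return np.array(feats)

# Toy input: random integers standing in for a real PSSM (not real data).
rng = np.random.default_rng(0)
toy_pssm = rng.integers(-5, 6, size=(37, 20))
vector = dwt_features(toy_pssm)
print(vector.shape)  # (240,) = 20 columns x 3 sub-bands x 4 statistics
```

Because every sub-band is reduced to the same four statistics, proteins of different lengths map to vectors of the same size, which is what a downstream feature selector and classifier would require.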

RESULTS

Our StackedEnC-AOP method achieved a prediction accuracy of 98.40% and an AUC of 0.99 on the training sequences. When validated on an independent set, the trained StackedEnC-AOP model achieved an accuracy of 96.92% and an AUC of 0.98.

CONCLUSION

Our proposed StackedEnC-AOP strategy performed significantly better than current computational models, improving accuracy by ~5% and ~3% on the training and independent sets, respectively. The efficacy and consistency of StackedEnC-AOP make it a valuable tool for data scientists, and it can play a key role in academic research and drug design.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca38/11298090/1c83a3aaf923/12859_2024_5884_Fig1_HTML.jpg
