Using the Random Forest for Identifying Key Physicochemical Properties of Amino Acids to Discriminate Anticancer and Non-Anticancer Peptides.
Affiliations
College of Biomedical Engineering, Sichuan University, Chengdu 610065, China.
College of Life Science, Sichuan University, Chengdu 610065, China.
Publication information
Int J Mol Sci. 2023 Jun 29;24(13):10854. doi: 10.3390/ijms241310854.
Anticancer peptides (ACPs) represent a promising new therapeutic approach in cancer treatment. They can target cancer cells without affecting healthy tissues or altering normal physiological functions. Machine learning algorithms are increasingly used to predict peptide sequences with potential ACP activity. This study analyzed four benchmark datasets using the well-established random forest (RF) algorithm. Each peptide sequence was converted into 566 physicochemical features extracted from the amino acid index (AAindex) library, and these features were then subjected to feature selection by four methods: light gradient-boosting machine (LGBM), analysis of variance (ANOVA), the chi-squared test (Chi), and mutual information (MI). By presenting and merging the features selected by each method in Venn diagrams, the study identified 19 key amino acid physicochemical properties that can be used to predict the likelihood that a peptide sequence functions as an ACP. The results were quantified with performance evaluation metrics to assess prediction accuracy. The study aims to make the design of peptide sequences for cancer treatment more efficient.
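The two data-handling steps named in the abstract, turning a sequence into physicochemical features and merging the features chosen by several selectors, can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: each feature is taken as the mean of one amino acid index over the residues of the peptide, and the Venn-diagram merge is approximated by counting how many selectors chose each feature. The function names `encode_peptide` and `consensus_features`, the feature labels, and the vote threshold are hypothetical. Only the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) is included, standing in for the 566 properties.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale, one amino acid index from AAindex (KYTJ820101).
# The study uses 566 such properties; a single one is shown for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(sequence, index_tables):
    """Return one feature per property: the mean index value over the residues.

    Assumption: averaging is used here as a simple encoding; the abstract
    does not state how sequences were converted into features.
    """
    residues = sequence.upper()
    features = {}
    for name, table in index_tables.items():
        # Skip characters that are not one of the 20 standard residues.
        values = [table[aa] for aa in residues if aa in table]
        features[name] = sum(values) / len(values) if values else 0.0
    return features


def consensus_features(selections, min_votes):
    """Return features chosen by at least `min_votes` selection methods.

    `selections` maps a method name (e.g. "LGBM") to the features it kept.
    With min_votes equal to the number of methods this is the intersection,
    i.e. the central region of the Venn diagram.
    """
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each method votes once per feature
    return sorted(f for f, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Hypothetical peptide and hypothetical selector outputs.
    print(encode_peptide("KLAK", {"hydropathy": KYTE_DOOLITTLE}))
    picked = {
        "LGBM": ["f1", "f2", "f3"],
        "ANOVA": ["f2", "f3"],
        "Chi": ["f3", "f4"],
        "MI": ["f2", "f3", "f4"],
    }
    print(consensus_features(picked, min_votes=3))
```

In a full pipeline, the encoded feature vectors restricted to the consensus set would be passed to a random forest classifier. That step is omitted here because the abstract gives no model settings.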