Improved cytokine-receptor interaction prediction by exploiting the negative sample space.

Affiliations

Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India.

Department of Genetics, Department of Cell Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA.

Publication information

BMC Bioinformatics. 2020 Oct 31;21(1):493. doi: 10.1186/s12859-020-03835-5.

DOI: 10.1186/s12859-020-03835-5
PMID: 33129275
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7603689/
Abstract

BACKGROUND

Cytokines act by binding to specific receptors in the plasma membrane of target cells. Knowledge of cytokine-receptor interactions (CRIs) is important for understanding the pathogenesis of various human diseases, notably autoimmune, inflammatory and infectious diseases, and for identifying potential therapeutic targets. Recently, machine learning algorithms have been used to predict CRIs. However, "gold standard" negative datasets are still lacking, and strong biases in negative datasets can significantly affect both the training of learning algorithms and their evaluation. To mitigate the unrepresentativeness and bias inherent in the selection of negative samples (non-interacting proteins), we propose a clustering-based approach for representative negative sample selection.

RESULTS

We used deep autoencoders to investigate how different sampling approaches for non-interacting pairs affect the training and performance of machine learning classifiers. Using the anomaly detection capabilities of deep autoencoders, we deduced the effects of different categories of negative samples on the training of learning algorithms. Random sampling of non-interacting pairs results in over- or under-representation of hard- or easy-to-classify instances. When K-means-based sampling of negative datasets is applied to mitigate the inadequacies of random sampling, random forest (RF) together with the combined feature set of atomic composition, physicochemical-2grams and two different representations of evolutionary information performs best. Model performance based on leave-one-out cross-validation (LOOCV), averaged over the ten different negative sample sets that each model was trained with, shows that RF models significantly outperform the previous best CRI predictor in terms of accuracy (+5.1%), specificity (+13%), MCC (+0.1) and g-means value (+5.1). Evaluations using tenfold cross-validation and training/testing splits confirm the competitive performance.
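The four reported measures can all be derived from a binary confusion matrix. The sketch below is illustrative only: the function name `classification_metrics` is hypothetical, and the g-means value is assumed to be the geometric mean of sensitivity and specificity, a common definition that the abstract itself does not spell out. MCC is the Matthews correlation coefficient.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, MCC and g-mean.

    Labels are 1 (interacting pair) and 0 (non-interacting pair).
    The g-mean definition used here is an assumption (see lead-in).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # MCC is undefined when any marginal count is zero; report 0.0 then.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "g_mean": g_mean,
    }


# Toy labels, not data from the study.
example = classification_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                                 [1, 1, 1, 0, 0, 0, 0, 1])
```

Because specificity and MCC both depend on how the negatives are classified, they are the measures most sensitive to how the negative set was drawn.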

CONCLUSIONS

A comparative analysis was performed to assess the effect of three different sampling methods (random, K-means and uniform sampling) on the training of learning algorithms, using different evaluation methods. Models trained on K-means sampled datasets generally show significantly improved performance compared to those trained on random selections, with RF seemingly benefiting most in our particular setting. Our findings on sampling are highly relevant and apply to many applications of supervised learning in bioinformatics.
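To make the contrast between random and clustering-based selection concrete, here is a minimal sketch of one possible reading of K-means-based negative sampling: cluster the candidate negatives in feature space, then draw evenly from each cluster. The abstract does not give the authors' exact procedure, so the function names, the round-robin draw, the tiny hand-written K-means and the toy data are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_assignments(points, k, iterations=20, seed=0):
    """Plain Lloyd-style K-means; returns a cluster id for every point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def random_negative_sample(negatives, n_samples, seed=0):
    """Baseline: indices drawn uniformly at random from all negatives."""
    return random.Random(seed).sample(range(len(negatives)), n_samples)


def kmeans_negative_sample(negatives, n_samples, k, seed=0):
    """Draw indices round-robin across clusters (hypothetical scheme)."""
    rng = random.Random(seed)
    clusters = {}
    for idx, c in enumerate(kmeans_assignments(negatives, k, seed=seed)):
        clusters.setdefault(c, []).append(idx)
    for members in clusters.values():
        rng.shuffle(members)

    chosen = []
    while len(chosen) < n_samples:
        progressed = False
        for c in sorted(clusters):
            if clusters[c] and len(chosen) < n_samples:
                chosen.append(clusters[c].pop())
                progressed = True
        if not progressed:  # every cluster is exhausted
            break
    return chosen


# Toy data: a large, dense group of negatives and a small, distant one.
negatives = [(0.1 * (i % 10), 0.1 * (i // 10)) for i in range(90)]
negatives += [(10.0 + 0.1 * i, 10.0) for i in range(10)]
picked = kmeans_negative_sample(negatives, n_samples=10, k=2)
```

On this toy data the clustering-based draw takes the same number of indices from the small group as from the large one, whereas a uniform random draw would tend to favor the large group. This is the kind of over- or under-representation the study attributes to random sampling.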

Similar articles

1. Improved cytokine-receptor interaction prediction by exploiting the negative sample space.
BMC Bioinformatics. 2020 Oct 31;21(1):493. doi: 10.1186/s12859-020-03835-5.
2. Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art.
BMC Bioinformatics. 2012 May 10;13:89. doi: 10.1186/1471-2105-13-89.
3. Optimizing neural networks for medical data sets: A case study on neonatal apnea prediction.
Artif Intell Med. 2019 Jul;98:59-76. doi: 10.1016/j.artmed.2019.07.008. Epub 2019 Jul 25.
4. DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information.
J Comput Aided Mol Des. 2019 Jul;33(7):645-658. doi: 10.1007/s10822-019-00207-x. Epub 2019 May 23.
5. A multicenter random forest model for effective prognosis prediction in collaborative clinical research network.
Artif Intell Med. 2020 Mar;103:101814. doi: 10.1016/j.artmed.2020.101814. Epub 2020 Feb 5.
6. Enhanced prediction of recombination hotspots using input features extracted by class specific autoencoders.
J Theor Biol. 2018 May 7;444:73-82. doi: 10.1016/j.jtbi.2018.02.016. Epub 2018 Feb 17.
7. Improved Prediction of Protein-Protein Interaction Mapping on by Using Amino Acid Sequence Features in a Supervised Learning Framework.
Protein Pept Lett. 2021;28(1):74-83. doi: 10.2174/0929866527666200610141258.
8. Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers.
Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838.
9. A novel machine learning method for cytokine-receptor interaction prediction.
Comb Chem High Throughput Screen. 2016;19(2):144-52. doi: 10.2174/1386207319666151110122621.
10. Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.
Comput Biol Chem. 2015 Dec;59 Pt A:101-10. doi: 10.1016/j.compbiolchem.2015.09.011. Epub 2015 Sep 28.

Cited by

1. off-target profiling for enhanced drug safety assessment.
Acta Pharm Sin B. 2024 Jul;14(7):2927-2941. doi: 10.1016/j.apsb.2024.03.002. Epub 2024 Mar 6.
2. Heat shock protein family A member 8 is a prognostic marker for bladder cancer: Evidences based on experiments and machine learning.
J Cell Mol Med. 2023 Dec;27(24):3995-4008. doi: 10.1111/jcmm.17977. Epub 2023 Sep 28.
3. Mining Chemogenomic Spaces for Prediction of Drug-Target Interactions.
Methods Mol Biol. 2024;2714:155-169. doi: 10.1007/978-1-0716-3441-7_9.
4. Overview of methods for characterization and visualization of a protein-protein interaction network in a multi-omics integration context.
Front Mol Biosci. 2022 Sep 8;9:962799. doi: 10.3389/fmolb.2022.962799. eCollection 2022.
5. Protein-protein interaction and non-interaction predictions using gene sequence natural vector.
Commun Biol. 2022 Jul 2;5(1):652. doi: 10.1038/s42003-022-03617-0.
6. Modelling the bioinformatics tertiary analysis research process.
BMC Bioinformatics. 2021 Sep 30;22(Suppl 13):452. doi: 10.1186/s12859-021-04310-5.
7. Patient-Specific Cell Communication Networks Associate With Disease Progression in Cancer.
Front Genet. 2021 Aug 27;12:667382. doi: 10.3389/fgene.2021.667382. eCollection 2021.