
Improved prediction of gene expression through integrating cell signalling models with machine learning.

Affiliations

Department of Computer Science, University of Manchester, Manchester, UK.

Department of Computer Science, Taif University, Taif, Saudi Arabia.

Publication information

BMC Bioinformatics. 2022 Aug 6;23(1):323. doi: 10.1186/s12859-022-04787-8.

DOI:10.1186/s12859-022-04787-8
PMID:35933367
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9356471/
Abstract

BACKGROUND

A key problem in bioinformatics is that of predicting gene expression levels. There are two broad approaches: use of mechanistic models that aim to directly simulate the underlying biology, and use of machine learning (ML) to empirically predict expression levels from descriptors of the experiments. There are advantages and disadvantages to both approaches: mechanistic models more directly reflect the underlying biological causation, but do not directly utilize the available empirical data; while ML methods do not fully utilize existing biological knowledge.

RESULTS

Here, we investigate overcoming these disadvantages by integrating mechanistic cell signalling models with ML. Our approach to integration is to augment ML with similarity features (attributes) computed from cell signalling models. Seven different sets of similarity features were generated using graph theory. Each set of features was in turn used to learn multi-target regression models. All the feature sets significantly improved accuracy over the baseline model, which does not use the similarity features. Finally, the seven multi-target regression models were stacked together to form an overall prediction model that was significantly better than the baseline on 95% of genes on an independent test set. The similarity features enable this stacking model to provide interpretable knowledge about cancer, e.g. the role of ERBB3 in the MCF7 breast cancer cell line.
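To make the idea of graph-derived similarity features concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the seven similarity measures, so the two used here (Jaccard overlap of neighbourhoods and an inverse shortest-path score) are assumptions chosen for illustration. The toy graph, its edges, and all function names are hypothetical.

```python
from collections import deque

# Toy signalling graph as an adjacency list. The edges are hypothetical
# and exist only to illustrate the computation.
GRAPH = {
    "EGFR": ["ERBB3", "GRB2"],
    "ERBB3": ["PIK3CA"],
    "GRB2": ["SOS1"],
    "SOS1": ["KRAS"],
    "PIK3CA": ["AKT1"],
    "KRAS": [],
    "AKT1": [],
}


def neighbours(graph, node):
    """Return the undirected neighbourhood: successors plus predecessors."""
    successors = set(graph.get(node, []))
    predecessors = {u for u, targets in graph.items() if node in targets}
    return successors | predecessors


def jaccard_similarity(graph, a, b):
    """Overlap of the two neighbourhoods divided by their union."""
    na, nb = neighbours(graph, a), neighbours(graph, b)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def shortest_path_length(graph, source, target):
    """Breadth-first search on the undirected view; None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbours(graph, node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Map path length d to 1 / (1 + d); unreachable pairs score 0."""
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def similarity_features(graph, gene, reference_genes):
    """One feature per (reference gene, measure) pair for a given gene."""
    features = []
    for ref in reference_genes:
        features.append(jaccard_similarity(graph, gene, ref))
        features.append(path_similarity(graph, gene, ref))
    return features


if __name__ == "__main__":
    print(similarity_features(GRAPH, "ERBB3", ["GRB2", "KRAS"]))
```

In a setting like the one the abstract describes, a vector of this kind would be appended to the experiment descriptors before fitting a multi-target regression model, with one such model per feature set and a stacking model combining their predictions. That downstream step is omitted here.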

CONCLUSION

Integrating mechanistic models as graphs helps to both improve the predictive results of machine learning models, and to provide biological knowledge about genes that can help in building state-of-the-art mechanistic models.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aca/9356471/2a7ddeb6ab6c/12859_2022_4787_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aca/9356471/affd09922411/12859_2022_4787_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aca/9356471/51987649f577/12859_2022_4787_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7aca/9356471/47bf414610d0/12859_2022_4787_Fig3_HTML.jpg
