Network-constrained forest for regularized classification of omics data.

Authors

Anděl Michael, Kléma Jiří, Krejčík Zdeněk

Affiliations

Department of Computer Science, Czech Technical University, Technická 2, Prague, Czech Republic.

Department of Molecular Genetics, Institute of Hematology and Blood Transfusion, U Nemocnice 1, Prague, Czech Republic.

Publication information

Methods. 2015 Jul 15;83:88-97. doi: 10.1016/j.ymeth.2015.04.006. Epub 2015 Apr 11.

DOI:10.1016/j.ymeth.2015.04.006
PMID:25872185
Abstract

Contemporary molecular biology works with large, heterogeneous sets of measurements to model and understand underlying biological processes, including complex diseases. Machine learning is frequently used to build such models. However, models built solely from measured data often overfit, because the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method on the task of disease classification based on measured mRNA and miRNA profiles, complemented by an interaction network composed of miRNA-mRNA target relations and mRNA-mRNA interactions that correspond to interactions between the encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently to improve the classification accuracy, stability, and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome, which is our long-term research focus. We validate our approach on publicly available ovarian carcinoma data of the same form. We believe that the idea of a network-constrained forest can be straightforwardly generalized to arbitrary omics data for which a non-trivial feature interaction network is available. The proposed method is publicly available as part of the miXGENE system (http://mixgene.felk.cvut.cz); the workflow that implements the myelodysplastic syndrome experiments is presented as a dedicated case study.
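
The abstract does not spell out how the interaction network constrains the forest, so the following is only a minimal sketch of one possible reading, not the authors' algorithm: each tree is restricted to a small set of features that are connected in the prior-knowledge network, instead of a uniformly random subset. The function name `sample_connected_features`, the toy network, and the feature names are all hypothetical.

```python
import random


def sample_connected_features(network, k, rng):
    """Grow a connected set of up to k features from a random seed node.

    `network` maps each feature name to a list of its neighbours in the
    interaction network (assumed symmetric). This is an illustrative
    assumption, not the sampling scheme used in the paper.
    """
    seed = rng.choice(sorted(network))
    chosen = [seed]
    chosen_set = {seed}
    while len(chosen) < k:
        # Candidate features: neighbours of the current set not yet chosen.
        frontier = sorted(
            {nb for f in chosen for nb in network.get(f, ()) if nb not in chosen_set}
        )
        if not frontier:
            break  # the seed's connected component is exhausted
        nxt = rng.choice(frontier)
        chosen.append(nxt)
        chosen_set.add(nxt)
    return chosen


# Toy miRNA-mRNA / mRNA-mRNA network with made-up feature names.
toy_network = {
    "miR-1": ["geneA", "geneB"],
    "geneA": ["miR-1", "geneB"],
    "geneB": ["miR-1", "geneA", "geneC"],
    "geneC": ["geneB"],
    "geneD": [],  # isolated feature: can only form a set of size one
}

rng = random.Random(0)
# One feature set per tree; each tree would then be trained on its own set.
tree_feature_sets = [sample_connected_features(toy_network, 3, rng) for _ in range(5)]
```

Restricting each tree to a connected neighbourhood is one way to inject the kind of learning bias the abstract describes: the classifier can only combine features that prior knowledge says interact, which shrinks the hypothesis space when samples are scarce.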

