A new approach for interpreting Random Forest models and its application to the biology of ageing.

Affiliations

School of Computing, University of Kent, Canterbury, Kent, UK.

Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK.

Publication information

Bioinformatics. 2018 Jul 15;34(14):2449-2456. doi: 10.1093/bioinformatics/bty087.

DOI: 10.1093/bioinformatics/bty087
PMID: 29462247
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6041990/
Abstract

MOTIVATION

This work uses the Random Forest (RF) classification algorithm to predict whether a gene is over-expressed, under-expressed, or unchanged in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and for the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model.

RESULTS

The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure.

AVAILABILITY AND IMPLEMENTATION

The dataset and source code used in this paper are available as 'Supplementary Material', and a description of the data can be found at: https://fabiofabris.github.io/bioinfo2018/web/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
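
The abstract names the problem but does not describe the new algorithm itself. As a rough illustration of the distinction drawn under MOTIVATION (scoring a whole feature versus scoring one of its values), the sketch below uses a different, simpler technique rather than the authors' method: it walks each tree's decision path for every gene, credits each split's change in target-class probability to the (feature, value) branch taken, and then compares that value-level score with the same score pooled over both values. The `Node` class, the two hand-built trees, their probabilities, and the toy gene matrix are all hypothetical; binary features stand in for Gene Ontology annotations (1 = gene annotated with the term).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class Node:
    """Hypothetical binary-split tree node. `p` is the fraction of training
    genes of the target class (e.g. over-expressed with age) at this node."""

    def __init__(self, p: float, feature: Optional[int] = None,
                 left: Optional["Node"] = None, right: Optional["Node"] = None):
        self.p = p
        self.feature = feature  # None marks a leaf
        self.left = left        # branch taken when the feature value is 0
        self.right = right      # branch taken when the feature value is 1


def path_contributions(tree: Node, row: List[int]) -> List[Tuple[int, int, float]]:
    """(feature, value, change in p) for every split on the row's decision path."""
    steps = []
    node = tree
    while node.feature is not None:
        value = row[node.feature]
        child = node.right if value == 1 else node.left
        steps.append((node.feature, value, child.p - node.p))
        node = child
    return steps


def _accumulate(forest: List[Node], X: List[List[int]]):
    """Sum of |change in p| and number of path steps per (feature, value)."""
    total: Dict[Tuple[int, int], float] = defaultdict(float)
    count: Dict[Tuple[int, int], int] = defaultdict(int)
    for tree in forest:
        for row in X:
            for feature, value, delta in path_contributions(tree, row):
                total[(feature, value)] += abs(delta)
                count[(feature, value)] += 1
    return total, count


def value_importance(forest: List[Node], X: List[List[int]]) -> Dict[Tuple[int, int], float]:
    """Mean |contribution| of each individual feature value."""
    total, count = _accumulate(forest, X)
    return {key: total[key] / count[key] for key in total}


def feature_importance(forest: List[Node], X: List[List[int]]) -> Dict[int, float]:
    """The same score pooled over both values (the whole-feature view)."""
    total, count = _accumulate(forest, X)
    pooled_total: Dict[int, float] = defaultdict(float)
    pooled_count: Dict[int, int] = defaultdict(int)
    for (feature, value), t in total.items():
        pooled_total[feature] += t
        pooled_count[feature] += count[(feature, value)]
    return {f: pooled_total[f] / pooled_count[f] for f in pooled_total}


# Toy forest: tree_a splits on GO term 0; tree_b splits on GO term 1, whose
# children have the same p as the root, so it contributes nothing.
tree_a = Node(0.4, feature=0, left=Node(0.25), right=Node(1.0))
tree_b = Node(0.4, feature=1, left=Node(0.4), right=Node(0.4))
forest = [tree_a, tree_b]

# Ten toy genes: 2 annotated with GO term 0, 5 annotated with GO term 1.
X = [[1, 1], [1, 0]] + [[0, i % 2] for i in range(8)]

print({k: round(v, 3) for k, v in value_importance(forest, X).items()})
print({k: round(v, 3) for k, v in feature_importance(forest, X).items()})
```

With these made-up numbers, the annotated value of GO term 0 scores 0.6 while the unannotated value scores 0.15; pooling both values gives 0.24 for the feature as a whole, so the whole-feature view understates how informative the annotation itself is.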

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcd4/6041990/071f9614c034/bty087f1.jpg

Similar articles

1. A new approach for interpreting Random Forest models and its application to the biology of ageing.
   Bioinformatics. 2018 Jul 15;34(14):2449-2456. doi: 10.1093/bioinformatics/bty087.
2. Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations.
   Bioinformatics. 2018 Jul 1;34(13):i52-i60. doi: 10.1093/bioinformatics/bty259.
3. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.
   Bioinformatics. 2017 Mar 15;33(6):843-853. doi: 10.1093/bioinformatics/btw723.
4. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.
   BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.
5. Permutation importance: a corrected feature importance measure.
   Bioinformatics. 2010 May 15;26(10):1340-7. doi: 10.1093/bioinformatics/btq134. Epub 2010 Apr 12.
6. Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.
   Bioinformatics. 2017 Sep 15;33(18):2906-2913. doi: 10.1093/bioinformatics/btx298.
7. Using deep learning to associate human genes with age-related diseases.
   Bioinformatics. 2020 Apr 1;36(7):2202-2208. doi: 10.1093/bioinformatics/btz887.
8. The revival of the Gini importance?
   Bioinformatics. 2018 Nov 1;34(21):3711-3718. doi: 10.1093/bioinformatics/bty373.
9. New KEGG pathway-based interpretable features for classifying ageing-related mouse proteins.
   Bioinformatics. 2016 Oct 1;32(19):2988-95. doi: 10.1093/bioinformatics/btw363. Epub 2016 Jun 17.
10. WDL-RF: predicting bioactivities of ligand molecules acting with G protein-coupled receptors by combining weighted deep learning and random forest.
   Bioinformatics. 2018 Jul 1;34(13):2271-2282. doi: 10.1093/bioinformatics/bty070.

Cited by

1. Inhibition of CYP450 family 1 subfamily B member 1 (CYP1B1) expression in macrophage reduces the inflammatory response in type 2 diabetes mellitus combined with tuberculosis.
   Front Endocrinol (Lausanne). 2025 Aug 21;16:1617292. doi: 10.3389/fendo.2025.1617292. eCollection 2025.
2. Pushing the boundaries of few-shot learning for low-data drug discovery with a Bayesian meta-learning hypernetwork framework.
   Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf408.
3. Epigenetic ageing clocks: statistical methods and emerging computational challenges.
   Nat Rev Genet. 2025 May;26(5):350-368. doi: 10.1038/s41576-024-00807-w. Epub 2025 Jan 13.
4. Identification of potential biomarkers from amino acid transporter in the activation of hepatic stellate cells via bioinformatics.
   Front Genet. 2024 Dec 4;15:1499915. doi: 10.3389/fgene.2024.1499915. eCollection 2024.
5. Machine learning approaches for biomolecular, biophysical, and biomaterials research.
   Biophys Rev (Melville). 2022 Jun 3;3(2):021306. doi: 10.1063/5.0082179. eCollection 2022 Jun.
6. Identification of key biomarkers for predicting CAD progression in inflammatory bowel disease via machine-learning and bioinformatics strategies.
   J Cell Mol Med. 2024 Mar;28(6):e18175. doi: 10.1111/jcmm.18175.
7. A machine learning approach to differentiate wide QRS tachycardia: distinguishing ventricular tachycardia from supraventricular tachycardia.
   J Interv Card Electrophysiol. 2024 Sep;67(6):1391-1398. doi: 10.1007/s10840-024-01743-9. Epub 2024 Jan 22.
8. Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study.
   JMIRx Med. 2021 Jun 11;2(2):e25560. doi: 10.2196/25560.
9. Cytometric analysis reveals an association between allergen-responsive natural killer cells and human peanut allergy.
   J Clin Invest. 2022 Oct 17;132(20):e157962. doi: 10.1172/JCI157962.
10. Identification and Validation of Immune Markers in Coronary Heart Disease.
   Comput Math Methods Med. 2022 Aug 26;2022:2877679. doi: 10.1155/2022/2877679. eCollection 2022.