Significance Tests of Feature Relevance for a Black-Box Learner.

Author information

Dai Ben, Shen Xiaotong, Pan Wei

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Jun 30;PP. doi: 10.1109/TNNLS.2022.3185742.

DOI:10.1109/TNNLS.2022.3185742

PMID:35771783

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10915654/

Abstract

An exciting recent development is the uptake of deep neural networks in many scientific fields, where the main objective is outcome prediction with a black-box nature. Significance testing is a promising way to address the black-box issue and to explore novel scientific insights and interpretations of the decision-making process of a deep learning model. However, testing for a neural network is challenging because of its black-box nature and the unknown limiting distributions of its parameter estimates, while existing methods require strong assumptions or excessive computation. In this article, we derive one-split and two-split tests that relax the assumptions and computational complexity of existing black-box tests and extend them to examine the significance of a collection of features of interest in a dataset of a possibly complex type, such as images. The one-split test estimates and evaluates a black-box model based on estimation and inference subsets through sample splitting and data perturbation. The two-split test further splits the inference subset into two but requires no perturbation. We also develop combined versions of both tests that aggregate the p-values obtained from repeated sample splitting. By deflating the bias-sd-ratio, we establish asymptotic null distributions of the test statistics and the tests' consistency in terms of Type 2 error. Numerically, we demonstrate the utility of the proposed tests on seven simulated examples and six real datasets. Accompanying this article is our Python library dnn-inference (https://dnn-inference.readthedocs.io/en/latest/) that implements the proposed tests.
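The abstract describes the tests only at a high level. The following is a minimal, standard-library-only Python sketch of the general idea under stated assumptions; it is not the authors' implementation and not the API of dnn-inference. Assumptions: a full model and a model refit with the features of interest masked have already been trained on the estimation subset, and their per-sample losses on the inference subset are given; the perturbation scale `rho = 0.1` is an arbitrary illustrative value; p-values from repeated splits are combined as min(1, 2 × median), a standard quantile aggregation that stays valid under arbitrary dependence between splits but may differ from the paper's own combining rule. All function names are hypothetical.

```python
import math
import random
import statistics
from typing import Optional, Sequence


def _one_sided_pvalue(diffs: Sequence[float]) -> float:
    """Normal-approximation p-value for H0: E[diff] = 0 vs H1: E[diff] < 0."""
    m = len(diffs)
    if m < 2:
        raise ValueError("need at least two loss differences")
    sd = statistics.stdev(diffs)  # sample standard deviation (denominator m - 1)
    if sd == 0.0:
        raise ValueError("loss differences are degenerate (zero variance)")
    t = math.sqrt(m) * statistics.fmean(diffs) / sd
    # A very negative t means the full model has clearly lower loss than the masked one.
    return statistics.NormalDist().cdf(t)


def one_split_pvalue(loss_full: Sequence[float], loss_masked: Sequence[float],
                     rho: float = 0.1, seed: Optional[int] = None) -> float:
    """Hypothetical one-split sketch: paired differences plus Gaussian perturbation.

    loss_full[i] and loss_masked[i] are the two models' losses on the SAME
    inference sample i. The added N(0, rho^2) noise keeps the differences from
    degenerating when both models coincide under the null (rho is assumed).
    """
    if len(loss_full) != len(loss_masked):
        raise ValueError("loss vectors must be paired")
    rng = random.Random(seed)
    diffs = [a - b + rho * rng.gauss(0.0, 1.0)
             for a, b in zip(loss_full, loss_masked)]
    return _one_sided_pvalue(diffs)


def two_split_pvalue(loss_full_half1: Sequence[float],
                     loss_masked_half2: Sequence[float]) -> float:
    """Hypothetical two-split sketch: the full model is scored on one half of the
    inference subset and the masked model on the other, so no noise is added."""
    m = min(len(loss_full_half1), len(loss_masked_half2))
    diffs = [loss_full_half1[i] - loss_masked_half2[i] for i in range(m)]
    return _one_sided_pvalue(diffs)


def combine_pvalues(pvals: Sequence[float]) -> float:
    """Aggregate p-values from repeated sample splits as min(1, 2 * median)."""
    return min(1.0, 2.0 * statistics.median(pvals))


if __name__ == "__main__":
    # Synthetic losses only: the full model is clearly better than the masked one.
    rng = random.Random(0)
    full = [rng.gauss(1.0, 0.1) for _ in range(500)]
    masked = [rng.gauss(2.0, 0.1) for _ in range(500)]
    p1 = one_split_pvalue(full, masked, seed=1)
    p2 = two_split_pvalue(full[:250], masked[250:])
    print(p1, p2, combine_pvalues([p1, p2]))
```

A small p-value indicates that masking the features of interest increases the loss, i.e. the features are relevant. In practice the two loss vectors would come from the fitted network and its masked counterpart rather than from synthetic draws.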

Similar articles

Significance Tests of Feature Relevance for a Black-Box Learner.

IEEE Trans Neural Netw Learn Syst. 2022 Jun 30;PP. doi: 10.1109/TNNLS.2022.3185742.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

ABCAttack: A Gradient-Free Optimization Black-Box Attack for Fooling Deep Image Classifiers.

Entropy (Basel). 2022 Mar 15;24(3):412. doi: 10.3390/e24030412.

HOPE: High-Order Polynomial Expansion of Black-Box Neural Networks.

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):7924-7939. doi: 10.1109/TPAMI.2024.3399197. Epub 2024 Nov 6.

Variational Learning of Individual Survival Distributions.

Proc ACM Conf Health Inference Learn (2020). 2020 Apr;2020:10-18. doi: 10.1145/3368555.3384454. Epub 2020 Apr 2.

MLink: Linking Black-Box Models From Multiple Domains for Collaborative Inference.

IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12085-12097. doi: 10.1109/TPAMI.2023.3283780. Epub 2023 Sep 5.

Multi-scale Fisher's independence test for multivariate dependence.

Biometrika. 2022 Sep;109(3):569-587. doi: 10.1093/biomet/asac013. Epub 2022 Feb 21.

Deep learning based correction of RF field induced inhomogeneities for T2w prostate imaging at 7 T.

NMR Biomed. 2023 Dec;36(12):e5019. doi: 10.1002/nbm.5019. Epub 2023 Aug 25.

Hierarchical Bayesian non-response models for error rates in forensic black-box studies.

Philos Trans A Math Phys Eng Sci. 2023 May 15;381(2247):20220157. doi: 10.1098/rsta.2022.0157. Epub 2023 Mar 27.

Ocean oil spill detection from SAR images based on multi-channel deep learning semantic segmentation.

Mar Pollut Bull. 2023 Mar;188:114651. doi: 10.1016/j.marpolbul.2023.114651. Epub 2023 Feb 1.

Cited by

An exploration of testing genetic associations using goodness-of-fit statistics based on deep ReLU neural networks.

Front Syst Biol. 2024 Nov 18;4:1460369. doi: 10.3389/fsysb.2024.1460369. eCollection 2024.

Assessing variable importance in survival analysis using machine learning.

Biometrika. 2024 Nov 4;112(2):asae061. doi: 10.1093/biomet/asae061. eCollection 2025.

Importance of variables from different time frames for predicting self-harm using health system data.

J Biomed Inform. 2024 Dec;160:104750. doi: 10.1016/j.jbi.2024.104750. Epub 2024 Nov 16.

A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation.

Ann Appl Stat. 2024 Sep;18(3):1840-1857. doi: 10.1214/23-aoas1859. Epub 2024 Aug 5.

Importance of variables from different time frames for predicting self-harm using health system data.

medRxiv. 2024 Sep 20:2024.04.29.24306260. doi: 10.1101/2024.04.29.24306260.

Novel Uncertainty Quantification Through Perturbation-Assisted Sample Synthesis.

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):7813-7824. doi: 10.1109/TPAMI.2024.3393364. Epub 2024 Nov 6.

A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation.

bioRxiv. 2023 Oct 22:2023.03.06.531446. doi: 10.1101/2023.03.06.531446.
