Significance Tests of Feature Relevance for a Black-Box Learner.

Authors

Dai Ben, Shen Xiaotong, Pan Wei

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Jun 30;PP. doi: 10.1109/TNNLS.2022.3185742.

Abstract

An exciting recent development is the uptake of deep neural networks in many scientific fields, where the main objective is outcome prediction with a black-box nature. Significance testing is a promising way to address the black-box issue and to explore novel scientific insights and interpretations of the decision-making process based on a deep learning model. However, testing for a neural network poses a challenge because of its black-box nature and the unknown limiting distributions of parameter estimates, while existing methods require strong assumptions or excessive computation. In this article, we derive one-split and two-split tests that relax the assumptions and reduce the computational complexity of existing black-box tests, and that extend to examining the significance of a collection of features of interest in a dataset of a possibly complex type, such as an image. The one-split test estimates and evaluates a black-box model based on estimation and inference subsets through sample splitting and data perturbation. The two-split test further splits the inference subset into two but requires no perturbation. We also develop their combined versions by aggregating the p-values based on repeated sample splitting. By deflating the bias-sd-ratio, we establish asymptotic null distributions of the test statistics and the consistency in terms of Type II error. Numerically, we demonstrate the utility of the proposed tests on seven simulated examples and six real datasets. Accompanying this article is our Python library dnn-inference (https://dnn-inference.readthedocs.io/en/latest/), which implements the proposed tests.
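The splitting idea in the abstract can be illustrated with a minimal sketch. It does not use the dnn-inference API and is not the authors' implementation. Everything in it is an assumption made for illustration: a toy two-feature ridge regression stands in for the black-box learner, the tested feature is masked by setting it to zero, the loss is squared error, the perturbation size is `rho`, and the p-value comes from a one-sided normal approximation to the studentized mean of the loss differences. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def fit_ridge_2d(X, y, lam=1e-3):
    """Toy stand-in for a black-box learner: ridge regression on two features."""
    a = sum(x[0] * x[0] for x in X) + lam
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X) + lam
    u = sum(x[0] * t for x, t in zip(X, y))
    v = sum(x[1] * t for x, t in zip(X, y))
    det = a * d - b * b
    w0, w1 = (d * u - b * v) / det, (a * v - b * u) / det
    return lambda x: w0 * x[0] + w1 * x[1]


def mask(X, j):
    """Assumption: the feature under test is replaced by the constant 0."""
    return [tuple(0.0 if k == j else xk for k, xk in enumerate(x)) for x in X]


def p_value(diffs):
    """One-sided p-value: small when masking the feature raises the loss."""
    stat = math.sqrt(len(diffs)) * mean(diffs) / stdev(diffs)
    return NormalDist().cdf(-stat)


def fit_pair(X, y, j, n_est):
    """Fit the full and masked models on the estimation subset only."""
    X_est, y_est = X[:n_est], y[:n_est]
    return fit_ridge_2d(X_est, y_est), fit_ridge_2d(mask(X_est, j), y_est)


def one_split_test(X, y, j, n_est, rho=0.01, seed=0):
    """Both models are scored on the same inference points; a small Gaussian
    perturbation keeps the differences from degenerating under the null."""
    rng = random.Random(seed)
    full, masked = fit_pair(X, y, j, n_est)
    X_inf, y_inf = X[n_est:], y[n_est:]
    diffs = [
        (t - masked(z)) ** 2 - (t - full(x)) ** 2 + rho * rng.gauss(0.0, 1.0)
        for x, z, t in zip(X_inf, mask(X_inf, j), y_inf)
    ]
    return p_value(diffs)


def two_split_test(X, y, j, n_est):
    """The inference subset is halved: the full model is scored on one half,
    the masked model on the other, so no perturbation is added."""
    full, masked = fit_pair(X, y, j, n_est)
    X_inf, y_inf = X[n_est:], y[n_est:]
    m = len(X_inf) // 2
    loss_full = [(t - full(x)) ** 2 for x, t in zip(X_inf[:m], y_inf[:m])]
    loss_masked = [
        (t - masked(z)) ** 2
        for z, t in zip(mask(X_inf[m:2 * m], j), y_inf[m:2 * m])
    ]
    return p_value([lm - lf for lm, lf in zip(loss_masked, loss_full)])


# Simulated data: only feature 0 drives the outcome.
data_rng = random.Random(0)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
y = [2.0 * x[0] + data_rng.gauss(0, 1) for x in X]

p_one_relevant = one_split_test(X, y, j=0, n_est=500)
p_one_irrelevant = one_split_test(X, y, j=1, n_est=500)
p_two_relevant = two_split_test(X, y, j=0, n_est=500)
p_two_irrelevant = two_split_test(X, y, j=1, n_est=500)
print(p_one_relevant, p_one_irrelevant, p_two_relevant, p_two_irrelevant)
```

In this toy setting the p-values for feature 0 should be very small, while those for feature 1 carry no such guarantee in either direction. The combined tests mentioned in the abstract would repeat such splits over random partitions and aggregate the resulting p-values; that step is omitted here.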

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/92e6/10915654/97d6d774a4b3/nihms-1964699-f0001.jpg
