do Nascimento Rafaella L S, Fagundes Roberta A de A, de Souza Renata M C R, Cysneiros Francisco José A
Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil.
Departamento de Engenharia da Computação, Universidade de Pernambuco, Recife, Brazil.
Pattern Anal Appl. 2023;26(1):39-59. doi: 10.1007/s10044-022-01093-0. Epub 2022 Jul 18.
Interval-valued data are commonly encountered in practice, and Symbolic Data Analysis provides tools for their statistical treatment. Regression analysis for interval-valued symbolic data has been widely investigated in the symbolic data analysis literature, and several models from different paradigms have been proposed. These models rest on basic regression assumptions, and it is essential to validate them. This paper introduces an approach for checking the adequacy of interval regression models based on residual analysis. The concepts of ordinary and standardized interval residuals are presented, and graphical analysis of these residuals is also proposed. To show the usefulness of the proposed approach, it is applied to estimating school dropout across Brazilian municipalities. The interval residual analysis revealed some outliers, and interval robust regression models were found to be more suitable for estimating school dropout.
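The abstract does not give the paper's definitions of the interval residuals, so the following is only a minimal illustrative sketch under stated assumptions: ordinary least squares is fitted separately to the lower and upper bounds of the intervals, the ordinary interval residual is taken as the pair of bound-wise residuals, and the standardized version divides each by its estimated standard error using the leverage. The function names `fit_ols` and `interval_residuals` are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares coefficients and leverages for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Leverages are the diagonal of the hat matrix H = X (X'X)^+ X'.
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    return beta, np.diag(H)

def interval_residuals(x_low, x_up, y_low, y_up):
    """Bound-wise residuals for interval data (an assumption of this sketch).

    Returns two (n, 2) arrays: ordinary and standardized residuals, with
    column 0 for the lower bounds and column 1 for the upper bounds.
    """
    ordinary, standardized = [], []
    for x, y in ((x_low, y_low), (x_up, y_up)):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        X = np.column_stack([np.ones_like(x), x])  # intercept + one predictor
        beta, h = fit_ols(X, y)
        e = y - X @ beta                           # ordinary residuals
        n, p = X.shape
        s2 = e @ e / (n - p)                       # residual variance estimate
        r = e / np.sqrt(s2 * (1.0 - h))            # internally standardized
        ordinary.append(e)
        standardized.append(r)
    return np.column_stack(ordinary), np.column_stack(standardized)
```

Plotting the standardized residuals of both bounds against the fitted values is one way to carry out the kind of graphical check the abstract mentions. Observations whose standardized residuals are large in absolute value on either bound are candidate outliers, which is the situation in which a robust interval regression may be preferable.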