School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA.
Department of Biostatistics, University of Washington, Seattle, Washington, USA.
Biometrics. 2023 Dec;79(4):3485-3496. doi: 10.1111/biom.13926. Epub 2023 Oct 5.
In many categorical response regression applications, the response categories admit a multiresolution structure. That is, subsets of the response categories may naturally be combined into coarser response categories. In such applications, practitioners are often interested in estimating the resolution at which a predictor affects the response category probabilities. In this paper, we propose a method for fitting the multinomial logistic regression model in high dimensions that addresses this problem in a unified and data-driven way. Our method allows practitioners to identify which predictors distinguish between coarse categories but not fine categories, which predictors distinguish between fine categories, and which predictors are irrelevant. For model fitting, we propose a scalable algorithm that can be applied when the coarse categories are defined by either overlapping or nonoverlapping sets of fine categories. Statistical properties of our method reveal that it can take advantage of this multiresolution structure in a way existing estimators cannot. We use our method to model cell-type probabilities as a function of a cell's gene expression profile (i.e., cell-type annotation). Our fitted model provides novel biological insights which may be useful for future automated and manual cell-type annotation methodology.
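The three predictor roles described above can be illustrated with a small numerical sketch. It is not the authors' estimator or fitting algorithm. It only shows, for a hand-picked coefficient matrix, how a multinomial logistic model behaves when a predictor's coefficients are equal within a coarse category (it affects coarse probabilities only), differ within a coarse category (it distinguishes fine categories), or are all zero (it is irrelevant). The grouping `coarse_groups`, the matrix `B`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical structure: 4 fine categories grouped into 2 coarse categories.
coarse_groups = {"A": [0, 1], "B": [2, 3]}

# Hypothetical coefficients: rows are predictors, columns are fine categories.
B = np.array([
    [1.0,  1.0, -1.0, -1.0],  # predictor 0: equal within each coarse group
    [0.5, -0.5,  0.0,  0.0],  # predictor 1: differs within coarse group A
    [0.0,  0.0,  0.0,  0.0],  # predictor 2: irrelevant
])

def fine_probs(x):
    """Fine-category probabilities under the multinomial logistic model."""
    return softmax(x @ B)

def coarse_probs(x):
    """Coarse-category probabilities: sums of the member fine probabilities."""
    p = fine_probs(x)
    return {name: p[..., idx].sum(axis=-1) for name, idx in coarse_groups.items()}

def within_group_probs(x, members):
    """Fine-category probabilities conditional on the coarse category."""
    p = fine_probs(x)[..., members]
    return p / p.sum(axis=-1, keepdims=True)

x_base = np.array([0.0, 0.3, 0.0])
x_shift0 = np.array([2.0, 0.3, 0.0])  # change predictor 0 only

print(coarse_probs(x_base), coarse_probs(x_shift0))
print(within_group_probs(x_base, [0, 1]), within_group_probs(x_shift0, [0, 1]))
```

Changing predictor 0 moves probability between coarse categories A and B but leaves the split between the fine categories inside A unchanged, because its coefficients cancel in the within-group ratio. This is the sense in which a predictor can act at the coarse resolution only.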