Department of Statistical Science, Duke University, Durham, NC 27708, USA.
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Bioinformatics. 2022 Aug 10;38(16):4011-4018. doi: 10.1093/bioinformatics/btac431.
It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data, in which nodes are common across networks but edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are of such poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank.
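The idea of scoring each subject's network by how poorly it fits a model estimated from the population can be illustrated with a deliberately simplified sketch. This is not ODIN's model or its influence measure. It assumes binary, undirected adjacency matrices, replaces the hierarchical generalized linear model with independent per-edge Bernoulli probabilities, and scores each subject by the mean negative log-likelihood of its edges under probabilities estimated from all other subjects (a leave-one-out deviance). The function names `loo_outlier_scores` and `flag_outliers`, the smoothing pseudo-count `alpha`, and the robust z-score cutoff of 3.5 are hypothetical choices made only for illustration.

```python
import numpy as np


def loo_outlier_scores(A, alpha=1.0):
    """Leave-one-out deviance score per subject (illustrative, not ODIN).

    A: array of shape (n_subjects, n_nodes, n_nodes) holding binary,
       symmetric adjacency matrices.
    alpha: hypothetical smoothing pseudo-count keeping probabilities off 0 and 1.
    Returns one score per subject; larger means a more surprising network.
    """
    A = np.asarray(A, dtype=float)
    n, v, _ = A.shape
    iu = np.triu_indices(v, k=1)             # each undirected edge counted once
    Y = A[:, iu[0], iu[1]]                   # shape (n_subjects, n_edges)
    total = Y.sum(axis=0)                    # edge counts across all subjects
    scores = np.empty(n)
    for i in range(n):
        # Assumed model: independent Bernoulli edges with smoothed
        # probabilities estimated from every subject except i.
        p = (total - Y[i] + alpha) / (n - 1 + 2 * alpha)
        loglik = Y[i] * np.log(p) + (1 - Y[i]) * np.log1p(-p)
        scores[i] = -loglik.mean()           # mean negative log-likelihood per edge
    return scores


def flag_outliers(scores, cutoff=3.5):
    """Indices whose robust (median/MAD) z-score exceeds a hypothetical cutoff."""
    med = np.median(scores)
    mad = max(np.median(np.abs(scores - med)), 1e-12)  # guard against MAD == 0
    robust_z = 0.6745 * (scores - med) / mad
    return np.flatnonzero(robust_z > cutoff)


# Usage on simulated data: two-block networks plus two corrupted subjects.
rng = np.random.default_rng(0)
n, v = 50, 20
block = np.repeat([0, 1], v // 2)
P = np.where(block[:, None] == block[None, :], 0.8, 0.1)   # edge probabilities
A = np.triu((rng.random((n, v, v)) < P).astype(int), k=1)
A = A + A.transpose(0, 2, 1)                 # symmetric, zero diagonal
A[0] = 0                                     # "mostly zero" network
R = np.triu((rng.random((v, v)) < 0.5).astype(int), k=1)
A[1] = R + R.T                               # pattern with no block structure

scores = loo_outlier_scores(A)
flagged = flag_outliers(scores)
print(flagged)
```

In this simulated example, the empty network (subject 0) and the structureless random network (subject 1) receive much larger scores than the block-structured networks. The median/MAD rule is only a common heuristic for turning scores into flags; the authors' ODIN implementations should be consulted for the actual influence measure and decision rule.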
ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications.
ODIN has been implemented in both Python and R, and these implementations, along with other code, are publicly available at github.com/pritamdey/ODIN-python and github.com/pritamdey/ODIN-r, respectively.
Supplementary data are available at Bioinformatics online.