Wang Zirui, Ling Wodan, Wang Tianying
Department of Statistics and Data Science, Tsinghua University, Beijing, 100084, China.
Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
Biometrics. 2025 Apr 2;81(2). doi: 10.1093/biomtc/ujaf050.
Zero-inflated data commonly arise in various fields, including economics, healthcare, and environmental sciences, where measurements frequently include an excess of zeros due to structural or sampling mechanisms. Traditional approaches, such as Zero-Inflated Poisson and Zero-Inflated Negative Binomial models, have been widely used to handle excess zeros in count data, but they rely on strong parametric assumptions that may not hold in complex real-world applications. In this paper, we propose a zero-inflated quantile single-index rank-score-based test (ZIQ-SIR) to detect associations between zero-inflated outcomes and covariates, particularly when nonlinear relationships are present. ZIQ-SIR offers a flexible, semi-parametric approach that accounts for the zero-inflated nature of the data and avoids the restrictive assumptions of traditional parametric models. Through simulations, we show that ZIQ-SIR outperforms existing methods by achieving higher power and better Type I error control, owing to its flexibility in modeling zero-inflated and overdispersed data. We apply our method to a real-world dataset: microbiome abundance from the Colombian Gut study. In this application, ZIQ-SIR identifies more significant associations than alternative approaches, while maintaining accurate Type I error control.
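To make the notion of "excess zeros" concrete, the following is a minimal illustrative sketch, not the ZIQ-SIR procedure itself. It simulates counts from a zero-inflated negative binomial mechanism and compares the observed share of zeros with the share a Poisson model of the same mean would imply. The function names, parameter values, and sample size are all assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np


def simulate_zero_inflated_counts(n, zero_prob, nb_size, nb_prob, seed=0):
    """Draw n counts: a structural zero with probability zero_prob,
    otherwise a negative binomial count (hypothetical parameter choices)."""
    rng = np.random.default_rng(seed)
    structural_zero = rng.random(n) < zero_prob
    counts = rng.negative_binomial(nb_size, nb_prob, size=n)
    counts[structural_zero] = 0  # overwrite with structural zeros
    return counts


def zero_fraction_summary(y):
    """Return (observed zero fraction, zero probability implied by a
    Poisson distribution with the same mean, i.e. exp(-mean))."""
    observed = float(np.mean(y == 0))
    poisson_implied = float(np.exp(-np.mean(y)))
    return observed, poisson_implied


if __name__ == "__main__":
    y = simulate_zero_inflated_counts(n=5000, zero_prob=0.4, nb_size=2, nb_prob=0.2)
    observed, poisson_implied = zero_fraction_summary(y)
    print(f"observed zero fraction:       {observed:.3f}")
    print(f"Poisson-implied zero fraction: {poisson_implied:.3f}")
    print(f"mean = {y.mean():.2f}, variance = {y.var():.2f}")
```

Under these assumed settings the observed zero fraction is far larger than the Poisson-implied one and the variance exceeds the mean, which is the kind of zero inflation and overdispersion the abstract refers to.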