Center for Outcomes Research, University of Illinois College of Medicine at Peoria, One Illini Drive, Box 1649, Peoria, IL 61656, USA.
BMC Infect Dis. 2014 Oct 4;14:540. doi: 10.1186/1471-2334-14-540.
In most biological experiments, especially in infectious disease research, the exposure-response relationship is shaped by many interrelated factors rather than by many independent ones. Little is known about the suitability of three ways of entering exposure into logistic regression models that assess the likelihood of an infectious disease as a function of a risk or exposure: ordinary (raw) exposures, categorical exposures, and logarithmically transformed exposures. This study aims to examine and compare these current approaches.
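For orientation, the three approaches can be written in a generic form. This is only a sketch: the symbols (x for an individual's exposure, p(x) for the probability of infection, C_k for the k-th exposure category, with the lowest category as reference) are notation assumed here, not taken from the study.

```latex
\begin{align*}
\text{Ordinary exposure:}\quad
  & \log\frac{p(x)}{1-p(x)} = \beta_0 + \beta_1 x \\
\text{Log-transformed exposure:}\quad
  & \log\frac{p(x)}{1-p(x)} = \beta_0 + \beta_1 \log x \\
\text{Categorical exposure:}\quad
  & \log\frac{p(x)}{1-p(x)} = \beta_0 + \sum_{k=1}^{K-1} \beta_k \,\mathbf{1}[x \in C_k]
\end{align*}
```

The first two forms assume that the log odds are linear in x or in log x, respectively. The categorical form has no such requirement within a category, at the cost of a coarser description of exposure.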
A simulated human immunodeficiency virus (HIV) population was created, consisting of dynamic infection data for 100,000 individuals with 1% initial prevalence and 2% infectivity. Using the Monte Carlo method (a computational algorithm that relies on repeated random sampling to obtain numerical results), the linearity between log odds and exposure and the practical suitability of the three model approaches were examined.
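The general workflow described above can be illustrated with a minimal Python sketch. It is not the authors' simulation. Several pieces are assumptions made only for illustration: the infection model is static rather than dynamic (each exposure transmits independently with probability prevalence × infectivity), exposures are drawn uniformly from 1 to 1000, categories are population quartiles, and the sample size and number of replicates are arbitrary. All function names are hypothetical.

```python
import numpy as np


def simulate_population(n_people=100_000, prevalence=0.01, infectivity=0.02, seed=0):
    """Simulate exposure counts and infection status (static simplification)."""
    rng = np.random.default_rng(seed)
    # Assumption: number of exposures per person is uniform on 1..1000.
    exposures = rng.integers(1, 1001, size=n_people)
    # Assumption: each exposure transmits independently with
    # probability prevalence * infectivity.
    p_infect = 1.0 - (1.0 - prevalence * infectivity) ** exposures
    infected = (rng.random(n_people) < p_infect).astype(float)
    return exposures, infected


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)  # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def design_matrices(exposures, cut_points):
    """Build design matrices for the raw, log, and categorical approaches."""
    ones = np.ones(len(exposures))
    raw = np.column_stack([ones, exposures / 100.0])  # per 100 exposures
    log = np.column_stack([ones, np.log(exposures)])
    category = np.digitize(exposures, cut_points)  # 0 = lowest (reference)
    dummies = [(category == k).astype(float) for k in range(1, len(cut_points) + 1)]
    categorical = np.column_stack([ones] + dummies)
    return {"raw": raw, "log": log, "categorical": categorical}


def monte_carlo(exposures, infected, n_reps=20, sample_size=5_000, seed=1):
    """Repeat random sampling and average the fitted coefficients per approach."""
    rng = np.random.default_rng(seed)
    cut_points = np.quantile(exposures, [0.25, 0.5, 0.75])
    fits = {"raw": [], "log": [], "categorical": []}
    for _ in range(n_reps):
        idx = rng.choice(len(exposures), size=sample_size, replace=False)
        designs = design_matrices(exposures[idx], cut_points)
        for name, X in designs.items():
            fits[name].append(fit_logistic(X, infected[idx]))
    return {name: np.mean(betas, axis=0) for name, betas in fits.items()}


exposures, infected = simulate_population()
estimates = monte_carlo(exposures, infected)
for name, beta in estimates.items():
    print(name, np.round(beta, 3))
```

Comparing each model's fitted probabilities with the known simulated probabilities, across different prevalence settings, would be the natural next step for judging accuracy in the spirit of the study.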
Across diverse levels of population prevalence, linearity between log odds and raw exposures was not satisfied. Logarithmic transformation of exposures improved the linearity to a certain extent, and categorical exposures satisfied the linearity assumption (which is important for modelling). When the population prevalence was low (assumed < 10%), the performance of the three models differed significantly. Compared with ordinary logistic regression, the logarithmic transformation approach gave more accurate estimates, except at the two inflection points where the increase in the likelihood of infection changed from slow to sharp and then back to slow. The approach using categorical exposures produced estimates closer to the real values, but the measurement was coarse due to categorization.
It is not suitable to use ordinary logistic regression directly to explore the exposure-response relationship of HIV as an infectious disease. This study provides recommendations for practical implementation: 1) use categorical exposure if a large sample size and low population prevalence are available; 2) use a logarithmically transformed exposure if the sample size is insufficient or the population prevalence is too high (such as 30%).