Tian Yuqi, Li Chun, Tu Shengxin, James Nathan T, Harrell Frank E, Shepherd Bryan E
Department of Biostatistics, Vanderbilt University, Nashville, Tennessee.
Department of Population and Public Health Sciences, University of Southern California.
J Am Stat Assoc. 2024;119(546):864-874. doi: 10.1080/01621459.2024.2315667. Epub 2024 Apr 1.
Detection limits (DLs), where a variable cannot be measured outside of a certain range, are common in research. DLs may vary across study sites or over time. Most approaches to handling DLs in response variables implicitly make strong parametric assumptions on the distribution of data outside DLs. We propose a new approach to deal with multiple DLs based on a widely used ordinal regression model, the cumulative probability model (CPM). The CPM is a rank-based, semiparametric linear transformation model that can handle mixed distributions of continuous and discrete outcome variables. These features are key for analyzing data with DLs because while observations inside DLs are continuous, those outside DLs are censored and generally put into discrete categories. With a single lower DL, CPMs assign values below the DL as having the lowest rank. With multiple DLs, the CPM likelihood can be modified to appropriately distribute probability mass. We demonstrate the use of CPMs with DLs via simulations and a data example. This work is motivated by a study investigating factors associated with HIV viral load 6 months after starting antiretroviral therapy in Latin America; 56% of observations are below lower DLs that vary across study sites and over time.
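The idea of distributing probability mass for observations below varying DLs can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a logit link, writes the CPM as P(Y <= y | x) = expit(alpha(y) - beta * x), and uses `-inf` as a placeholder for a lowest category below the smallest DL. The names `cdf`, `contribution`, and `log_likelihood` are hypothetical. An observed value contributes the jump of the step-function CDF at that value, while a value censored below a DL contributes the probability of falling strictly below that DL.

```python
import math
from bisect import bisect_left


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def cdf(j, x, alphas, beta):
    """P(Y <= support[j] | x) under a logit-link CPM (assumed form)."""
    if j < 0:
        return 0.0            # nothing lies below the lowest category
    if j >= len(alphas):
        return 1.0            # the largest support point carries the remaining mass
    return expit(alphas[j] - beta * x)


def contribution(x, value, dl, support, alphas, beta):
    """Likelihood contribution of one observation.

    value is the measured outcome, or None when the outcome is below
    the lower detection limit dl that applied to this observation.
    """
    if value is None:
        # Censored: P(Y < dl | x), i.e. the CDF at the largest
        # support point strictly below this observation's DL.
        k = bisect_left(support, dl) - 1
        return cdf(k, x, alphas, beta)
    # Observed: jump of the step-function CDF at the observed value.
    j = bisect_left(support, value)
    return cdf(j, x, alphas, beta) - cdf(j - 1, x, alphas, beta)


def log_likelihood(data, support, alphas, beta):
    """Sum of log contributions over (x, value, dl) triples."""
    return sum(
        math.log(contribution(x, v, dl, support, alphas, beta))
        for x, v, dl in data
    )


# Hypothetical toy setup: -inf stands for "below the smallest DL".
support = [float("-inf"), 50.0, 120.0, 400.0]
alphas = [-1.0, 0.0, 1.0]     # one intercept per support point except the last
beta = 0.7

data = [
    (0.0, None, 120.0),       # below a DL of 120
    (0.0, 120.0, 50.0),       # measured at 120 where the DL was 50
]
total = log_likelihood(data, support, alphas, beta)
```

In this toy setup, the observation censored below a DL of 120 draws on the mass assigned to both the lowest category and the value 50, whereas an observation censored below a DL of 50 would draw only on the lowest category. In practice the intercepts and the coefficient would be estimated by maximizing this likelihood rather than fixed by hand.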