Avalos Marta, Pouyes Hélène, Grandvalet Yves, Orriols Ludivine, Lagarde Emmanuel
BMC Bioinformatics. 2015;16 Suppl 6(Suppl 6):S1. doi: 10.1186/1471-2105-16-S6-S1. Epub 2015 Apr 17.
This paper considers the problem of estimation and variable selection for large, high-dimensional data (a high number of predictors p and a large sample size N, without excluding the possibility that N < p) arising from an individually matched case-control study. We develop a simple algorithm that adapts the Lasso and related methods to the conditional logistic regression model. Our proposal relies on simplifying the calculations involved in the likelihood function. The proposed algorithm then iteratively solves reweighted Lasso problems using cyclical coordinate descent, computed along a regularization path. This method can handle large problems and deals efficiently with sparse features. We discuss its benefits and drawbacks with respect to existing implementations. We also illustrate the relevance and use of these techniques in a pharmacoepidemiological study of medication use and traffic safety.
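The algorithm outlined above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made here for illustration. Matched sets are 1:M, with the case stored first. The negative log-likelihood of matched set i is written with control-minus-case differences as log(1 + sum_j exp((x_ij - x_i0)^T beta)), which is a standard identity for the conditional logistic likelihood. The weights of each reweighted Lasso problem come from the diagonal of the Hessian with respect to the linear predictors; this approximation is exact for 1:1 matching. The names `clogit_lasso_path`, `soft_threshold`, `eps`, `n_outer` and `n_inner` are hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """S(x, lam) = sign(x) * max(|x| - lam, 0), the Lasso coordinate update."""
    return float(np.sign(x) * max(abs(x) - lam, 0.0))


def clogit_lasso_path(X, lambdas=None, n_lambda=20, eps=0.05,
                      n_outer=50, n_inner=100, tol=1e-7):
    """Lasso-penalized conditional logistic regression (illustrative sketch).

    X: array of shape (n_strata, M + 1, p); X[i, 0] is the case of matched
    set i and X[i, 1:] are its M controls. Returns (lambdas, betas), where
    betas[l] is the estimate at lambdas[l].
    """
    n, m1, p = X.shape
    M = m1 - 1
    # Control-minus-case differences: the negative log-likelihood of set i
    # is log(1 + sum_j exp(D_ij @ beta)), so no case indicator is needed.
    D = (X[:, 1:, :] - X[:, :1, :]).reshape(n * M, p)

    def probs(eta):
        # pi_ij = exp(eta_ij) / (1 + sum_k exp(eta_ik)), computed stably.
        e = eta.reshape(n, M)
        mx = np.maximum(e.max(axis=1, keepdims=True), 0.0)
        ex = np.exp(e - mx)
        return (ex / (np.exp(-mx) + ex.sum(axis=1, keepdims=True))).ravel()

    if lambdas is None:
        # At beta = 0 every pi_ij equals 1 / (M + 1); the largest absolute
        # gradient entry is the smallest lambda with an all-zero solution.
        grad0 = D.T @ np.full(n * M, 1.0 / (M + 1)) / n
        lam_max = np.max(np.abs(grad0))
        lambdas = lam_max * np.logspace(0, np.log10(eps), n_lambda)

    beta = np.zeros(p)
    betas = []
    for lam in lambdas:  # decreasing lambdas, warm-started from the last fit
        for _ in range(n_outer):
            beta_old = beta.copy()
            eta = D @ beta
            pi = probs(eta)
            # Reweighting step: diagonal Hessian in eta (assumption).
            w = np.maximum(pi * (1.0 - pi), 1e-5)
            # Residual z - eta, with working response z = eta - pi / w.
            r = -pi / w
            for _ in range(n_inner):  # cyclical coordinate descent
                max_change = 0.0
                for k in range(p):
                    dk = D[:, k]
                    denom = np.sum(w * dk * dk) / n
                    if denom == 0.0:
                        continue
                    rho = np.sum(w * dk * (r + dk * beta[k])) / n
                    new = soft_threshold(rho, lam) / denom
                    delta = new - beta[k]
                    if delta != 0.0:
                        r -= dk * delta  # keep the residual in sync
                        beta[k] = new
                        max_change = max(max_change, abs(delta))
                if max_change < tol:
                    break
            if np.max(np.abs(beta - beta_old)) < tol:
                break
        betas.append(beta.copy())
    return np.asarray(lambdas), np.array(betas)
```

Called as `lams, betas = clogit_lasso_path(X)`, the sketch returns the lambda grid (largest first) and one coefficient vector per lambda. The first vector is zero by construction, and variables enter as lambda decreases. Any fixed point of the outer loop satisfies the Lasso optimality conditions for the conditional likelihood whatever weights are used, so the diagonal approximation affects only the speed of convergence, not the solution.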