Wang Fei, Zhang Ping, Wang Xiang, Hu Jianying
IBM T. J. Watson Research Center, Yorktown Heights, NY.
AMIA Annu Symp Proc. 2014 Nov 14;2014:1170-9. eCollection 2014.
Clinical risk prediction is an important problem in medical informatics, and logistic regression is one of the most widely used approaches to it. In many cases the number of potential risk factors is large, while the set of factors that actually contribute to the risk is small. Sparse logistic regression was proposed for this setting: it not only predicts the clinical risk but also identifies the set of relevant risk factors. Both logistic regression and sparse logistic regression require inputs in vector form, which limits their applicability to problems where the data cannot be naturally represented as vectors (e.g., medical images are two-dimensional matrices). To handle data in the form of multi-dimensional arrays, we propose HOSLR: High-Order Sparse Logistic Regression, which can be viewed as a high-order extension of sparse logistic regression. Instead of solving for one classification vector as in conventional logistic regression, HOSLR solves for K classification vectors, where K is the number of modes in the data. We propose a block proximal descent approach to solve the problem, and its convergence is guaranteed. Finally, we validate the effectiveness of HOSLR on predicting the onset risk of patients with Alzheimer's disease and heart failure.
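To make the idea of "K classification vectors" and block proximal descent concrete, the following is a minimal sketch for the two-mode case (K = 2, matrix-valued inputs). It is an illustration written for this summary, not the authors' implementation. It assumes a rank-one bilinear score w1^T X w2 + b, labels in {-1, +1}, a plain l1 penalty on each vector, a fixed step size, and a uniform nonzero initialization; the abstract specifies none of these details, and all function names are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def scores(X, w1, w2, b):
    """Bilinear score w1^T X_n w2 + b for every sample n; X has shape (n, d1, d2)."""
    return np.einsum("nij,i,j->n", X, w1, w2) + b


def objective(X, y, w1, w2, b, lam):
    """Mean logistic loss (labels in {-1, +1}) plus the assumed l1 penalty."""
    loss = np.mean(np.logaddexp(0.0, -y * scores(X, w1, w2, b)))
    return loss + lam * (np.abs(w1).sum() + np.abs(w2).sum())


def loss_grad_wrt_score(y, s):
    """d/ds log(1 + exp(-y s)) = -y * sigmoid(-y s), written with tanh for stability."""
    return -y * 0.5 * (1.0 - np.tanh(0.5 * y * s))


def fit_two_mode(X, y, lam=0.01, step=0.1, n_iter=300):
    """Alternate proximal gradient steps over the blocks w1, w2 (and the bias b)."""
    n, d1, d2 = X.shape
    # Assumed initialization: nonzero, because w1 = w2 = 0 is a stationary point.
    w1 = np.full(d1, 1.0 / np.sqrt(d1))
    w2 = np.full(d2, 1.0 / np.sqrt(d2))
    b = 0.0
    history = [objective(X, y, w1, w2, b, lam)]
    for _ in range(n_iter):
        # Block 1: with w2 fixed, the model is linear in w1 with features X_n w2.
        Z1 = X @ w2  # shape (n, d1)
        g = loss_grad_wrt_score(y, Z1 @ w1 + b)
        w1 = soft_threshold(w1 - step * (Z1.T @ g) / n, step * lam)

        # Block 2: with w1 fixed, the model is linear in w2 with features X_n^T w1.
        Z2 = np.einsum("nij,i->nj", X, w1)  # shape (n, d2)
        g = loss_grad_wrt_score(y, Z2 @ w2 + b)
        w2 = soft_threshold(w2 - step * (Z2.T @ g) / n, step * lam)

        # Bias: plain gradient step, since it is not penalized.
        g = loss_grad_wrt_score(y, scores(X, w1, w2, b))
        b -= step * g.mean()

        history.append(objective(X, y, w1, w2, b, lam))
    return w1, w2, b, history
```

Each block update is an ordinary sparse logistic regression step on features obtained by contracting the input with the other vector, which is why the scheme extends to more modes by cycling over K vectors. Calling `fit_two_mode(X, y)` on an array `X` of shape `(n, d1, d2)` returns the two vectors, the bias, and the objective value after each sweep; the fixed step size is a simplification, and a line search or Lipschitz-based step would be the more careful choice.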