Saharan Seema Singh, Nagar Pankaj, Creasy Kate Townsend, Stock Eveline O, Feng James, Malloy Mary J, Kane John P
Department of Clinical Pharmacy, University of California, San Francisco, USA; UCSF Kane Lab, San Francisco, USA; UC Berkeley Extension, Berkeley, USA.
Department of Statistics, University of Rajasthan, Jaipur, India.
Proc (Int Conf Comput Sci Comput Intell). 2023 Dec;2023:652-660. doi: 10.1109/csci62032.2023.00114. Epub 2024 Jul 19.
Coronary artery disease (CAD) is a leading cause of death worldwide. Proactive risk assessment would benefit from novel biomarkers, such as cytokines that signal inflammation, used alongside traditional risk predictors. Atherosclerosis, the primary cause of CAD, is an inflammatory disease involving cytokines, and identifying which cytokines are specifically altered can advance diagnosis and personalized treatment. Emerging research shows that cytokines are transported on high-density lipoproteins (HDL), so it is important to explore the roles of HDL-associated cytokines in vascular inflammation. Machine learning (ML) algorithms are advancing pioneering research in precision medicine, and this technology can materially aid the translation of scientific research into clinical practice. In this study, we implemented logistic regression and regularized techniques derived from it, using age and multidimensional cytokine biomarkers, with the objective of identifying individuals "At Risk" for CAD. These techniques were further strengthened by k-fold cross-validation and hyperparameter tuning. Of the many algorithms investigated, the three best, assessed by area under the receiver operating characteristic curve (AUROC), were logistic regression, least absolute shrinkage and selection operator (LASSO) regression with feature selection, and ridge regression with feature selection. Logistic regression achieved an AUROC of .85 (95% confidence interval [CI]: .804, .897), LASSO regression a better AUROC of .875 (95% CI: .832, .917), and ridge regression with feature selection the highest AUROC of .878 (95% CI: .837, .92). A two-sample independent t test showed that the three techniques differed statistically significantly from one another.
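The abstract does not describe its implementation. The following is a minimal, dependency-free Python sketch of the general recipe it names, assuming a binary "At Risk" label and already standardized numeric features: logistic regression with an L2 (ridge) or L1 (LASSO) penalty, the penalty strength chosen by grid search over stratified k-fold cross-validation, and settings compared by mean held-out AUROC. The gradient-descent solver (with a soft-thresholding step for L1), the grid of penalty values, k = 5, and all function names are illustrative assumptions, not the authors' method.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auroc(labels, scores):
    # Probability that a random positive case outscores a random negative one (ties count 1/2).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, lam, penalty="l2", lr=0.1, epochs=200):
    # Batch gradient descent on the mean log-loss; the intercept b is never penalized.
    # "l2" (ridge): adds (lam / 2) * ||w||^2, i.e. lam * w in the gradient.
    # "l1" (LASSO): adds lam * ||w||_1, handled by a soft-thresholding (proximal) step.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        for j in range(d):
            if penalty == "l2":
                w[j] -= lr * (gw[j] / n + lam * w[j])
            else:  # "l1"
                v = w[j] - lr * gw[j] / n
                w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b


def stratified_folds(y, k, seed=0):
    # Deal each class's shuffled indices round-robin so every fold holds both classes
    # (provided each class has at least k members).
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def cv_auroc(X, y, lam, penalty, k=5, seed=0):
    # Mean held-out AUROC across k stratified folds for one (penalty, lam) setting.
    fold_scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        w, b = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx], lam, penalty)
        scores = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in test_idx]
        fold_scores.append(auroc([y[i] for i in test_idx], scores))
    return sum(fold_scores) / k


def tune(X, y, penalty, grid=(0.0, 0.01, 0.1, 1.0), k=5):
    # Grid search over the penalty strength; returns (best_lam, best_mean_cv_auroc).
    results = [(lam, cv_auroc(X, y, lam, penalty, k)) for lam in grid]
    return max(results, key=lambda r: r[1])


if __name__ == "__main__":
    # Synthetic data only: 120 cases, 3 features, outcome driven by the first two.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(120)]
    y = [1 if x[0] - 0.5 * x[1] + rng.gauss(0, 1) > 0 else 0 for x in X]
    for penalty in ("l2", "l1"):
        lam, score = tune(X, y, penalty)
        print(f"{penalty}: best lambda={lam}, mean CV AUROC={score:.3f}")
```

Running the script prints the selected penalty strength and mean cross-validated AUROC for each penalty on synthetic data; the numbers say nothing about the study cohort. Stratifying the folds keeps both classes in every held-out set, which the AUROC calculation requires.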
For the best classifier, ridge regression with feature selection, the most prominent biomarkers in order of importance were age, IL-7, RANTES, IFN-gamma, IL-3, GM-CSF, IL-15, IP-10, G-CSF, and IL-12. Identifying and quantifying the cytokines transported by HDL provides novel mechanistic insights that can inform risk assessment and therapeutic intervention in CAD.
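The abstract does not say how importance was measured. One common convention, assumed here, is to fit the penalized model on z-scored features and rank features by the absolute value of their coefficients. The minimal Python sketch below illustrates that convention with a ridge-penalized logistic regression; the fixed penalty `lam=0.1`, the gradient-descent solver, the helper names, and the placeholder feature names are hypothetical choices, not the study's pipeline.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def zscore_columns(X):
    # Standardize each column to mean 0 and SD 1 so coefficients share a common scale.
    # A constant column keeps SD 1.0 to avoid division by zero.
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=300):
    # Gradient descent on mean log-loss + (lam / 2) * ||w||^2; intercept b unpenalized.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * (g / n + lam * wj) for wj, g in zip(w, gw)]
    return w, b


def rank_features(names, weights):
    # Sort by |coefficient|, largest first; the sign gives the direction of association.
    return sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)


if __name__ == "__main__":
    # Synthetic data; the feature names are hypothetical placeholders, not study biomarkers.
    rng = random.Random(7)
    names = ["feature_a", "feature_b", "feature_c"]
    # feature_b has a much larger raw scale, which is why the columns are standardized first.
    raw = [[rng.gauss(0, 1), rng.gauss(0, 100), rng.gauss(0, 1)] for _ in range(200)]
    y = [1 if 2.0 * r[0] + 0.01 * r[1] + rng.gauss(0, 1) > 0 else 0 for r in raw]
    w, b = fit_ridge_logistic(zscore_columns(raw), y)
    for name, coef in rank_features(names, w):
        print(f"{name}: {coef:+.3f}")
```

Standardizing first matters: without it, a feature measured on a large raw scale (like `feature_b` here) gets a small per-unit coefficient even when it is a strong predictor. Because a ridge penalty shrinks coefficients without setting them exactly to zero, this sketch does not cover the separate feature-selection step the abstract mentions.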