Chou Jingyuan, Flory James, Wang Fei
Department of Healthcare Policy and Research, Weill Cornell Medicine.
AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:295-304. eCollection 2019.
With the rapid development of computer hardware and software, more and more electronic health data from insurance claims, clinical trials, and hospitals are becoming readily available. These data provide a rich resource for developing healthcare analytics algorithms, among which predictive modeling is of key importance for many real-world health problems. One important issue in data-driven predictive modeling is high dimensionality, and feature selection is an effective strategy for reducing the number of independent variables and controlling for confounding factors. However, most existing studies pick a single feature selection approach without a comprehensive investigation. In this paper, we investigate drug response heterogeneity in patients with type II diabetes mellitus (T2DM) using large-scale clinical trial data. Our goal is to identify the important factors that may lead to response heterogeneity for three popular T2DM drugs: Metformin, Rosiglitazone, and Glimepiride. We implemented eight feature selection approaches and compared their performance on several measures, including prediction error and the consistency of the identified important factors. Finally, we ensembled the factor lists picked by the different algorithms to obtain a final set of factors that contribute to drug response heterogeneity, and verified them against the existing literature.
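The abstract does not say how consistency was measured or how the factor lists were combined, so the following is only a minimal sketch under stated assumptions: consistency is taken to be the mean pairwise Jaccard overlap between selected feature sets, and the ensemble is a simple vote count with a threshold. The method names, feature names, and the `min_votes` parameter are hypothetical placeholders, not the paper's data or procedure.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two feature lists, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(lists):
    """Average Jaccard overlap over all pairs of feature lists."""
    pairs = list(combinations(lists, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def ensemble_by_votes(lists, min_votes):
    """Keep features chosen by at least `min_votes` methods.

    Assumption: voting stands in for whatever ensembling the paper used.
    Results are ordered by vote count (descending), then by name.
    """
    votes = Counter(f for lst in lists for f in set(lst))
    kept = [f for f, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda f: (-votes[f], f))


# Hypothetical output of three feature selection methods (illustrative only).
selected = {
    "method_a": ["age", "bmi", "hba1c"],
    "method_b": ["bmi", "hba1c", "ldl"],
    "method_c": ["hba1c", "age", "egfr"],
}
lists = list(selected.values())

consistency = mean_pairwise_jaccard(lists)
consensus = ensemble_by_votes(lists, min_votes=2)
print(f"mean pairwise Jaccard: {consistency:.2f}")
print("consensus features:", consensus)
```

Raising `min_votes` trades recall for agreement: a higher threshold keeps only factors that most methods select, which is one plausible way to obtain a shorter final list for checking against the literature.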