Department of Physiological Nursing, University of California San Francisco, San Francisco, CA, USA.
Keck Graduate Institute, Claremont, CA, USA.
Biol Res Nurs. 2023 Jul;25(3):393-403. doi: 10.1177/10998004221147513. Epub 2023 Jan 4.
Accurate prediction of risk for chronic diseases like type 2 diabetes (T2D) is challenging due to their complex underlying etiology. Integrating more complex data types from sensors, and leveraging technologies that collect -omics datasets, may provide greater insight into the specific risk profiles of complex diseases. We performed a literature review to identify feature selection methods and machine learning models for predicting weight loss in a previously completed clinical trial (NCT02278939) of a behavioral intervention for weight loss in Filipinos at risk for T2D. Features included demographic and clinical characteristics, dietary factors, physical activity, and transcriptomics. We identified four feature selection methods: Correlation-based Feature Subset Selection (CfsSubsetEval) with BestFirst search; the Kolmogorov-Smirnov (KS) test with correlation-based feature selection (CFS); DESeq2; and max-relevance-min-redundancy (MRMR) with linear forward search and mutual information (MI). We also identified four machine learning algorithms applicable to predicting weight loss from the specified feature types: support vector machine, decision tree, random forest, and extra trees. More accurate prediction of risk for T2D and other complex conditions may be possible by leveraging complex data types from sensors and -omics datasets. Emerging methods for feature selection and machine learning algorithms make this type of modeling feasible.
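The abstract names MRMR with linear forward search and mutual information but does not spell out the procedure. The sketch below illustrates the general technique only, not the authors' implementation. It assumes every feature has already been discretized (for example, binned expression or activity levels), uses a simple plug-in mutual information estimate, and applies the common "difference" criterion: at each step, add the candidate whose MI with the outcome minus its mean MI with the already-selected features is largest. The function names, feature names, and toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Plug-in estimate of mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    # I(X;Y) = sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
    return sum(
        (c / n) * log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr_score(name, features, target, selected):
    """Relevance to the target minus mean redundancy with already-selected features."""
    relevance = mutual_information(features[name], target)
    if not selected:
        return relevance
    redundancy = sum(
        mutual_information(features[name], features[s]) for s in selected
    ) / len(selected)
    return relevance - redundancy


def mrmr_forward(features, target, k):
    """Greedy forward search: repeatedly add the candidate with the highest mRMR score."""
    selected = []
    candidates = set(features)
    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(
            sorted(candidates),
            key=lambda name: mrmr_score(name, features, target, selected),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: the outcome (four weight-change bins) is fully determined by
# two independent features, and one feature is an exact duplicate of another.
gene_a = [0, 0, 0, 0, 1, 1, 1, 1]
steps = [0, 0, 1, 1, 0, 0, 1, 1]
outcome = [2 * a + s for a, s in zip(gene_a, steps)]
features = {"gene_a": gene_a, "gene_a_dup": list(gene_a), "steps": steps}

print(mrmr_forward(features, outcome, k=2))  # → ['gene_a', 'steps']
```

All three toy features are equally relevant to the outcome, so a relevance-only ranking could keep both copies of `gene_a`; the redundancy term is what pushes the duplicate behind the independent `steps` feature.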