Bohlin Jon, Håberg Siri E, Magnus Per, Gjessing Håkon K
Department of Method Development and Analytics, Section for modeling and bioinformatics, Norwegian Institute of Public Health, Oslo, Norway.
Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway.
BMC Bioinformatics. 2024 Dec 18;25(1):380. doi: 10.1186/s12859-024-06000-4.
Generating prediction models from high-dimensional data often results in large models with many predictors. Causal inference for such models can therefore be difficult or even impossible in practice. The stand-alone software package MinLinMo prioritizes small linear prediction models over the highest possible predictive performance, with a particular focus on including variables correlated with the outcome, on low memory usage, and on speed. MinLinMo is demonstrated on large epigenetic datasets, with prediction models for chronological age, gestational age, and birth weight comprising 15, 14, and 10 predictors, respectively. These parsimonious MinLinMo models perform comparably to established prediction models that require hundreds of predictors.
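The abstract does not describe the selection procedure itself, so the following is only a minimal sketch of the general idea of building a small linear model from predictors correlated with the outcome. It is not MinLinMo's algorithm. The function name `select_small_linear_model` and the parameters `max_predictors`, `min_abs_corr`, and `min_r2_gain` are hypothetical, and the technique shown is plain correlation screening followed by greedy forward selection.

```python
from math import sqrt


def center(values):
    """Subtract the mean so that an intercept is handled implicitly."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_small_linear_model(columns, y, max_predictors=10,
                              min_abs_corr=0.3, min_r2_gain=0.01):
    """Illustrative sketch only (hypothetical names, not MinLinMo itself).

    columns: list of predictor columns, each a list of floats.
    y: outcome values, one per sample.
    Returns (indices of selected columns, R^2 of the resulting linear model).
    """
    y_centered = center(y)
    total_ss = dot(y_centered, y_centered)

    # Screening: keep only predictors whose absolute Pearson correlation
    # with the outcome reaches the (assumed) threshold.
    candidates = {}
    for j, col in enumerate(columns):
        x = center(col)
        norm = sqrt(dot(x, x) * total_ss)
        if norm > 0 and abs(dot(x, y_centered)) / norm >= min_abs_corr:
            candidates[j] = x

    selected, r2, residual = [], 0.0, y_centered
    while candidates and len(selected) < max_predictors:
        # Candidates are kept orthogonal to the already selected predictors,
        # so the R^2 gained by adding x is (x . residual)^2 / (x . x) / total_ss.
        gains = {j: dot(x, residual) ** 2 / (dot(x, x) * total_ss)
                 for j, x in candidates.items() if dot(x, x) > 1e-12}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] < min_r2_gain:
            break  # stop early: a larger model is not worth the small gain

        q = candidates.pop(best)
        qq = dot(q, q)
        # Remove the chosen direction from the residual and from the
        # remaining candidates (Gram-Schmidt step).
        b = dot(q, residual) / qq
        residual = [r - b * v for r, v in zip(residual, q)]
        candidates = {j: [a - (dot(q, x) / qq) * v for a, v in zip(x, q)]
                      for j, x in candidates.items()}
        selected.append(best)
        r2 += gains[best]

    return selected, r2
```

The stopping rule on `min_r2_gain` is what keeps the model small in this sketch: selection ends as soon as an additional predictor explains too little extra variance, trading a little predictive performance for fewer predictors.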