Department of Statistics, TU Dortmund University, Dortmund, Germany.
Biom J. 2022 Jun;64(5):883-897. doi: 10.1002/bimj.202000250. Epub 2022 Feb 20.
We extend the scope of application of MCP-Mod (Multiple Comparison Procedure and Modeling) to in vitro gene expression data and assess its characteristics regarding model selection for concentration-gene expression curves. Specifically, we apply MCP-Mod to single genes of a high-dimensional gene expression data set in which human embryonic stem cells were exposed to eight concentration levels of the compound valproic acid (VPA). As candidate models we consider the sigmoid (four-parameter log-logistic), linear, quadratic, Emax, exponential, and beta models. Through simulations we investigate the impact of omitting one or more models from the candidate model set, both to uncover possibly superfluous models and to evaluate the precision and recall rates of the selected models. Each candidate model is selected by the Akaike information criterion (AIC) for a considerable number of genes. For less noisy cases, the popular sigmoid model is frequently selected. For noisier data, simpler models such as the linear model are often selected, but mostly without a relevant performance advantage over the second-best model. In addition, the commonly used standard Emax model shows unexpectedly low performance.
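The AIC-based selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation and it omits the multiple comparison step of MCP-Mod as well as the sigmoid and beta models. It assumes Gaussian errors, covers only the linear, quadratic, Emax, and exponential shapes, and profiles the single nonlinear parameter (ED50 or the exponential scale) over a user-supplied grid so that every fit reduces to ordinary least squares. The function names, the grid, and the example concentrations are hypothetical and chosen for illustration only.

```python
import math


def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def rss_for_basis(x, y, basis):
    """Least-squares fit of y on an intercept plus the basis functions; return the RSS."""
    cols = [[1.0] * len(x)] + [[f(xi) for xi in x] for f in basis]
    p = len(cols)
    xtx = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(u * yi for u, yi in zip(cols[i], y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[k] * cols[k][i] for k in range(p)) for i in range(len(x))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts mean parameters plus the variance."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * k


def select_by_aic(x, y, grid):
    """Fit four candidate shapes and return (best model name, dict of AIC values)."""
    n = len(x)
    scores = {
        "linear": aic(rss_for_basis(x, y, [lambda c: c]), n, 3),
        "quadratic": aic(rss_for_basis(x, y, [lambda c: c, lambda c: c * c]), n, 4),
        # Emax: E0 + Emax * c / (ED50 + c); ED50 is profiled over the grid.
        "emax": aic(min(rss_for_basis(x, y, [lambda c, h=h: c / (h + c)])
                        for h in grid), n, 4),
        # Exponential: E0 + E1 * (exp(c / delta) - 1); delta is profiled over the grid.
        "exponential": aic(min(rss_for_basis(x, y, [lambda c, d=d: math.exp(c / d) - 1.0])
                               for d in grid), n, 4),
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical concentrations and responses, not taken from the VPA data set.
    conc = [0.0, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
    expr = [1.01, 1.39, 1.68, 1.99, 2.34, 2.59, 2.79, 2.87]
    best, scores = select_by_aic(conc, expr, grid=[1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
    print(best, scores)
```

Profiling the nonlinear parameter on a grid keeps the sketch dependency-free; a full analysis would typically use a nonlinear optimizer and would fit each gene separately.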