A comparative study of variable selection methods in the context of developing psychiatric screening instruments.

Affiliation

Department of Statistics, Columbia University, New York, NY 10027, U.S.A.

Publication information

Stat Med. 2014 Feb 10;33(3):401-21. doi: 10.1002/sim.5937. Epub 2013 Aug 11.

Abstract

The development of screening instruments for psychiatric disorders involves item selection from a pool of items in existing questionnaires assessing clinical and behavioral phenotypes. A screening instrument should consist of only a few items and have good accuracy in classifying cases and non-cases. Variable/item selection methods such as Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, Classification and Regression Tree, Random Forest, and the two-sample t-test can be used in this context. Unlike situations where variable selection methods are most commonly applied (e.g., ultra high-dimensional genetic or imaging data), psychiatric data usually have lower dimensions and are characterized by the following factors: correlations and possible interactions among predictors, unobservability of important variables (i.e., true variables not measured by available questionnaires), amount and pattern of missing values in the predictors, and prevalence of cases in the training data. We investigate how these factors affect the performance of several variable selection methods and compare them with respect to selection performance and prediction error rate via simulations. Our results demonstrated that: (1) for complete data, LASSO and Elastic Net outperformed other methods with respect to variable selection and future data prediction, and (2) for certain types of incomplete data, Random Forest induced bias in imputation, leading to incorrect ranking of variable importance. We propose the Imputed-LASSO combining Random Forest imputation and LASSO; this approach offsets the bias in Random Forest and offers a simple yet efficient item selection approach for missing data. As an illustration, we apply the methods to items from the standard Autism Diagnostic Interview-Revised version.
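The two-stage idea behind Imputed-LASSO (random-forest imputation followed by LASSO selection) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn, uses `IterativeImputer` with a `RandomForestRegressor` as a missForest-style stand-in for the random-forest imputation step, and uses an L1-penalized logistic regression as the LASSO step. The function name `imputed_lasso`, the penalty value `C=0.1`, the forest size, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the IterativeImputer import)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression


def imputed_lasso(X, y, C=0.1, random_state=0):
    """Impute missing items with random forests, then select items with an L1 penalty.

    Hypothetical helper: a sketch of the two-stage idea, not the published method.
    """
    # Stage 1: random-forest-based imputation (missForest-style stand-in).
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=random_state),
        max_iter=3,
        random_state=random_state,
    )
    X_imputed = imputer.fit_transform(X)

    # Stage 2: L1-penalized logistic regression; nonzero coefficients are the selected items.
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    lasso.fit(X_imputed, y)
    selected = np.flatnonzero(lasso.coef_.ravel() != 0)
    return selected, lasso, X_imputed


# Synthetic data (assumption): 10 items, only items 0 and 1 drive case status.
rng = np.random.default_rng(0)
n, p = 300, 10
X_full = rng.normal(size=(n, p))
logits = 2.0 * X_full[:, 0] - 2.0 * X_full[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Remove 10% of the entries completely at random.
X_missing = X_full.copy()
X_missing[rng.random((n, p)) < 0.10] = np.nan

selected, model, X_imputed = imputed_lasso(X_missing, y)
print("selected item indices:", selected)
```

In practice the penalty strength would be tuned, for example by cross-validation, and the missingness here is completely at random, which is the simplest of the patterns the abstract mentions.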
