Fontanarosa Joel B, Dai Yang
Bioinformatics Program, Department of Bioengineering (MC 063), University of Illinois at Chicago, 851 S. Morgan Street, 218 SEO, Chicago, IL 60607-7052, USA.
BMC Proc. 2011 Nov 29;5 Suppl 9(Suppl 9):S69. doi: 10.1186/1753-6561-5-S9-S69.
We use least absolute shrinkage and selection operator (LASSO) regression to select the genetic markers and phenotypic features that are most informative about a trait of interest. We compare several strategies for applying LASSO in risk prediction models, using the Genetic Analysis Workshop 17 exome simulation data, which comprise 697 individuals with genotypic and phenotypic (smoking, age, sex) information, and we evaluate each strategy by 5-fold cross-validation. With genotypic markers alone, the cross-validated average area under the receiver operating characteristic curve (AUC) ranges from 0.45 to 0.63 across strategies. When both genotypic and phenotypic information are used, the average AUC improves to 0.69-0.87. The ability of LASSO to find true causal markers is limited, but the method discovered several common variants (e.g., FLT1) under certain conditions.
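The evaluation described above (fit an L1-penalized model on four folds, score the held-out fold, and average the area under the receiver operating characteristic curve, AUC) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which LASSO software or penalty values were used, so a simple proximal gradient solver for L1-penalized logistic regression stands in here, the function names are hypothetical, and the data are a small simulated toy set rather than the Genetic Analysis Workshop 17 data.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=300):
    """L1-penalized logistic regression by proximal gradient descent.

    Assumption: this simple solver stands in for whatever LASSO software
    the authors used; the intercept b is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # guard math.exp against overflow
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def auc(scores, labels):
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0
               for a in pos for c in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, lam, k=5, seed=0):
    """Average held-out AUC over k folds (each fold needs both classes)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    fold_aucs = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, b = fit_lasso_logistic([X[i] for i in train],
                                  [y[i] for i in train], lam)
        scores = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in fold]
        fold_aucs.append(auc(scores, [y[i] for i in fold]))
    return sum(fold_aucs) / k


# Toy data (hypothetical, not GAW17): 200 individuals, 5 markers coded 0/1/2.
# Only marker 0 influences the simulated trait.
rng = random.Random(1)
X = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(5)]
     for _ in range(200)]
y = [int(rng.random() < (0.9 if row[0] > 0 else 0.1)) for row in X]

mean_auc = cross_validated_auc(X, y, lam=0.05)
weights, intercept = fit_lasso_logistic(X, y, lam=0.05)
print("mean cross-validated AUC:", round(mean_auc, 3))
print("markers with nonzero weight:",
      [j for j, wj in enumerate(weights) if wj != 0.0])
```

Markers whose weights are shrunk exactly to zero are the ones dropped from the model. Raising `lam` drops more markers, and a large enough value zeroes every weight. Phenotypic covariates such as smoking, age, and sex could be appended as extra columns of `X` to mimic the combined genotype-plus-phenotype strategy.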