Applying data mining techniques to the mapping of complex disease genes.

Author information

Czika W A, Weir B S, Edwards S R, Thompson R W, Nielsen D M, Brocklebank J C, Zinkus C, Martin E R, Hobler K E

Affiliation

SAS Institute, SAS Campus Drive, Cary, NC 27513, USA.

Publication information

Genet Epidemiol. 2001;21 Suppl 1:S435-40. doi: 10.1002/gepi.2001.21.s1.s435.

DOI:10.1002/gepi.2001.21.s1.s435

PMID:11793714

Abstract

The simulated sequence data for the Genetic Analysis Workshop 12 were analyzed using data mining techniques provided by SAS ENTERPRISE MINER Release 4.0 in addition to traditional statistical tests for linkage and association of genetic markers with disease status. We examined two ways of combining these approaches to make use of the covariate data along with the genotypic data. The result of incorporating data mining techniques with more classical methods is an improvement in the analysis, both by correctly classifying the affection status of more individuals and by locating more single nucleotide polymorphisms related to the disease, relative to analyses that use classical methods alone.
