Daw E Warwick, Morrison John, Zhou Xiaojun, Thomas Duncan C
Department of Epidemiology, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA.
BMC Genet. 2003 Dec 31;4 Suppl 1(Suppl 1):S3. doi: 10.1186/1471-2156-4-S1-S3.
The Genetic Analysis Workshop 13 simulated data aimed to mimic the major features of the real Framingham Heart Study data that formed Problem 1, but under a known inheritance model and with 100 replicates, so as to allow evaluation of the statistical properties of various methods. The pedigrees used were the 330 real pedigree structures (comprising 4692 individuals) with some minor changes to protect confidentiality. Fifty trait genes and 399 microsatellite markers were simulated by gene dropping on 22 autosomal chromosomes. Assuming random ascertainment of families, a system of eight longitudinal quantitative traits (designed to be similar to those in the real data) was generated with a wide range of heritabilities, including some pleiotropic and interactive effects. Genes could affect either the baseline level or the rate of change of the phenotype. Hypertension diagnosis and treatment were simulated with treatment availability, compliance, and efficacy depending on calendar year. The nongenetic traits of smoking and alcohol consumption were generated as covariates for other traits. Death was simulated using a hazard rate that depended on age, sex, smoking, cholesterol, and systolic blood pressure. After the complete data were simulated, missing data indicators were generated based on logistic models fitted to the real data, involving the subject's history of previous missing values, together with that of their spouses, parents, siblings, and offspring, as well as marital status, only-child indicators, current values of certain simulated traits, and the data collection pattern for the cohort into which each subject was ascertained.
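As an illustration of the gene-dropping step mentioned in the abstract, the following is a minimal single-locus sketch in Python. It is not the workshop's simulation code: the function name `gene_drop`, the pedigree layout, and the allele frequencies are all hypothetical. It also ignores linkage and recombination between loci, which a simulation of markers along 22 chromosomes would have to model. Founders draw two alleles from assumed population frequencies, and each offspring receives one randomly chosen allele from each parent.

```python
import random


def gene_drop(pedigree, allele_freqs, rng):
    """Drop alleles at one locus through a pedigree (illustrative sketch).

    pedigree: list of (person_id, father_id, mother_id) tuples, ordered so
        that parents appear before their children; founders have None
        for both parents.
    allele_freqs: dict mapping allele label -> population frequency
        (assumed values, used only for founders).
    rng: a random.Random instance, passed in so that runs are reproducible.

    Returns a dict mapping person_id -> (paternal_allele, maternal_allele).
    """
    alleles = list(allele_freqs)
    weights = [allele_freqs[a] for a in alleles]
    genotypes = {}
    for person, father, mother in pedigree:
        if father is None and mother is None:
            # Founder: sample both alleles from the population frequencies.
            genotypes[person] = tuple(rng.choices(alleles, weights=weights, k=2))
        else:
            # Non-founder: Mendelian transmission of one allele per parent.
            genotypes[person] = (
                rng.choice(genotypes[father]),
                rng.choice(genotypes[mother]),
            )
    return genotypes


# Hypothetical nuclear family: two founders (1, 2) and two children (3, 4).
pedigree = [(1, None, None), (2, None, None), (3, 1, 2), (4, 1, 2)]
genotypes = gene_drop(pedigree, {"A": 0.7, "a": 0.3}, random.Random(13))
```

Calling the function repeatedly with different seeds would produce independent replicates on the same pedigree structure, which is the idea behind generating many replicates under one known inheritance model.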