Construction of the model for the Genetic Analysis Workshop 14 simulated data: genotype-phenotype relationships, gene interaction, linkage, association, disequilibrium, and ascertainment effects for a complex phenotype.

Author affiliation

Division of Statistical Genetics, Department of Biostatistics and Psychiatry, Mailman School of Public Health, Columbia-Presbyterian Medical Center, New York, NY 10032, USA.

Publication information

BMC Genet. 2005 Dec 30;6 Suppl 1(Suppl 1):S3. doi: 10.1186/1471-2156-6-S1-S3.

Abstract

The Genetic Analysis Workshop 14 simulated dataset was designed to:

1) Test the ability to find genes related to a complex disease (such as alcoholism). Such a disease may be defined differently by different investigators, may have associated endophenotypes that are common in the general population, and is likely to be not one disease but a heterogeneous collection of clinically similar, but genetically distinct, entities.
2) Observe the effect of a complex set of gene x gene interactions on genetic analysis and gene discovery.
3) Allow comparison of microsatellite vs. large-scale single-nucleotide polymorphism (SNP) data.
4) Allow testing of association to identify the disease gene and the effect of moderate marker x marker linkage disequilibrium.
5) Observe the effect of different ascertainment/disease definition schemes on the analysis.

Data were distributed in two forms. The data distributed to participants contained about 1,000 SNPs and 400 microsatellite markers. The Internet-obtainable data consisted of a finer 10,000-SNP map and also included data on controls. While disease characteristics and parameters were constant, four "studies" used varying ascertainment schemes based on differing beliefs about disease characteristics. One of the studies contained multiplex two- and three-generation pedigrees with at least four affected members.

The simulated disease was a psychiatric condition with many associated behaviors (endophenotypes), almost all of which were genetic in origin. The underlying disease model contained four major genes and two modifier genes. The four major genes interacted with each other to produce three different phenotypes, which were themselves heterogeneous. The population parameters were calibrated so that the major genes could be discovered by linkage analysis in most datasets. The association evidence was more difficult to calibrate but was designed to yield statistically significant association in 50% of datasets. We also simulated some marker x marker linkage disequilibrium around some of the genes and in areas without disease genes, and we tried two different methods to simulate the linkage disequilibrium.
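
The abstract does not say which two methods were used to simulate marker x marker linkage disequilibrium. As a purely illustrative sketch (not the authors' procedure), the Python below shows the standard two-locus definitions: the disequilibrium coefficient D, the normalized D', and r², plus one simple way to draw haplotypes for a pair of biallelic markers with a chosen D. All function names, allele labels, and the example frequencies are assumptions made for illustration.

```python
import random


def haplotype_freqs(p_a, p_b, d):
    """Four haplotype frequencies for two biallelic markers.

    p_a, p_b: frequencies of allele A (marker 1) and allele B (marker 2).
    d: disequilibrium coefficient, D = P(AB) - p_a * p_b.
    """
    freqs = {
        "AB": p_a * p_b + d,
        "Ab": p_a * (1 - p_b) - d,
        "aB": (1 - p_a) * p_b - d,
        "ab": (1 - p_a) * (1 - p_b) + d,
    }
    if min(freqs.values()) < 0:
        raise ValueError("D is outside the range allowed by the allele frequencies")
    return freqs


def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) given the AB haplotype frequency and allele frequencies."""
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum attainable magnitude for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def sample_haplotypes(freqs, n, seed=0):
    """Draw n haplotypes independently from the given frequency table."""
    rng = random.Random(seed)
    labels = list(freqs)
    weights = [freqs[k] for k in labels]
    return rng.choices(labels, weights=weights, k=n)


if __name__ == "__main__":
    # Hypothetical example values: equal allele frequencies and moderate LD.
    freqs = haplotype_freqs(p_a=0.5, p_b=0.5, d=0.1)
    print(freqs)
    print(ld_stats(freqs["AB"], 0.5, 0.5))
    print(sample_haplotypes(freqs, 10))
```

Sampling whole haplotypes from a frequency table is only one possible approach; it treats founder haplotypes as independent draws and says nothing about how disequilibrium decays across generations or across more than two markers.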

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21a0/1866756/07f9a88b4673/1471-2156-6-S1-S3-1.jpg
