Bamshad Michael J, Wooding Stephen, Watkins W Scott, Ostler Christopher T, Batzer Mark A, Jorde Lynn B
Department of Pediatrics, University of Utah, Salt Lake City, Utah 84112, USA.
Am J Hum Genet. 2003 Mar;72(3):578-89. doi: 10.1086/368061. Epub 2003 Jan 28.
A major goal of biomedical research is to develop the capability to provide highly personalized health care. To do so, it is necessary to understand the distribution of interindividual genetic variation at loci underlying physical characteristics, disease susceptibility, and response to treatment. Variation at these loci commonly exhibits geographic structuring and may contribute to phenotypic differences between groups. Thus, in some situations, it may be important to consider these groups separately. Membership in these groups is commonly inferred by use of a proxy such as place-of-origin or ethnic affiliation. These inferences are frequently weakened, however, by use of surrogates, such as skin color, for these proxies, the distribution of which bears little resemblance to the distribution of neutral genetic variation. Consequently, it has become increasingly controversial whether proxies are sufficient and accurate representations of groups inferred from neutral genetic variation. This raises three questions: how many data are required to identify population structure at a meaningful level of resolution, to what level can population structure be resolved, and do some proxies represent population structure accurately? We assayed 100 Alu insertion polymorphisms in a heterogeneous collection of approximately 565 individuals, approximately 200 of whom were also typed for 60 microsatellites. Stripped of identifying information, correct assignment to the continent of origin (Africa, Asia, or Europe) with a mean accuracy of at least 90% required a minimum of 60 Alu markers or microsatellites and reached 99%-100% when ≥100 loci were used. Less accurate assignment (87%) to the appropriate genetic cluster was possible for a historically admixed sample from southern India.
These results set a minimum for the number of markers that must be tested to make strong inferences about detecting population structure among Old World populations under ideal experimental conditions. We note that, whereas some proxies correspond crudely, if at all, to population structure, the heuristic value of others is much higher. This suggests that a more flexible framework is needed for making inferences about population structure and the utility of proxies.