Thomas Duncan C, Yang Zhao, Yang Fan
Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA.
Front Genet. 2013 Dec 13;4:276. doi: 10.3389/fgene.2013.00276.
The cost of next-generation sequencing is now approaching that of early GWAS panels, but it is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genome-wide association studies using unrelated individuals, and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an ongoing study of second breast cancers following radiotherapy. Although the sharing of variants within families makes family-based designs less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal variants. Furthermore, enriching for family history can improve the yield of causal variants, and the use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. Although associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.