Koko Mahmoud, Fabian Laurie, Popov Iaroslav, Eberhardt Ruth Y, Zakharov Gennadii, Huang Qin Qin, Wade Emma E, Azad Rafaq, Danecek Petr, Ho Karen, Hough Amy, Huang Wei, Lindsay Sarah J, Malawsky Daniel S, Bonfanti Davide, Mason Dan, Plowman Deborah, Quail Michael A, Ring Susan M, Shireby Gemma, Widaa Sara, Fitzsimons Emla, Iyer Vivek, Bann David, Timpson Nicholas J, Wright John, Hurles Matthew E, Martin Hilary C
Human Genetics, Wellcome Sanger Institute, Hinxton, England, CB10 1SA, UK.
Population Health Sciences, University of Bristol Medical School, Bristol, England, BS8 2BN, UK.
Wellcome Open Res. 2024 Dec 5;9:390. doi: 10.12688/wellcomeopenres.22697.2. eCollection 2024.
Birth cohort studies involve repeated surveys of large numbers of individuals from birth and throughout their lives. They collect information useful for a wide range of life course research domains, as well as biological samples that can be used to derive data from a growing collection of omic technologies. This rich source of longitudinal data, when combined with genomic data, offers the scientific community valuable insights ranging from population genetics to applications across the social sciences. Here we present quality-controlled whole exome sequencing data from three UK birth cohorts: the Avon Longitudinal Study of Parents and Children (8,436 children and 3,215 parents), the Millennium Cohort Study (7,667 children and 6,925 parents) and Born in Bradford (8,784 children and 2,875 parents). The overall objective of this coordinated effort is to make the resulting high-quality data widely accessible to the global research community in a timely manner. We describe how the datasets were generated and subjected to quality control at the sample, variant and genotype level. We then present some preliminary analyses to illustrate the quality of the datasets and probe potential sources of bias. We add measures of ultra-rare variant burden to the variables available to researchers working on these cohorts, and show that the exome-wide burden of deleterious protein-truncating variants is associated with educational attainment and cognitive test scores. The whole exome sequence data from these birth cohorts (CRAM and VCF files) are available through the European Genome-Phenome Archive, and here we provide guidance for their use.