Morin Alexander M, Gatev Evan, McEwen Lisa M, MacIsaac Julia L, Lin David T S, Koen Nastassja, Czamara Darina, Räikkönen Katri, Zar Heather J, Koenen Karestan, Stein Dan J, Kobor Michael S, Jones Meaghan J
Centre for Molecular Medicine and Therapeutics, BC Children's Hospital, Department of Medical Genetics, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4 Canada.
Department of Psychiatry and Mental Health, South African Medical Research Council (SAMRC) Unit on Anxiety and Stress Disorders, University of Cape Town, Groote Schuur Hospital, J2, Anzio Road, Observatory, Cape Town, South Africa.
Clin Epigenetics. 2017 Jul 25;9:75. doi: 10.1186/s13148-017-0370-2. eCollection 2017.
Cord blood is commonly used in environmental, genetic, and epigenetic population studies because it is readily available and can inform on a sensitive period of human development. However, the introduction of maternal blood during labor, or cross-contamination during sample collection, may complicate downstream analyses. After discovering maternal contamination of cord blood in a cohort study of 150 neonates using Illumina 450K DNA methylation (DNAm) data, we used a combination of linear regression and random forest machine learning to create a DNAm-based screening method. We identified a panel of DNAm sites that could discriminate between contaminated and non-contaminated samples, then designed pyrosequencing assays to pre-screen DNA before it is assayed on an array.
Maternal contamination of cord blood was initially identified by unusual X chromosome DNA methylation patterns in 17 males. We used our DNAm panel to detect the contaminated male samples, and a proportional number of contaminated female samples, in the same cohort. We validated our DNAm screening method on an additional cohort of 189 samples using both pyrosequencing and DNAm arrays, as well as on 9 publicly available cord blood 450K data sets. The rate of contamination varied from 0 to 10% within these studies, likely related to collection-specific methods.
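The initial observation, unusual X chromosome DNAm patterns in males, can be illustrated with a minimal sketch. It is not the authors' pipeline, which combined linear regression and random forest; it only shows the general idea that X-linked CpGs which are largely unmethylated in males should shift upward when female (maternal) DNA is mixed in. The function name `flag_contaminated_males`, the choice of CpGs, the multiplier `k`, and the 0.01 floor are all assumptions made for illustration.

```python
from statistics import median


def flag_contaminated_males(x_betas, k=5.0):
    """Flag male samples with unusually high X-linked DNAm.

    x_betas: dict mapping sample ID -> list of beta values (0..1) at
             X-linked CpGs expected to be low in males (hypothetical panel).
    k:       illustrative multiplier on the median absolute deviation.
    Returns a sorted list of flagged sample IDs.
    """
    # Summarise each sample by its median beta across the X-linked CpGs.
    scores = {sid: median(betas) for sid, betas in x_betas.items()}

    # Robust centre and spread of the per-sample summaries.
    centre = median(scores.values())
    mad = median(abs(s - centre) for s in scores.values())

    # Assumed floor on the spread so near-identical samples do not
    # produce a cutoff equal to the centre.
    cutoff = centre + k * max(mad, 0.01)

    return sorted(sid for sid, s in scores.items() if s > cutoff)


# Toy data (invented for illustration, not from the study).
males = {
    "m1": [0.05, 0.08, 0.04],
    "m2": [0.06, 0.07, 0.05],
    "m3": [0.05, 0.06, 0.07],
    "m4": [0.20, 0.25, 0.18],  # elevated, as might follow maternal admixture
}
print(flag_contaminated_males(males))
```

A screen of this kind applies only to male samples, which is one reason a sex-independent marker panel is needed to detect contamination in females.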
Maternal blood can contaminate cord blood during sample collection at appreciable levels across multiple studies. We have identified a panel of markers that can be used to identify this contamination, either post hoc after DNAm arrays have been completed, or in advance using a targeted technique like pyrosequencing.