BC Children's Hospital Research Institute, 950 W 28th Ave, Vancouver, BC, V6H 3N1, Canada.
Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1, Canada.
Epigenetics Chromatin. 2023 Jan 7;16(1):1. doi: 10.1186/s13072-022-00477-0.
Many human disease phenotypes manifest differently by sex, making the development of methods for incorporating X- and Y-chromosome data into analyses vital. Unfortunately, X- and Y-chromosome data are frequently excluded from large-scale analyses of the human genome and epigenome because of the analytical complexity associated with sex chromosome dosage differences between XX and XY individuals and the impact of X-chromosome inactivation (XCI) on the epigenome. As a result, little attention has been given to how sex chromosome data may be included in analyses of DNA methylation (DNAme) array data.
Using Illumina Infinium HumanMethylation450 DNAme array data from 634 placental samples, we investigated the effects of probe filtering, normalization, and batch correction on DNAme data from the X and Y chromosomes. Processing steps were evaluated in both mixed-sex and sex-stratified subsets of the analysis cohort to determine whether including both sexes affected processing results. We found that identification of probes that have a high detection p-value, or that are non-variable, should be performed in sex-stratified data subsets to avoid over- and under-estimation, respectively, of the number of probes eligible for removal. All normalization techniques investigated returned X and Y DNAme data that were highly correlated with the raw data from the same samples. We found no difference in batch correction results between application to mixed-sex and sex-stratified cohorts. Additionally, we identified two analytical methods suitable for X and Y chromosome data, the choice between which should be guided by the research question of interest, and we performed a proof-of-concept analysis of differential DNAme on the X and Y chromosomes in the context of placental acute chorioamnionitis. Finally, we provide an annotation of probe types that may be desirable to filter in X and Y chromosome analyses, including probes in repetitive elements, the X-transposed region, and cancer-testis gene promoters.
While there may be no single "best" approach for analyzing DNAme array data from the X and Y chromosomes, analysts must consider key factors during processing and analysis of sex chromosome data to accommodate the underlying biology of these chromosomes and the technical limitations of DNA methylation arrays.
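The recommendation to flag high-detection-p-value probes within sex-stratified subsets can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the thresholds (`p_cutoff=0.01`, `max_fail_frac=0.05`), the rule for combining the two strata, and the toy values are all assumptions chosen for illustration. It shows why a Y-chromosome probe that performs well in XY samples can still be flagged for removal when it is assessed across a mixed-sex cohort.

```python
def failed_probes(detp, sample_idx, p_cutoff=0.01, max_fail_frac=0.05):
    """Flag probes whose detection p-value exceeds p_cutoff in more than
    max_fail_frac of the given samples. Both thresholds are assumed values."""
    if not sample_idx:
        raise ValueError("sample_idx must contain at least one sample")
    flags = []
    for row in detp:  # one row of detection p-values per probe
        n_fail = sum(1 for j in sample_idx if row[j] > p_cutoff)
        flags.append(n_fail / len(sample_idx) > max_fail_frac)
    return flags


def sex_stratified_filter(detp, sex, chrom, **kwargs):
    """Assess probes separately in XX and XY samples.

    Assumed combination rule (hypothetical): Y-chromosome probes are judged
    in XY samples only; every other probe is flagged if it fails in either
    stratum.
    """
    xx = [j for j, s in enumerate(sex) if s == "XX"]
    xy = [j for j, s in enumerate(sex) if s == "XY"]
    fail_xx = failed_probes(detp, xx, **kwargs)
    fail_xy = failed_probes(detp, xy, **kwargs)
    return [
        fail_xy[i] if chrom[i] == "chrY" else (fail_xx[i] or fail_xy[i])
        for i in range(len(detp))
    ]


# Toy data: 3 probes x 4 samples (two XX, then two XY); values are invented.
sex = ["XX", "XX", "XY", "XY"]
chrom = ["chrY", "chrX", "chr1"]
detp = [
    [0.90, 0.80, 0.001, 0.002],    # Y probe: no signal in XX, clean in XY
    [0.001, 0.001, 0.001, 0.001],  # X probe: clean in all samples
    [0.50, 0.60, 0.70, 0.40],      # autosomal probe: poor in all samples
]

mixed = failed_probes(detp, list(range(len(sex))))
stratified = sex_stratified_filter(detp, sex, chrom)
print("mixed-sex flags:     ", mixed)
print("sex-stratified flags:", stratified)
```

In this toy example the mixed-sex assessment flags the Y probe because it has no signal in XX samples, whereas the stratified assessment retains it, which mirrors the over-estimation of removable probes described in the abstract.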