Center for Biomedical Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Genomics. 2012 Apr;99(4):209-19. doi: 10.1016/j.ygeno.2012.01.002. Epub 2012 Jan 15.
Lymphoblastoid cell lines (LCLs) are a common tool for studying genetic disorders. However, the degree to which LCLs preserve the in vivo status of non-genetic biological systems, such as DNA methylation and gene transcription, has not been fully characterized. We previously reported that DNA methylation in LCLs is highly variable in a data set covering ~27,000 CpG dinucleotide sites around transcription start sites (TSSs) in 63 human subjects, including healthy controls and probands of genetic disorders. Disease-causing mutations are linked to differential methylation at some CpG sites but account for only a small proportion of the total variance. In this study, we repeated the experiments to ensure that the high variance is not due to technical error, and we examined the characteristics of DNA methylation and its association with other biological systems. Using sequence information and ChIP-seq data, we conclude that local CpG density and histone modifications not only correlate with baseline methylation level but also affect the direction of methylation change in LCLs. Integrative analysis of gene transcription and DNA methylation data from the same subjects shows that medium or high methylation around the TSS blocks transcription, whereas low methylation is a necessary but not sufficient condition for downstream gene transcription. We used epigenetic information around the TSS to predict active gene transcription via logistic regression models. The multivariate model using DNA methylation, eight histone modifications, and two regulatory protein complexes (CTCF and cohesin) as predictors performs better (accuracy=95.1%) than any univariate model of a single predictor. Linear regression analysis further shows that the transcriptional levels predicted by epigenetic markers correlate significantly with microarray measurements (p=2.2e-10). This study provides new insights into the epigenetic systems of LCLs and suggests that more specifically designed experiments are needed to improve our understanding of this topic.
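The comparison between univariate and multivariate logistic regression models described above can be illustrated with a minimal sketch. This is not the authors' code or data: the two features (a methylation beta value and a single stand-in for an activating histone-mark signal), the simulated effect sizes, the 70/30 split, and all function names are assumptions made for illustration, and the accuracies it prints say nothing about the 95.1% reported in the abstract.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n, seed):
    """Simulate hypothetical per-gene features and an active/inactive label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        meth = rng.random()         # methylation beta value in [0, 1)
        mark = rng.gauss(0.0, 1.0)  # stand-in for one activating histone mark
        # Assumed rule: low methylation and a strong mark favour transcription.
        p_active = sigmoid(-6.0 * (meth - 0.5) + 2.0 * mark)
        rows.append([meth, mark])
        labels.append(1 if rng.random() < p_active else 0)
    return rows, labels


def standardize(train, test):
    """Z-score each column using the training mean and standard deviation."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def fit_logistic(X, y, lr=1.0, epochs=300):
    """Batch gradient descent on the mean log-loss; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def accuracy(w, X, y):
    """Fraction of rows whose predicted class (threshold 0.5) matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi)))
        hits += int((p >= 0.5) == (yi == 1))
    return hits / len(y)


def run_demo(n=1000, seed=0):
    """Fit both models on a training split and score them on held-out rows."""
    rows, labels = make_synthetic(n, seed)
    cut = int(0.7 * n)
    X_tr, X_te = standardize(rows[:cut], rows[cut:])
    y_tr, y_te = labels[:cut], labels[cut:]
    # Univariate model: methylation only (first column).
    w_uni = fit_logistic([[r[0]] for r in X_tr], y_tr)
    acc_uni = accuracy(w_uni, [[r[0]] for r in X_te], y_te)
    # Multivariate model: methylation plus the histone-mark stand-in.
    w_multi = fit_logistic(X_tr, y_tr)
    acc_multi = accuracy(w_multi, X_te, y_te)
    return acc_uni, acc_multi, w_multi


if __name__ == "__main__":
    acc_uni, acc_multi, w_multi = run_demo()
    print(f"methylation only:   {acc_uni:.3f}")
    print(f"methylation + mark: {acc_multi:.3f}")
    print("multivariate weights [bias, methylation, mark]:", w_multi)
```

In this simulation the fitted methylation weight is negative and the mark weight is positive because the data were generated that way. A real analysis would use measured methylation and ChIP-seq signals per TSS and would add the remaining predictors as further columns.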