Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Ave NE, Seattle, WA, 98195, USA.
Department of Electrical Engineering, University of Washington, Paul Allen Center AE100R, Box 352500, 185 Stevens Way, Seattle, WA, 98195, USA.
Nat Commun. 2018 Apr 11;9(1):1402. doi: 10.1038/s41467-018-03635-9.
The Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project seek to characterize the epigenome in diverse cell types using assays that identify, for example, genomic regions with modified histones or accessible chromatin. These efforts have produced thousands of datasets but cannot possibly measure each epigenomic factor in all cell types. To address this, we present a method, PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition (PREDICTD), to computationally impute missing experiments. PREDICTD leverages an elegant model called "tensor decomposition" to impute many experiments simultaneously. Compared with the current state-of-the-art method, ChromImpute, PREDICTD produces lower overall mean squared error, and combining the two methods yields further improvement. We show that PREDICTD data captures enhancer activity at noncoding human accelerated regions. PREDICTD provides reference imputed data and open-source software for investigating new cell types, and demonstrates the utility of tensor decomposition and cloud computing, both promising technologies for bioinformatics.
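To make the tensor-decomposition idea concrete, the following is a minimal, dependency-free Python sketch; it is not the PREDICTD implementation. It assumes the data are arranged as a three-way tensor (cell type × assay × genomic position), fits a rank-2 CP factorization to the observed experiments only, and reads imputed values for held-out experiments off the fitted factors. The function names (`cp_predict`, `fit_cp`, `demo`), the coordinate-wise ridge updates, the toy sizes, and the synthetic data are all assumptions chosen for illustration.

```python
import random


def cp_predict(factors, idx):
    """CP model: sum over components of the product of one factor entry per mode."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        term = 1.0
        for mode, i in enumerate(idx):
            term *= factors[mode][i][r]
        total += term
    return total


def objective(factors, observed, ridge):
    """Squared error on observed entries plus a ridge penalty on all factor entries."""
    sse = sum((y - cp_predict(factors, idx)) ** 2 for idx, y in observed)
    penalty = sum(v * v for mat in factors for row in mat for v in row)
    return sse + ridge * penalty


def fit_cp(observed, shape, rank=2, ridge=0.01, sweeps=40, seed=0):
    """Fit factors by updating one scalar at a time (hypothetical illustrative solver)."""
    rng = random.Random(seed)
    factors = [[[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
               for n in shape]
    # Group observations by (mode, index) so each update only visits relevant entries.
    groups = [[[] for _ in range(n)] for n in shape]
    for idx, y in observed:
        for mode, i in enumerate(idx):
            groups[mode][i].append((idx, y))
    history = [objective(factors, observed, ridge)]
    for _ in range(sweeps):
        for mode in range(len(shape)):
            for i in range(shape[mode]):
                for r in range(rank):
                    num = den = 0.0
                    for idx, y in groups[mode][i]:
                        # z: component r's product over the other two modes.
                        z = 1.0
                        for m, j in enumerate(idx):
                            if m != mode:
                                z *= factors[m][j][r]
                        # Residual with component r's own contribution removed.
                        own = z * factors[mode][i][r]
                        resid = y - (cp_predict(factors, idx) - own)
                        num += z * resid
                        den += z * z
                    # Closed-form minimizer of the ridge objective in this one scalar.
                    factors[mode][i][r] = num / (den + ridge)
        history.append(objective(factors, observed, ridge))
    return factors, history


def demo(seed=1):
    """Synthetic rank-2 tensor; hide six whole (cell type, assay) experiments."""
    rng = random.Random(seed)
    shape = (6, 5, 40)  # toy sizes: cell types x assays x genomic positions
    truth = [[[rng.uniform(0.0, 1.0) for _ in range(2)] for _ in range(n)]
             for n in shape]
    observed, held_out = [], []
    for c in range(shape[0]):
        for a in range(shape[1]):
            hidden = (c + a) % 5 == 0  # this experiment counts as "not yet measured"
            for g in range(shape[2]):
                y = cp_predict(truth, (c, a, g)) + rng.gauss(0.0, 0.02)
                (held_out if hidden else observed).append(((c, a, g), y))
    factors, history = fit_cp(observed, shape)
    mean_obs = sum(y for _, y in observed) / len(observed)
    mse_model = sum((y - cp_predict(factors, idx)) ** 2
                    for idx, y in held_out) / len(held_out)
    mse_mean = sum((y - mean_obs) ** 2 for _, y in held_out) / len(held_out)
    return history, mse_model, mse_mean


if __name__ == "__main__":
    history, mse_model, mse_mean = demo()
    print("objective: first", history[0], "last", history[-1])
    print("held-out MSE: model", mse_model, "mean baseline", mse_mean)
```

Each scalar update is the exact minimizer of the ridge objective in that coordinate, so the objective recorded in `history` cannot increase from one sweep to the next. Comparing `mse_model` with `mse_mean` on the held-out experiments gives a quick sanity check against a trivial mean predictor.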