Chromatin loop anchors predict transcript and exon usage.

Affiliations

School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.

Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore 117599, Singapore.

Publication information

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab254.

Abstract

Epigenomic and transcriptomic data from high-throughput sequencing techniques such as RNA-seq and ChIP-seq have been successfully applied to predicting gene transcript expression. However, the locations of chromatin loops in the genome, identified by techniques such as Chromatin Interaction Analysis with Paired-End Tag sequencing (ChIA-PET), have never been used for prediction tasks. Here, we developed machine learning models to investigate whether ChIA-PET could contribute to transcript and exon usage prediction. In doing so, we used a large set of transcription factors as well as ChIA-PET data. We developed separate Gradient Boosting Trees models for the different tasks, using integrated datasets from three cell lines: GM12878, HeLaS3 and K562. We validated the models via 10-fold cross-validation, chromosome-split validation and cross-cell validation. Our results show that transcript usage and splicing-derived exon usage can be predicted with accuracies of at least 0.7512 and 0.7459, respectively, on all cell lines and under all validation schemes. Examining the predictive features, we found that RNA Polymerase II ChIA-PET was one of the most important features in both transcript and exon usage prediction, suggesting that chromatin loop anchors are predictive of both transcript and exon usage.
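
The abstract names chromosome-split validation but does not say how the split was constructed. The following is a minimal sketch of one plausible reading, in which every sample from a given chromosome is kept in the same fold so that no chromosome contributes to both training and testing. The function name `chromosome_split_folds`, the greedy balancing rule and the toy labels are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def chromosome_split_folds(chromosomes, n_folds=5):
    """Build (train_idx, test_idx) pairs where each chromosome lives in exactly one fold.

    `chromosomes` holds one label per sample (e.g. "chr1"). This is a
    hypothetical helper, not the authors' implementation.
    """
    by_chrom = defaultdict(list)
    for idx, chrom in enumerate(chromosomes):
        by_chrom[chrom].append(idx)
    if n_folds < 2 or n_folds > len(by_chrom):
        raise ValueError("n_folds must be between 2 and the number of chromosomes")

    # Greedy balancing (an assumption): place the largest chromosomes first,
    # each into whichever fold currently holds the fewest samples.
    folds = [[] for _ in range(n_folds)]
    for chrom in sorted(by_chrom, key=lambda c: (-len(by_chrom[c]), c)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_chrom[chrom])

    splits = []
    for f in range(n_folds):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(n_folds) if g != f for i in folds[g])
        splits.append((train_idx, test_idx))
    return splits


# Toy example: ten samples spread over four chromosomes, two folds.
chroms = ["chr1"] * 4 + ["chr2"] * 3 + ["chr3"] * 2 + ["chrX"]
splits = chromosome_split_folds(chroms, n_folds=2)
for train_idx, test_idx in splits:
    held_out = sorted({chroms[i] for i in test_idx})
    print(f"train={len(train_idx)} test={len(test_idx)} held-out chromosomes={held_out}")
```

Each pair of index lists could then be passed to any classifier, such as a gradient boosting model, to score accuracy on chromosomes the model did not see during training. Cross-cell validation could be sketched the same way by grouping on a cell-line label instead of a chromosome label.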
