Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
Division of Genetics and Genomics, Boston Children's Hospital and Harvard Medical School, Boston, MA 02115, USA.
Bioinformatics. 2021 May 23;37(8):1045-1051. doi: 10.1093/bioinformatics/btaa923.
Hi-C is a common technique for assessing 3D chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline Hi-C-based TE analyzer (HiTea) capitalizes on clipped Hi-C reads and is aided by a high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with the existing callers based on whole-genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE-insertion landscape. We employ the pipeline to identify TE-insertions from human cell-line Hi-C samples.
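The two read-level signals named in the abstract, clipped reads and discordant read pairs, can be illustrated with a minimal sketch. This is not HiTea's implementation: the `Alignment` record, the function names, and the `min_clip` and `max_span` thresholds are hypothetical choices made for illustration, and the only thing assumed about the input is that alignments carry standard SAM-style CIGAR strings and mate coordinates.

```python
import re
from typing import NamedTuple, Tuple

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class Alignment(NamedTuple):
    """Hypothetical minimal record for one aligned read (illustration only)."""
    chrom: str
    pos: int          # 1-based leftmost aligned position
    cigar: str        # SAM CIGAR string, e.g. "70M30S"
    mate_chrom: str
    mate_pos: int


def soft_clip_lengths(cigar: str) -> Tuple[int, int]:
    """Return (left, right) soft-clip lengths; hard clips are ignored."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


def is_clipped(aln: Alignment, min_clip: int = 20) -> bool:
    """True if either end has a soft clip of at least min_clip bases.

    min_clip is an assumed threshold, not a value taken from HiTea.
    """
    return max(soft_clip_lengths(aln.cigar)) >= min_clip


def is_discordant(aln: Alignment, max_span: int = 1000) -> bool:
    """True if the mates map to different chromosomes or too far apart.

    max_span is an assumed threshold, not a value taken from HiTea.
    """
    if aln.chrom != aln.mate_chrom:
        return True
    return abs(aln.pos - aln.mate_pos) > max_span


if __name__ == "__main__":
    read = Alignment("chr1", 1000, "70M30S", "chr5", 500)
    print(soft_clip_lengths(read.cigar), is_clipped(read), is_discordant(read))
```

In Hi-C libraries many pairs span long distances by design, so a discordance test like this one flags far more pairs than it would in WGS data. A real caller would need additional evidence, such as the clipped sequence matching a TE consensus, before reporting an insertion.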
HiTea is available at https://github.com/parklab/HiTea and as a Docker image.
Supplementary data are available at Bioinformatics online.