Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies.

Affiliation

The ithree institute, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia.

Publication information

Gigascience. 2018 Feb 1;7(2):1-12. doi: 10.1093/gigascience/gix103.

Abstract

BACKGROUND

Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype.

FINDINGS

We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error.
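To make the distance-decay idea concrete, the following is a minimal sketch of how one proximity ligation read pair could be sampled. It is an illustration only and not Sim3C's actual implementation: the function name `sample_ligation_pair`, the parameters `p_inter`, `p_spurious`, and `decay_rate`, and the choice of an exponential distribution for the genomic separation are all assumptions made for this example.

```python
import random


def sample_ligation_pair(chrom_lengths, rng, p_inter=0.1, p_spurious=0.02,
                         decay_rate=1e-5):
    """Sample one simulated read pair as ((chrom, pos), (chrom, pos), kind).

    Hypothetical model, not taken from the paper:
      - the first end is uniform over the genome;
      - with probability p_spurious the second end is uniform over the genome;
      - with probability p_inter the second end lies on a different chromosome;
      - otherwise the second end is on the same chromosome, at a separation
        drawn from an exponential distribution (a simple distance decay).
    """
    names = list(chrom_lengths)
    weights = [chrom_lengths[n] for n in names]

    # First end: chromosome chosen in proportion to its length.
    chrom_a = rng.choices(names, weights=weights)[0]
    pos_a = rng.randrange(chrom_lengths[chrom_a])

    u = rng.random()
    if u < p_spurious:
        # Spurious ligation: no spatial signal, second end is uniform.
        chrom_b = rng.choices(names, weights=weights)[0]
        pos_b = rng.randrange(chrom_lengths[chrom_b])
        return (chrom_a, pos_a), (chrom_b, pos_b), "spurious"

    if u < p_spurious + p_inter and len(names) > 1:
        # Inter-chromosomal contact: pick one of the other chromosomes.
        others = [n for n in names if n != chrom_a]
        other_weights = [chrom_lengths[n] for n in others]
        chrom_b = rng.choices(others, weights=other_weights)[0]
        pos_b = rng.randrange(chrom_lengths[chrom_b])
        return (chrom_a, pos_a), (chrom_b, pos_b), "inter"

    # Intra-chromosomal contact: short separations are the most likely.
    # Draws that fall off the end of the chromosome are rejected and redrawn,
    # so decay_rate should not be tiny relative to 1 / chromosome length.
    length = chrom_lengths[chrom_a]
    while True:
        separation = int(rng.expovariate(decay_rate))
        pos_b = pos_a + rng.choice((-1, 1)) * separation
        if 0 <= pos_b < length:
            return (chrom_a, pos_a), (chrom_a, pos_b), "intra"


if __name__ == "__main__":
    rng = random.Random(42)
    genome = {"chr1": 2_000_000, "chr2": 1_000_000}  # hypothetical lengths
    for _ in range(5):
        print(sample_ligation_pair(genome, rng))
```

In this sketch, the three branches correspond to the error source and the two interaction classes named in the abstract: spurious ligation, contacts across chromosomes, and contacts within a chromosome. Cell-level structure, topologically associating domains, and sequencing error are deliberately left out.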

CONCLUSIONS

We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c47/5827349/6d120ffd6189/gix103fig1.jpg
