
qc3C: Reference-free quality control for Hi-C sequencing data.

Affiliation

The iThree Institute, University of Technology Sydney, Ultimo, NSW, Australia.

Publication information

PLoS Comput Biol. 2021 Oct 11;17(10):e1008839. doi: 10.1371/journal.pcbi.1008839. eCollection 2021 Oct.

Abstract

Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies, and, more recently, the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while the key post-sequencing quality indicators in use have thus far relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, since an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is, to our knowledge, the first to implement reference-free Hi-C QC, and it also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.
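The k-mer observation in the abstract can be illustrated with a small Python sketch. It is a toy illustration only and not qc3C's actual implementation. The function names, the `ratio` threshold, and the choice of `GATCGATC` as the junction (the sequence expected when a DpnII-style `GATC` overhang is filled in and blunt-ligated) are all assumptions made for the example. The idea is that a k-mer spanning a true ligation junction should be rare among the reads, while the k-mers on either side of it, which come from native sequence, should be comparatively common.

```python
from collections import Counter

# Assumed junction for a GATC-cutting enzyme (fill-in + blunt ligation).
JUNCTION = "GATCGATC"


def count_kmers(reads, k):
    """Count every k-mer across all reads (a stand-in for a k-mer database)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def looks_like_ligation(read, counts, k, junction=JUNCTION, ratio=0.5):
    """Return True if the k-mer spanning the junction is rare relative to
    the k-mers lying wholly on either side of the junction midpoint.

    `ratio` is a hypothetical threshold chosen for illustration.
    """
    pos = read.find(junction)
    if pos < 0:
        return False  # no junction sequence in this read
    mid = pos + len(junction) // 2
    if mid - k < 0 or mid + k > len(read):
        return False  # not enough flanking sequence to compare
    left = read[mid - k:mid]              # ends at the junction midpoint
    right = read[mid:mid + k]             # starts at the junction midpoint
    span_start = mid - k // 2
    spanning = read[span_start:span_start + k]  # straddles the midpoint
    flank = min(counts[left], counts[right])
    return counts[spanning] < ratio * flank


def ligation_fraction(reads, k, junction=JUNCTION, ratio=0.5):
    """Fraction of reads flagged as containing a proximity-ligation junction."""
    counts = count_kmers(reads, k)
    flagged = sum(looks_like_ligation(r, counts, k, junction, ratio) for r in reads)
    return flagged / len(reads) if reads else 0.0
```

The sketch reports only the fraction of reads in which a junction was directly observed. It makes no attempt to account for ligation products whose junction falls outside the sequenced portion of a fragment, and it ignores sequencing errors, both of which a practical tool would need to handle.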

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ab8/8530316/77923c3e4558/pcbi.1008839.g001.jpg
