Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain.
Health and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain.
Gigascience. 2022 Oct 7;11. doi: 10.1093/gigascience/giac088.
Recent technological developments have made genome sequencing and assembly highly accessible and widely used. However, certain genomic features of the sequenced organisms, such as high heterozygosity, polyploidy, aneuploidy, heterokaryosis, or extreme compositional biases, can challenge current standard assembly procedures and result in highly fragmented assemblies. Hence, we hypothesized that genome databases must contain a nonnegligible fraction of low-quality assemblies that result from such intrinsic genomic factors.
Here we present Karyon, a Python-based toolkit that uses raw sequencing data and de novo genome assembly to assess several parameters and generate informative plots to assist in the identification of noncanonical genomic traits. Karyon includes automated de novo genome assembly and variant calling pipelines. We tested Karyon by diagnosing 35 highly fragmented publicly available assemblies from 19 different Mucorales (Fungi) species.
Our results show that 10 (28.57%) of the assemblies presented signs of unusual genomic configurations, suggesting that these are common, at least for some lineages within the Fungi.