Song Wei, Li Chong, Lu Yanming, Shen Dawei, Jia Yunxiao, Huo Yixin, Piao Weilan, Jin Hua
Laboratory of Genetics and Disorders, Key Laboratory of Molecular Medicine and Biotherapy, Aerospace Center Hospital, School of Life Science, Beijing Institute of Technology, Beijing, China.
Research Institute for Science and Technology, Beijing Institute of Technology, Beijing, China.
Front Plant Sci. 2024 Aug 27;15:1430443. doi: 10.3389/fpls.2024.1430443. eCollection 2024.
Accurate reference genomes are fundamental to understanding biological evolution, biodiversity, hereditary phenomena and diseases. However, assembled nuclear chromosomes are often contaminated by organelle genome sequences, which can mislead bioinformatic analysis and the interpretation of genomic and transcriptomic data.
To address this issue, we developed a tool named Chlomito, which aims to precisely identify and eliminate organelle genome contamination from nuclear genome assemblies. Compared with conventional approaches, Chlomito uses two new metrics, the alignment length coverage ratio (ALCR) and the sequencing depth ratio (SDR), to effectively distinguish true organelle genome sequences from those transferred into nuclear genomes via horizontal gene transfer (HGT).
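As a rough illustration of how two such metrics could be combined, the following Python sketch flags a contig as organelle-derived only when most of its length aligns to an organelle reference and its read depth is well above the nuclear baseline (organelle genomes are typically present in many copies per cell). The abstract does not give formulas or cut-offs, so the definitions used here (ALCR as aligned bases divided by contig length, SDR as contig depth divided by mean nuclear depth), the `Contig` fields, and the thresholds `min_alcr` and `min_sdr` are all assumptions made for this example, not Chlomito's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int        # contig length in bp
    aligned_bp: int    # bp of the contig aligned to an organelle reference
    mean_depth: float  # mean sequencing depth over the contig


def alcr(contig: Contig) -> float:
    """Assumed definition: fraction of the contig aligned to the organelle reference."""
    return contig.aligned_bp / contig.length


def sdr(contig: Contig, nuclear_depth: float) -> float:
    """Assumed definition: contig depth relative to the mean nuclear depth."""
    return contig.mean_depth / nuclear_depth


def is_organelle(contig: Contig, nuclear_depth: float,
                 min_alcr: float = 0.5, min_sdr: float = 5.0) -> bool:
    """Flag a contig only if BOTH metrics pass (thresholds are hypothetical)."""
    return alcr(contig) >= min_alcr and sdr(contig, nuclear_depth) >= min_sdr


# Invented example data, with an assumed mean nuclear depth of 30x.
NUCLEAR_DEPTH = 30.0
contigs = [
    Contig("ctg_chloroplast", 150_000, 148_000, 900.0),  # high ALCR, high SDR
    Contig("ctg_nuclear_insert", 2_000_000, 3_000, 32.0),  # low ALCR, low SDR
    Contig("ctg_transferred", 5_000, 4_800, 31.0),  # high ALCR, nuclear-level SDR
]

flagged = [c.name for c in contigs if is_organelle(c, NUCLEAR_DEPTH)]
print(flagged)
```

In this toy data the third contig aligns almost entirely to the organelle reference but sits at nuclear depth, so a similarity-only filter would remove it while the combined test keeps it in the nuclear assembly.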
The accuracy of Chlomito was tested using sequencing data from Plum, Mango and . The results confirmed that Chlomito can accurately detect contigs originating from the organelle genomes, and that the identified contigs covered most regions of the organelle reference genomes, demonstrating the efficiency and precision of Chlomito. For user convenience, we further packaged the method into a Docker image, simplifying the data processing workflow.
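One way to check a statement such as "the identified contigs covered most regions of the organelle reference genomes" is to merge the reference intervals hit by the flagged contigs and divide the merged length by the reference length. The sketch below assumes half-open, 0-based `(start, end)` intervals on the reference; the function name `covered_fraction` and the example coordinates are invented for illustration and do not come from Chlomito.

```python
def covered_fraction(intervals, ref_length):
    """Fraction of a reference covered by the union of (start, end) intervals.

    Intervals are half-open and 0-based; overlapping or touching
    intervals are merged so that no base is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_length


# Invented alignments of flagged contigs onto a 200 bp toy reference.
hits = [(0, 50), (40, 100), (150, 200)]
fraction = covered_fraction(hits, 200)
print(f"{fraction:.2%} of the reference is covered")
```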
Overall, Chlomito provides an efficient, accurate and convenient method for identifying and removing contigs derived from organelle genomes in genomic assembly data, contributing to the improvement of genome assembly quality.