iGDP: An integrated genome decontamination pipeline for wild ciliated microeukaryotes.

Affiliations

Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China.

University of Chinese Academy of Sciences, Beijing, China.

Publication information

Mol Ecol Resour. 2023 Jul;23(5):1182-1193. doi: 10.1111/1755-0998.13782. Epub 2023 Mar 22.

Abstract

Ciliates are a large group of ubiquitous and highly diverse single-celled eukaryotes that play an essential role in the functioning of microbial food webs. However, their genomic diversity is far from clear because cultivation methods have yet to be developed for most species, so most research is based on wild organisms that almost invariably contain contaminants. Here we establish an integrated Genome Decontamination Pipeline (iGDP) that combines homology search, telomere reads-assisted and clustering approaches to filter contaminated ciliate genome assemblies from wild specimens. We benchmarked the performance of iGDP using genomic data from a contaminated ciliate culture; iGDP recalled 91.9% of the target sequences with 96.9% precision. We also used a synthetic dataset to offer guidelines for the application of iGDP in the removal of various groups of contaminants. Compared with several popular metagenome binning tools, iGDP performed better. To further validate the effectiveness of iGDP on real-world data, we applied it to decontaminate genome assemblies of three wild ciliate specimens and obtained high-quality genomes comparable to those of previously well-studied model ciliates. It is anticipated that the newly generated genomes and the established iGDP method will be valuable community resources for detailed studies on ciliate biodiversity, phylogeny, ecology and evolution. The pipeline (https://github.com/GWang2022/iGDP) can be implemented automatically to reduce manual filtering and classification and may be further developed to apply to other microeukaryotes.
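To make the benchmark figures concrete, the sketch below shows how recall and precision are conventionally defined for a decontamination result. It is not taken from iGDP: the function name `precision_recall`, the contig identifiers, and the choice to count contigs rather than base pairs are all assumptions made for illustration, and the abstract does not say which unit the authors used.

```python
def precision_recall(kept, target):
    """Return (precision, recall) for a decontamination result.

    kept   -- IDs of contigs retained as "target organism" (hypothetical input)
    target -- IDs of contigs that truly belong to the target organism
    Counting contigs is an assumption; a length-weighted variant is equally plausible.
    """
    kept, target = set(kept), set(target)
    true_positives = len(kept & target)
    # Precision: share of retained contigs that are genuinely target sequence.
    precision = true_positives / len(kept) if kept else 0.0
    # Recall: share of true target contigs that survived filtering.
    recall = true_positives / len(target) if target else 0.0
    return precision, recall


# Toy data: five true ciliate contigs, four contigs kept, one of them a contaminant.
target = {"c1", "c2", "c3", "c4", "c5"}
kept = {"c1", "c2", "c3", "x1"}
precision, recall = precision_recall(kept, target)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this definition, the reported 96.9% precision would mean that few retained sequences are contaminants, and the 91.9% recall would mean that most of the true target sequence is kept.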
