Xie Jiazheng, Tan Bowen, Zhang Yi
Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
Animals (Basel). 2023 Jul 8;13(14):2243. doi: 10.3390/ani13142243.
With the advent of next-generation sequencing (NGS) technology, the volume of genomic data in public databases has increased exponentially. Unfortunately, exogenous contamination or intracellular parasite sequences in assemblies can confound genomic analysis. At the same time, such sequences are a valuable resource for studies of host-microbe interactions. Here, we used a strategy based on DNA barcodes to scan for protistan contamination in the GenBank WGS/TSA database. Of the 13,952 metazoan (animal) assemblies in GenBank, 1507 (10.8%) contained protistan contaminants, totaling 17,036 contigs, with even higher contamination rates in Cnidaria (150/281), Crustacea (237/480), and Mollusca (107/410). Taxonomic analysis of the protists derived from these contigs showed that the abundance and evenness of protistan contamination vary across metazoan taxa, reflecting host preferences of Apicomplexa, Ciliophora, Oomycota, and Symbiodiniaceae for mammals and birds, Crustacea, insects, and Cnidaria, respectively. Finally, the mitochondrial proteins COX1 and CYTB were predicted from these contigs, and phylogenetic analysis corroborated the protistan origin and heterogeneous distribution of the contaminated contigs. Overall, this study is a large-scale scan of protistan contaminants in genomic resources, and the protistan sequences detected will help uncover protist diversity and the relationships of these picoeukaryotes with Metazoa.
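The contamination rates quoted above are given as raw counts for three taxa. As a minimal sketch, the percentages can be recomputed from those counts as follows. The dictionary layout and the function name `contamination_rate` are illustrative assumptions, not part of the study's pipeline, and the per-taxon percentages are derived here rather than reported in the abstract.

```python
# Counts taken from the abstract: (contaminated assemblies, total assemblies).
# The data structure and function name are hypothetical, for illustration only.
counts = {
    "All Metazoa": (1507, 13952),
    "Cnidaria": (150, 281),
    "Crustacea": (237, 480),
    "Mollusca": (107, 410),
}


def contamination_rate(contaminated: int, total: int) -> float:
    """Return the share of contaminated assemblies as a percentage."""
    return 100.0 * contaminated / total


# Round to one decimal place to match the precision used in the abstract (10.8%).
rates = {
    taxon: round(contamination_rate(contaminated, total), 1)
    for taxon, (contaminated, total) in counts.items()
}

for taxon, rate in rates.items():
    print(f"{taxon}: {rate}%")
```

Under these counts, roughly half of the Cnidaria and Crustacea assemblies and about a quarter of the Mollusca assemblies carry protistan contigs, against about a tenth across all Metazoa.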