Zhou Jiali, Zhang Xinrui, Wang Yujie, Liang Haoxian, Yang Yuhao, Huang Xiaolei, Deng Jun
State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
Animals (Basel). 2024 Nov 27;14(23):3432. doi: 10.3390/ani14233432.
The rapid advancement of high-throughput sequencing has greatly increased the volume of sequencing data, and with it the amount of contamination in public databases: sequences from non-target species may be present in the sequencing data of a target species. For Insecta, the most diverse group within Arthropoda, there is still no comprehensive evaluation of how prevalent contamination is in public databases, nor an analysis of its potential causes. In this study, COI barcodes were used to investigate contamination from insects and mammals in GenBank genomic (WGS) and transcriptomic (TSA) assemblies across four insect orders. Among the 2796 WGS and 1382 TSA assemblies analyzed, contamination was detected in 32 (1.14%) WGS and 152 (11.0%) TSA assemblies. The key findings are as follows: (1) TSA data were contaminated more often than WGS data; (2) contamination rates varied significantly among the four orders, at 9.22% for Hemiptera, 3.48% for Coleoptera, 7.66% for Hymenoptera, and 1.89% for Diptera; (3) possible causes of contamination, such as food, parasitism, sample collection, and cross-contamination, were analyzed. Overall, this study proposes a workflow for checking WGS and TSA data for contamination, together with suggestions for mitigating it.
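The reported percentages follow directly from the assembly counts. As a minimal sketch of that arithmetic (the function name `contamination_rate` and the `counts` dictionary are illustrative assumptions, not part of the study's workflow):

```python
def contamination_rate(contaminated: int, total: int) -> float:
    """Return the percentage of assemblies flagged as contaminated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * contaminated / total


# Counts as reported in the abstract: (contaminated, total analyzed).
counts = {"WGS": (32, 2796), "TSA": (152, 1382)}

# Percentage of contaminated assemblies per data type.
rates = {name: contamination_rate(c, t) for name, (c, t) in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```

Rounded, these values agree with the 1.14% (WGS) and 11.0% (TSA) given in the abstract. The per-order rates cannot be recomputed in the same way, because the abstract does not report per-order assembly counts.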