Contamination Survey of Insect Genomic and Transcriptomic Data.

Authors

Zhou Jiali, Zhang Xinrui, Wang Yujie, Liang Haoxian, Yang Yuhao, Huang Xiaolei, Deng Jun

Affiliation

State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China.

Publication

Animals (Basel). 2024 Nov 27;14(23):3432. doi: 10.3390/ani14233432.

DOI: 10.3390/ani14233432
PMID: 39682398
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11639764/
Abstract

The rapid advancement of high-throughput sequencing has led to a great increase in sequencing data and, with it, a significant accumulation of contamination; for example, sequences from non-target species may be present in the target species' sequencing data. Insecta, the most diverse group within Arthropoda, still lacks a comprehensive evaluation of contamination prevalence in public databases and an analysis of potential contamination causes. In this study, COI barcodes were used to investigate contamination from insects and mammals in GenBank's genomic and transcriptomic data across four insect orders. Among the 2796 WGS and 1382 TSA assemblies analyzed, contamination was detected in 32 (1.14%) WGS and 152 (11.0%) TSA assemblies. Key findings from this study include the following: (1) TSA data exhibited more severe contamination than WGS data; (2) contamination levels varied significantly among the four orders, with Hemiptera showing 9.22%, Coleoptera 3.48%, Hymenoptera 7.66%, and Diptera 1.89% contamination rates; (3) possible causes of contamination, such as food, parasitism, sample collection, and cross-contamination, were analyzed. Overall, this study proposes a workflow for checking the existence of contamination in WGS and TSA data and some suggestions to mitigate it.
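
The contamination rates quoted in the abstract can be checked from the reported counts. The following is a minimal Python sketch of that arithmetic only; the function name `contamination_rate` and the `counts` dictionary are illustrative assumptions, not part of the paper's workflow.

```python
def contamination_rate(contaminated: int, total: int) -> float:
    """Return the percentage of assemblies flagged as contaminated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * contaminated / total


# (contaminated, total) assembly counts as reported in the abstract.
counts = {"WGS": (32, 2796), "TSA": (152, 1382)}

rates = {name: contamination_rate(c, t) for name, (c, t) in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}% of assemblies contaminated")

# How many times more often TSA assemblies were flagged than WGS assemblies.
fold = rates["TSA"] / rates["WGS"]
print(f"TSA/WGS ratio: {fold:.1f}x")
```

Under these counts, TSA assemblies were flagged roughly an order of magnitude more often than WGS assemblies, which is consistent with key finding (1).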

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016f/11639764/999f9ea67d14/animals-14-03432-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016f/11639764/8f79ea86a611/animals-14-03432-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016f/11639764/4b6c399b6e57/animals-14-03432-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016f/11639764/ada3a2a90992/animals-14-03432-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016f/11639764/ef91f291845c/animals-14-03432-g005.jpg

Similar articles

1
Contamination Survey of Insect Genomic and Transcriptomic Data.
Animals (Basel). 2024 Nov 27;14(23):3432. doi: 10.3390/ani14233432.
2
Comparative performances of DNA barcoding across insect orders.
BMC Bioinformatics. 2010 Apr 27;11:206. doi: 10.1186/1471-2105-11-206.
3
A Large-Scale Study into Protist-Animal Interactions Based on Public Genomic Data Using DNA Barcodes.
Animals (Basel). 2023 Jul 8;13(14):2243. doi: 10.3390/ani13142243.
4
Diversity and Distribution of Mites (ACARI) Revealed by Contamination Survey in Public Genomic Databases.
Animals (Basel). 2023 Oct 11;13(20):3172. doi: 10.3390/ani13203172.
5
A new cost-effective and fast direct PCR protocol for insects based on PBS buffer.
Mol Ecol Resour. 2019 May;19(3):691-701. doi: 10.1111/1755-0998.13005.
6
Herbivory increases diversification across insect clades.
Nat Commun. 2015 Sep 24;6:8370. doi: 10.1038/ncomms9370.
7
High-quality genome assemblies for nine non-model North American insect species representing six orders (Insecta: Coleoptera, Diptera, Hemiptera, Hymenoptera, Lepidoptera, Neuroptera).
Mol Ecol Resour. 2024 Nov;24(8):e14010. doi: 10.1111/1755-0998.14010. Epub 2024 Aug 18.
8
Taxonomic identification accuracy from BOLD and GenBank databases using over a thousand insect DNA barcodes from Colombia.
PLoS One. 2023 Apr 24;18(4):e0277379. doi: 10.1371/journal.pone.0277379. eCollection 2023.
9
Long Reads Are Revolutionizing 20 Years of Insect Genome Sequencing.
Genome Biol Evol. 2021 Aug 3;13(8). doi: 10.1093/gbe/evab138.
10
A DNA barcode survey of insect biodiversity in Pakistan.
PeerJ. 2022 Apr 25;10:e13267. doi: 10.7717/peerj.13267. eCollection 2022.
