Decontamination of DNA sequences from a Streptomyces genome for optimal genome mining.

Author information

de Oliveira Raul Vitor Ferreira, Garrido Leandro Maza, Padilla Gabriel

Affiliation

Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo (USP), São Paulo, SP, 05508-900, Brazil.

Publication information

Braz J Microbiol. 2025 Mar;56(1):79-89. doi: 10.1007/s42770-024-01598-2. Epub 2025 Jan 15.

Abstract

Despite meticulous precautions, contamination of genomic DNA samples is not uncommon, and it can seriously compromise the analysis of whole-genome sequencing data from microorganisms, and with it every downstream analysis. Advances in software and bioinformatics now make it possible to address this problem and to salvage, rather than discard, the dataset from a whole-genome sequencing run contaminated with DNA from another bacterium. In this study, sequencing reads from Streptomyces sp. BRB040, generated on the HiSeq System platform (Illumina Inc., San Diego, USA), were found to be contaminated with Bacillus licheniformis DNA. To remove this contamination, we combined tools available on the Galaxy platform with other web-based resources (MeDuSa and BLAST). The contaminated reads were treated as a metagenome in order to isolate the genome of the contaminating organism, and were assembled with metaSPAdes, yielding a large 4.187 Mb scaffold that was identified as Bacillus licheniformis. Once identified, this contaminant genome was used as a filter: reads that aligned to it were removed with Bowtie 2. The remaining reads were then reassembled with Unicycler, yielding 117 contigs with a total size of 7.9 Mb, and BUSCO estimated the completeness of this assembly at 95.9%. As an alternative, we removed contaminated reads with BBDuk; the resulting Unicycler assembly comprised 85 contigs totalling 8.3 Mb, with 99.5% completeness. These results were better than those obtained with SPAdes, which produced less complete genomes than Unicycler (at most 97.8% completeness) and failed to adequately assemble the BBDuk-decontaminated reads.
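
BBDuk filters reads against a reference by shared k-mers: a read is flagged when it contains k-mers that also occur in the contaminant sequence. The sketch below is a toy, pure-Python illustration of that principle only, not BBDuk's implementation or defaults. It assumes the contaminant genome (for example, the assembled B. licheniformis scaffold) is held as a plain string and that reads are plain sequences rather than FASTQ records; the function names and the `k` and `min_hits` values are hypothetical choices for illustration.

```python
# Toy k-mer contaminant screen (hypothetical helpers; illustrates the idea
# behind BBDuk's reference filtering, not BBDuk itself).
from typing import Iterable, Iterator

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_reference_kmers(reference: str, k: int) -> set[str]:
    """Index k-mers from both strands so reads from either strand match."""
    ref = reference.upper()
    index = set(kmers(ref, k))
    index.update(kmers(reverse_complement(ref), k))
    return index


def split_reads(reads: Iterable[str], reference_kmers: set[str], k: int,
                min_hits: int = 1) -> tuple[list[str], list[str]]:
    """Partition reads into (clean, contaminated).

    A read counts as contaminated when at least `min_hits` of its k-mers
    occur in the contaminant index.
    """
    clean, contaminated = [], []
    for read in reads:
        hits = sum(1 for km in kmers(read.upper(), k) if km in reference_kmers)
        (contaminated if hits >= min_hits else clean).append(read)
    return clean, contaminated


if __name__ == "__main__":
    index = build_reference_kmers("ATGGCGTACGTTAGC", k=5)
    clean, contaminated = split_reads(
        ["GGCGTACG", "TTTTTTTT", "GCTAACGT"], index, k=5)
    print("clean:", clean)
    print("contaminated:", contaminated)
```

With these toy inputs, `split_reads` keeps only `"TTTTTTTT"` and flags the other two reads; `"GCTAACGT"` is caught only because the reverse-complement strand is indexed. Real reads carry sequencing errors, which is why production tools expose settings such as k-mer length and mismatch tolerance.
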
Compared with the uncontaminated BRB040 genome (total size 8.2 Mb, completeness 99.8%), the assembly built from BBDuk-decontaminated reads gave the best results of this pipeline, with completeness only 0.3 percentage points below the reference. Genome mining with antiSMASH 7.0 identified 24 biosynthetic gene clusters (BGCs) both in the BBDuk-based assembly and in the control BRB040 assembly. In silico decontamination therefore allows BGC mining despite some loss of nucleotide sequence. These findings show that contamination can be effectively removed from a genome using readily available online tools, while preserving a dataset suitable for extracting valuable insights into the secondary metabolism of the target organism. This approach is particularly useful when resequencing the samples is not immediately feasible.
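
BUSCO reports completeness (C) as the percentage of expected single-copy orthologs found complete, counting both single-copy (S) and duplicated (D) hits. The sketch below restates that arithmetic and the comparison against the reference assembly; the function names are hypothetical, and the only data used are the completeness percentages reported in this abstract.

```python
# Hypothetical helpers restating BUSCO-style completeness arithmetic.


def busco_completeness(single: int, duplicated: int, total: int) -> float:
    """BUSCO 'C' score: single-copy plus duplicated complete BUSCOs, as % of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (single + duplicated) / total


def gap_points(reference_pct: float, assembly_pct: float) -> float:
    """Shortfall versus the reference in percentage points, rounded to 0.1."""
    return round(reference_pct - assembly_pct, 1)


# Completeness values as reported in the abstract (percent).
REFERENCE = 99.8  # uncontaminated BRB040 assembly
REPORTED = {
    "Bowtie 2 + Unicycler": 95.9,
    "BBDuk + Unicycler": 99.5,
    "SPAdes (best run)": 97.8,
}

if __name__ == "__main__":
    for label, pct in REPORTED.items():
        print(f"{label}: {gap_points(REFERENCE, pct)} points below reference")
```

Expressing the shortfall in percentage points (99.8 minus 99.5 gives 0.3) avoids confusing it with a relative 0.3% drop.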
