Grupo Toxinología, Alternativas Terapéuticas y Alimentarias, Facultad de Ciencias Farmacéuticas y Alimentarias, Universidad de Antioquia, Medellín 050012, Colombia.
Corporación para Investigaciones Biológicas, Medellín 050012, Colombia.
Toxins (Basel). 2022 Jun 15;14(6):408. doi: 10.3390/toxins14060408.
Spider venoms constitute a trove of novel peptides of biotechnological interest. Because little next-generation sequencing (NGS) data has been generated, fewer than 1% of these peptides have been described. Increasing evidence indicates that a single transcriptome assembler underestimates the number of genes that can be assembled. Here, the transcriptome of the venom gland of the spider was re-assembled using three freely available algorithms, Trinity, SOAPdenovo-Trans, and SPAdes, to obtain a more complete annotation. Assembler performance was evaluated by contig number, N50, read representation in the assembly, and retrieval of BUSCO terms against the arthropod dataset. Of all the sequences assembled by the three programs, 39.26% were common to all three assemblers, 27.88% were assembled only by Trinity, and 27.65% were assembled only by SPAdes. The non-redundant merging of the three assemblies' output permitted the annotation of 9232 sequences, which was 23% more than with each assembler alone and 28% more than in the previous annotation; moreover, 65 novel theraphotoxins could be described. When generating data for non-model organisms, as well as when searching for novel peptides of biotechnological interest, it is highly recommended to employ at least two different transcriptome assemblers.
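Two of the quantities mentioned above, N50 and the share of sequences common to or unique among assemblers, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `n50` and `overlap_percentages` are hypothetical, sequences are assumed to be comparable by exact identifier (a real comparison would more likely cluster sequences by similarity), and the percentages are assumed to be taken over the union of all assembled sequences.

```python
from typing import Dict, Iterable, Set


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def overlap_percentages(assemblies: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of sequences shared by all assemblers or unique to one.

    Assumption: each assembly is a set of sequence identifiers that can
    be compared by exact match, and the denominator is the union of all
    assemblies.
    """
    union = set().union(*assemblies.values())
    if not union:
        return {}
    shared = set.intersection(*assemblies.values())
    result = {"shared_by_all": 100 * len(shared) / len(union)}
    for name, seqs in assemblies.items():
        # Everything produced by the other assemblers.
        others = set().union(
            *(s for n, s in assemblies.items() if n != name)
        )
        result[f"unique_to_{name}"] = 100 * len(seqs - others) / len(union)
    return result
```

Calling `overlap_percentages` with one set of identifiers per assembler returns a dictionary keyed by `shared_by_all` and `unique_to_<name>`. Sequences found by exactly two assemblers fall into neither category, which is why the three percentages reported in the abstract do not need to sum to 100%.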