Havird Justin C, Santos Scott R
Department of Biology, Colorado State University, Fort Collins, CO 80523, USA;
Department of Biological Sciences and Molette Laboratory for Climate Change and Environmental Studies, Auburn University, 101 Rouse Life Sciences Bldg, Auburn, AL 36849, USA.
Integr Comp Biol. 2016 Dec;56(6):1055-1066. doi: 10.1093/icb/icw061. Epub 2016 Jul 8.
Despite their economic, ecological, and experimental importance, genomic resources remain scarce for crustaceans. In lieu of genomes, many researchers have taken advantage of technological advancements to instead sequence and assemble crustacean transcriptomes de novo. However, there is little consensus on what standard operating procedures are, or should be, for the field. Here, we systematically reviewed 53 studies published during 2014-2015 that utilized transcriptomic resources from this taxonomic group in an effort to identify commonalities as well as potential weaknesses that have applicability beyond just crustaceans. In general, these studies utilized RNA-Seq data, both novel and publicly available, to characterize transcriptomes and/or identify differentially expressed genes (DEGs) between treatments. Although the software suite Trinity was popular in assembly pipelines and other programs were also commonly employed, many studies failed to report crucial details regarding bioinformatic methodologies, including read mappers and the parameters used to identify and characterize DEGs. Annotation percentages for assembled transcriptomic contigs were low, averaging 32% overall. While other metrics, such as numbers of contigs and DEGs reported, correlated with the number of sequence reads utilized per sample, these did reach apparent saturation with increasing sequencing depth. Most disturbingly, a number of studies (55%) reported DEGs based on non-replicated experimental designs and single biological replicates for each treatment. Given this, we suggest that future RNA-Seq experiments targeting transcriptome characterization conduct deeper (i.e., 50-100 M reads) sequencing, while those examining differential expression instead focus more on increased biological replicates at shallower (i.e., ∼10-20 M reads/sample) sequencing depths.
Moreover, the community must avoid submitting for review, or accepting for publication, non-replicated differential expression studies. Finally, mining the ever growing publicly available transcriptomic data from crustaceans will allow future studies to focus on hypothesis-driven research instead of continuing to simply characterize transcriptomes. As an example of this, we utilized neurotoxin sequences from the recently described remipede venom gland transcriptome in conjunction with publicly available crustacean transcriptomic data to derive preliminary results and hypotheses regarding the evolution of venom in crustaceans.
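The trade-off between sequencing depth and biological replication recommended above can be illustrated with simple arithmetic. The following is a minimal sketch, not part of the reviewed studies: the function `plan_de_design`, the `Design` class, and the 200 M read budget are hypothetical names and values chosen for illustration, and "replicated" is taken here to mean at least two biological replicates per treatment, the minimum that avoids the single-replicate designs criticized above.

```python
from dataclasses import dataclass

M = 1_000_000  # one million reads


@dataclass(frozen=True)
class Design:
    """A hypothetical differential-expression sequencing plan."""
    treatments: int
    replicates_per_treatment: int
    reads_per_sample: int

    @property
    def is_replicated(self) -> bool:
        # Assumption: a design counts as replicated only if every
        # treatment has at least two biological replicates.
        return self.replicates_per_treatment >= 2


def plan_de_design(total_reads: int, treatments: int,
                   reads_per_sample: int = 15 * M) -> Design:
    """Split a fixed read budget evenly across treatments.

    The default depth of 15 M reads/sample is an illustrative midpoint
    of the ~10-20 M reads/sample range suggested in the abstract.
    """
    if treatments < 2:
        raise ValueError("differential expression needs at least two treatments")
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    samples = total_reads // reads_per_sample   # samples the budget can cover
    replicates = samples // treatments          # even split, remainder unused
    return Design(treatments, replicates, reads_per_sample)


# Same hypothetical 200 M read budget, two treatments, two depths.
shallow = plan_de_design(200 * M, treatments=2, reads_per_sample=20 * M)
deep = plan_de_design(200 * M, treatments=2, reads_per_sample=100 * M)
print(shallow.replicates_per_treatment, shallow.is_replicated)
print(deep.replicates_per_treatment, deep.is_replicated)
```

Under these assumptions, the same budget yields five replicates per treatment at 20 M reads/sample but only a single, non-replicated sample per treatment at 100 M reads/sample, which is the kind of design the abstract argues should not be used for differential expression.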