Institut Pasteur, Université Paris Cité, CNRS UMR3569, Viruses and RNA Interference Unit, F-75015, Paris, France.
Institut Pasteur de Bangui, Medical Entomology Laboratory, Bangui, Central African Republic.
eLife. 2023 Jan 23;12:e82762. doi: 10.7554/eLife.82762.
Total RNA sequencing (RNA-seq) is an important tool in the study of mosquitoes and the RNA viruses they vector, as it allows assessment of both host and viral RNA in specimens. However, there are two main constraints. First, as with many other species, abundant mosquito ribosomal RNA (rRNA) serves as the predominant template from which sequences are generated, meaning that the desired host and viral templates are sequenced far less often. Second, mosquito specimens captured in the field must be correctly identified, in some cases to the sub-species level. Here, we generate mosquito rRNA datasets which will substantially mitigate both of these problems. We describe a strategy to assemble novel rRNA sequences from mosquito specimens and produce an unprecedented dataset of 234 full-length 28S and 18S rRNA sequences of 33 medically important species from countries with known histories of mosquito-borne virus circulation (Cambodia, the Central African Republic, Madagascar, and French Guiana). These sequences will allow both physical and computational removal of rRNA from specimens during RNA-seq protocols. We also assess the utility of rRNA sequences for molecular taxonomy and compare phylogenies constructed using rRNA sequences with those created using the gold standard for molecular species identification of specimens, the mitochondrial cytochrome c oxidase I (COI) gene. We find that rRNA- and COI-derived phylogenetic trees are incongruent and that 28S and concatenated 28S+18S rRNA phylogenies reflect evolutionary relationships that are more aligned with contemporary mosquito systematics. This significant expansion to the current rRNA reference library for mosquitoes will improve mosquito RNA-seq metagenomics by permitting the optimization of species-specific rRNA depletion protocols for a broader range of species and by streamlining species identification by rRNA sequence and phylogenetics.
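As an illustration of what "computational removal of rRNA" can mean in practice, the following is a minimal sketch and not the authors' pipeline: it indexes the k-mers of a set of rRNA reference sequences and discards reads whose k-mers mostly fall in that index. The function names, the k-mer length, and the 0.5 threshold are all assumptions chosen for illustration; production workflows typically rely on a dedicated aligner or rRNA filtering tool instead.

```python
from typing import Iterable, Iterator, List, Set

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_rrna_index(references: Iterable[str], k: int = 21) -> Set[str]:
    """Collect k-mers from both strands of each rRNA reference sequence."""
    index: Set[str] = set()
    for ref in references:
        index.update(kmers(ref, k))
        index.update(kmers(reverse_complement(ref), k))
    return index


def rrna_fraction(read: str, index: Set[str], k: int = 21) -> float:
    """Fraction of the read's k-mers that occur in the rRNA index."""
    total = len(read) - k + 1
    if total <= 0:
        return 0.0  # read shorter than k: no evidence either way
    hits = sum(1 for km in kmers(read, k) if km in index)
    return hits / total


def deplete_rrna(reads: Iterable[str], index: Set[str],
                 k: int = 21, threshold: float = 0.5) -> List[str]:
    """Keep only reads whose rRNA k-mer fraction is below the threshold.

    The threshold is a hypothetical illustrative value, not a
    recommendation from the source.
    """
    return [r for r in reads if rrna_fraction(r, index, k) < threshold]
```

With a reference library such as the 28S and 18S sequences described above, `build_rrna_index` would be run once per library and `deplete_rrna` applied to each batch of reads. Exact k-mer matching is deliberately simple here; it tolerates no sequencing errors within a k-mer, which is one reason alignment-based tools are usually preferred.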