Evaluating the performance of targeted sequence capture, RNA-Seq, and degenerate-primer PCR cloning for sequencing the largest mammalian multigene family.

Affiliations

Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA.

Department of Geology and Geophysics, Yale University, New Haven, CT, USA.

Publication information

Mol Ecol Resour. 2020 Jan;20(1):140-153. doi: 10.1111/1755-0998.13093. Epub 2019 Oct 3.

Abstract

Multigene families evolve from single-copy ancestral genes via duplication, and typically encode proteins critical to key biological processes. Molecular analyses of these gene families require high-confidence sequences, but the high sequence similarity of the members can create challenges for sequencing and downstream analyses. Focusing on the common vampire bat, Desmodus rotundus, we evaluated how different sequencing approaches performed in recovering the largest mammalian protein-coding multigene family: olfactory receptors (OR). Using the genome as a reference, we determined the proportion of intact protein-coding receptors recovered by: (a) amplicons from degenerate primers sequenced via Sanger technology, (b) RNA-Seq of the main olfactory epithelium, and (c) those genes captured with probes designed from transcriptomes of closely-related species. Our initial re-annotation of the high-quality vampire bat genome resulted in >400 intact OR genes, more than doubling the original estimate. Sanger-sequenced amplicons performed the poorest among the three approaches, detecting <33% of receptors in the genome. In contrast, the transcriptome reliably recovered >50% of the annotated genomic ORs, and targeted sequence capture recovered nearly 75% of annotated genes. Each sequencing approach assembled high-quality sequences, even if it did not recover all receptors in the genome. While some variation may be due to limitations of the study design (e.g., different individuals), variation among approaches was mostly caused by low coverage of some receptors rather than high rates of assembly error. Given this variability, we caution against using the counts of intact receptors per species to model the birth-death process of multigene families. Instead, our results support the use of orthologous sequences to explore and model the evolutionary processes shaping these genes.
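
The comparison described in the abstract, the proportion of genome-annotated intact receptors that each approach recovers, can be illustrated with a minimal sketch. It assumes each recovered sequence has already been matched to a reference gene identifier; the abstract does not say how that matching was done. The function name `recovery_fraction`, the gene identifiers, and the toy sets are all hypothetical and are not data from the study.

```python
def recovery_fraction(reference_ids, recovered_ids):
    """Return the fraction of reference genes present in the recovered set.

    Recovered identifiers that are absent from the reference are ignored,
    so the result always lies between 0 and 1.
    """
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & set(recovered_ids)) / len(reference)


# Hypothetical toy identifiers, NOT results from the paper.
reference = {"OR1", "OR2", "OR3", "OR4"}
methods = {
    "sanger_amplicons": {"OR1"},
    "rna_seq": {"OR1", "OR2", "OR9"},  # OR9 is not in the reference and is ignored
    "targeted_capture": {"OR1", "OR2", "OR3"},
}

fractions = {
    name: recovery_fraction(reference, ids) for name, ids in methods.items()
}

for name, value in fractions.items():
    print(f"{name}: {value:.0%} of reference receptors recovered")
```

Working with sets of identifiers measures recovery of specific genes rather than comparing raw counts. This matters when two approaches recover a similar number of receptors but not the same ones.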
