ORTHOSKIM: In silico sequence capture from genomic and transcriptomic libraries for phylogenomic and barcoding applications.

Author information

Laboratoire d'Ecologie Alpine (LECA), Univ. Grenoble Alpes, CNRS, Univ. Savoie Mont Blanc, Grenoble, France.

Systematics and Evolution of Vascular Plants (UAB) - Associated Unit to CSIC, Departament de Biologia Animal, Biologia Vegetal i Ecologia, Universitat Autònoma de Barcelona, Bellaterra, Spain.

Publication information

Mol Ecol Resour. 2022 Jul;22(5):2018-2037. doi: 10.1111/1755-0998.13584. Epub 2022 Jan 30.

Abstract

Low-coverage whole genome shotgun sequencing (or genome skimming) has emerged as a cost-effective method for acquiring genomic data in nonmodel organisms. This method provides sequence information on chloroplast genome (cpDNA), mitochondrial genome (mtDNA) and nuclear ribosomal regions (rDNA), which are over-represented within cells. However, numerous bioinformatic challenges remain to accurately and rapidly obtain such data in organisms with complex genomic structures and rearrangements, in particular for mtDNA in plants or for cpDNA in some plant families. Here we introduce the pipeline ORTHOSKIM, which performs in silico capture of targeted sequences from genomic and transcriptomic libraries without assembling whole organelle genomes. ORTHOSKIM proceeds in three steps: (i) global sequence assembly, (ii) mapping against reference sequences and (iii) target sequence extraction; importantly it also includes a range of quality control tests. Different modes are implemented to capture both coding and noncoding regions of cpDNA, mtDNA and rDNA sequences, along with predefined nuclear sequences (e.g., ultraconserved elements) or collections of single-copy orthologue genes. Moreover, aligned DNA matrices are produced for phylogenetic reconstructions, by performing multiple alignments of the captured sequences. While ORTHOSKIM is suitable for any eukaryote, a case study is presented here, using 114 genome-skimming libraries and four RNA sequencing libraries obtained for two plant families, Primulaceae and Ericaceae, the latter being a well-known problematic family for cpDNA assemblies. ORTHOSKIM recovered with high success rates cpDNA, mtDNA and rDNA sequences, well suited to accurately infer evolutionary relationships within these families. ORTHOSKIM is released under a GPL-3 licence and is available at: https://github.com/cpouchon/ORTHOSKIM.
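
The capture idea summarised in the abstract (map assembled contigs against a reference, then extract the matching region, with a quality threshold) can be illustrated with a toy sketch. This is not ORTHOSKIM's code: the function names `kmers` and `capture_target`, the parameters `k` and `min_shared`, and the use of shared k-mers as a stand-in for a real aligner are all assumptions made for illustration. The assembly step is assumed to have already produced the contigs.

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def capture_target(
    contigs: Dict[str, str],
    reference: str,
    k: int = 8,
    min_shared: float = 0.5,
) -> Optional[Tuple[str, str]]:
    """Toy in silico capture (hypothetical helper, not the ORTHOSKIM API).

    'Mapping' is approximated by k-mer sharing with the reference;
    'extraction' keeps the contig span between the first and last hit.
    Returns (contig_name, captured_sequence), or None if no contig
    shares at least min_shared of the reference k-mers.
    """
    ref_kmers = kmers(reference, k)
    if not ref_kmers:
        return None

    best: Optional[Tuple[float, str, str]] = None
    for name, seq in contigs.items():
        # Start positions in the contig whose k-mer also occurs in the reference.
        hits = [i for i in range(len(seq) - k + 1) if seq[i:i + k] in ref_kmers]
        if not hits:
            continue
        # Fraction of distinct reference k-mers recovered by this contig.
        shared = len({seq[i:i + k] for i in hits}) / len(ref_kmers)
        # Simple quality control: skip weak matches, keep the best one.
        if shared >= min_shared and (best is None or shared > best[0]):
            best = (shared, name, seq[hits[0]:hits[-1] + k])

    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    reference = "ATGGCTAGCTAGGATCCGATCGATTACG"
    contigs = {
        "contig_1": "TTTTT" + reference + "GGGGG",  # target with flanking sequence
        "contig_2": "ACACACACACACACACACACAC",       # unrelated contig
    }
    print(capture_target(contigs, reference))
```

Exact k-mer matching tolerates no divergence between sample and reference, so a real pipeline would rely on a proper alignment tool for the mapping step; the sketch only shows why whole-organelle assembly is not needed to recover a targeted region.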
