White Oliver W, Hall Andie, Price Ben W, Williams Suzanne T, Clark Matthew D
The Natural History Museum, London, UK.
Mol Ecol Resour. 2025 Jan;25(1):e14036. doi: 10.1111/1755-0998.14036. Epub 2024 Oct 28.
Low-coverage 'genome-skims' are often used to assemble organelle genomes and ribosomal gene sequences for cost-effective phylogenetic and barcoding studies. Natural history collections hold invaluable biological information, yet poor preservation resulting in degraded DNA often hinders polymerase chain reaction-based analyses. However, it is possible to generate libraries and sequence the short fragments typical of degraded DNA to generate genome-skims from museum collections. Here we introduce a Snakemake toolkit comprising three pipelines, skim2mito, skim2rrna and gene2phylo, designed to unlock the genomic potential of historical museum specimens using genome skimming. Specifically, skim2mito and skim2rrna perform the batch assembly, annotation and phylogenetic analysis of mitochondrial genomes and nuclear ribosomal genes, respectively, from low-coverage genome skims. The third pipeline, gene2phylo, takes a set of gene alignments and performs phylogenetic analysis of individual genes, partitioned analysis of concatenated alignments and a phylogenetic analysis based on gene trees. We benchmark our pipelines with simulated data, followed by testing with a novel genome skimming dataset from both recent and historical solariellid gastropod samples. We show that the toolkit can recover mitochondrial and ribosomal genes from poorly preserved museum specimens of the gastropod family Solariellidae, and that the phylogenetic analysis is consistent with our current understanding of taxonomic relationships. Bioinformatic pipelines that facilitate the processing of large quantities of sequence data from the vast repository of specimens held in natural history museum collections will greatly aid species discovery and the exploration of biodiversity over time, ultimately supporting conservation efforts in the face of a changing planet.
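The abstract mentions a partitioned analysis of concatenated gene alignments. As a rough illustration of what that step involves, the sketch below joins per-gene alignments into one supermatrix and records the column range of each gene so that each can be given its own substitution model. It is not taken from gene2phylo: the function name `concatenate_alignments`, the gene labels `cox1` and `rrn18`, and the sample names are all hypothetical, and samples missing a gene are assumed to be padded with gap characters.

```python
from typing import Dict, List, Tuple

# An alignment maps sample name -> aligned sequence (all the same length).
Alignment = Dict[str, str]


def concatenate_alignments(
    genes: Dict[str, Alignment], missing: str = "-"
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments and return (supermatrix, partitions).

    Partitions are (gene, start, end) with 1-based inclusive coordinates.
    Samples absent from a gene are padded with the `missing` character
    (an assumption of this sketch, not a documented gene2phylo behaviour).
    """
    samples = sorted({name for aln in genes.values() for name in aln})
    pieces: Dict[str, List[str]] = {name: [] for name in samples}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        length = lengths.pop()
        for name in samples:
            pieces[name].append(aln.get(name, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {name: "".join(parts) for name, parts in pieces.items()}
    return supermatrix, partitions


# Toy input: two made-up genes, three made-up samples.
toy_genes = {
    "cox1": {"sample_A": "ATGC", "sample_B": "ATGA"},
    "rrn18": {"sample_A": "GG-", "sample_C": "GGT"},
}
supermatrix, partitions = concatenate_alignments(toy_genes)
for gene, first, last in partitions:
    print(f"{gene} = {first}-{last}")
```

The partition ranges printed at the end are the kind of information a partitioned phylogenetic analysis needs alongside the concatenated alignment; the exact file format expected depends on the tree-building software used.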