High-quality genome assemblies for nine non-model North American insect species representing six orders (Insecta: Coleoptera, Diptera, Hemiptera, Hymenoptera, Lepidoptera, Neuroptera).

Affiliations

Roy J. Carver Biotechnology Center, University of Illinois Urbana-Champaign, Urbana, Illinois, USA.

Key Laboratory of Plant Protection Resources and Pest Management of the Ministry of Education, Entomological Museum, Northwest A&F University, Yangling, Shaanxi, China.

Publication details

Mol Ecol Resour. 2024 Nov;24(8):e14010. doi: 10.1111/1755-0998.14010. Epub 2024 Aug 18.

Abstract

Field-collected specimens were used to obtain nine high-quality genome assemblies from a total of 10 insect species native to prairies and savannas of central Illinois (USA): Mellilla xanthometata (Lepidoptera: Geometridae), Stenolophus ochropezus (Coleoptera: Carabidae), Forcipata loca (Hemiptera: Cicadellidae), Coelinius sp. (Hymenoptera: Braconidae), Thaumatomyia glabra (Diptera: Chloropidae), Brachynemurus abdominalus (Neuroptera: Myrmeleontidae), Catonia carolina (Hemiptera: Achilidae), Oncometopia orbona (Hemiptera: Cicadellidae), Flexamia atlantica (Hemiptera: Cicadellidae) and Stictocephala bisonia (Hemiptera: Membracidae). Sequencing libraries were successfully prepared from single specimens despite extremely small DNA yields (<0.1 μg) for some samples. Additional sequencing and assembly workflows were adapted to each sample depending on its initial DNA yield. PacBio circular consensus sequencing (CCS/HiFi) or continuous long read (CLR) libraries were used to sequence DNA fragments up to 50 kb in length, with Illumina-sequenced linked reads (TellSeq libraries) and Omni-C libraries used for scaffolding and gap-filling. Assembled genome sizes ranged from 135 Mb to 3.2 Gb. The number of assembled scaffolds ranged from 47 to >13,000, with the longest scaffold per assembly ranging from ~23 to 439 Mb. Genome completeness was high, with BUSCO scores ranging from 85.5% for the largest genome (Stictocephala bisonia) to 98.8% for the smallest genome (Coelinius sp.). Unique content, estimated using RepeatMasker and GenomeScope2, ranged from 50.7% to 75.8% and roughly decreased with increasing genome size. Structural annotation predicted between 19,281 and 72,469 protein models per sequenced species. At the time, sequencing costs per genome ranged from US$3k to 5k; for samples with PacBio HiFi data, each genome averaged ~1600 CPU-hours on a high-performance cluster and required approximately 14 h of bioinformatics analysis.
Most assemblies would benefit from further manual curation to correct possible scaffold misjoins and translocations suggested by off-diagonal or depleted signals in Omni-C contact maps.
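Summary figures like the assembled size, scaffold count and longest scaffold reported above, together with a repeat-masking estimate of unique content, can be computed directly from an assembly FASTA file. The following is a minimal Python sketch for illustration only, not the authors' pipeline. It assumes repeats have been soft-masked (lowercase), as RepeatMasker produces with its `-xsmall` option, and it counts uppercase A/C/G/T bases as unique content while excluding gap (N) bases. The function names `parse_fasta` and `assembly_stats` are hypothetical. N50 is included only as a common contiguity measure that the abstract does not report, and the k-mer-based GenomeScope2 estimate is not reproduced here.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the first whitespace-delimited token after '>'.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def assembly_stats(seqs: Dict[str, str]) -> Dict[str, float]:
    """Scaffold count, total size, longest scaffold, N50 and unique fraction.

    Assumption: repeats are soft-masked (lowercase), e.g. RepeatMasker -xsmall
    output, so uppercase A/C/G/T bases are treated as unique content.
    """
    lengths = sorted((len(s) for s in seqs.values()), reverse=True)
    total = sum(lengths)
    # N50: length of the scaffold at which the cumulative length
    # (longest first) first reaches half of the total assembly size.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    # Gap bases (N/n) are excluded from the unique-fraction denominator.
    called = sum(sum(1 for b in s if b in "ACGTacgt") for s in seqs.values())
    unmasked = sum(sum(1 for b in s if b in "ACGT") for s in seqs.values())
    return {
        "scaffolds": len(lengths),
        "total_bp": total,
        "longest_bp": lengths[0] if lengths else 0,
        "n50_bp": n50,
        "unique_fraction": unmasked / called if called else 0.0,
    }


if __name__ == "__main__":
    # Toy soft-masked assembly with three scaffolds (hypothetical data).
    toy = ">scaf1\nACGTacgtAC\nGTNNACGT\n>scaf2\nacgtACGT\n>scaf3\nACG\n"
    print(assembly_stats(parse_fasta(toy.splitlines())))
```

For a real assembly, the same functions can be applied to the lines of an open file, for example `assembly_stats(parse_fasta(open("assembly.fa")))`, where the file name is a placeholder. Holding every sequence in memory is acceptable for a sketch, but multi-gigabase genomes such as the 3.2 Gb assembly above would benefit from a streaming approach.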

