School of Oceanography, University of Washington, Seattle, WA, 98195, USA.
Department of Biology, Genetics Institute, University of Florida, Gainesville, FL, 32610, USA.
Sci Data. 2024 Oct 22;11(1):1161. doi: 10.1038/s41597-024-04005-5.
Marine microbial eukaryotes (protists) perform essential metabolic functions in oceanic ecosystems. The diversity of protist functions remains poorly understood because few species have been isolated in laboratory settings. Metatranscriptomes provide an invaluable tool for exploring protist diversity and genetic capacities within their natural habitats. Here, we introduce the North Pacific Eukaryotic Gene Catalog, a compilation of data from a total of 261 metatranscriptomes. Of these, 169 metatranscriptomes were derived from samples collected on three meridional surface transects along 158°W, each spanning ~20 degrees of latitude from the North Pacific Subtropical Gyre (NPSG) to the North Pacific Transition Zone (NPTZ). The remaining 92 metatranscriptomes were derived from two diel-resolved field studies: one in the NPSG at 157°W, 23°N, and one in the NPTZ at 158°W, 41°N. The metatranscriptome sequences were de novo assembled into 175 assemblies and pooled into five datasets, each containing between 22 million and 49 million contigs clustered at 99% protein identity. Assemblies were annotated taxonomically and functionally, and quantified by short-read alignment. All data are available in the Zenodo repository, with the underlying code available on GitHub.