Kapoor Muskan, Ventura Enrique Sapena, Walsh Amy, Sokolov Alexey, George Nancy, Kumari Sunita, Provart Nicholas J, Cole Benjamin, Libault Marc, Tickle Timothy, Warren Wesley C, Koltes James E, Papatheodorou Irene, Ware Doreen, Harrison Peter W, Elsik Christine, Yordanova Galabina, Burdett Tony, Tuggle Christopher K
Department of Animal Science, Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States.
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, Cambridgeshire, United Kingdom.
Front Genet. 2024 Nov 29;15:1460351. doi: 10.3389/fgene.2024.1460351. eCollection 2024.
The agricultural genomics community has numerous data submission standards available, but the standards for describing and storing single-cell (SC; e.g., scRNA-seq) data are comparatively underdeveloped.
To bridge this gap, we leveraged recent advances in human genomics infrastructure, such as the integration of the Human Cell Atlas (HCA) Data Portal with Terra, a secure, scalable, open-source platform where biomedical researchers can access data, run analysis tools, and collaborate. In parallel, the Single Cell Expression Atlas at EMBL-EBI offers a comprehensive data ingestion portal for high-throughput sequencing datasets from plants, protists, and animals (including humans). Developing data tools that connect these resources would offer significant advantages to the agricultural genomics community. The FAANG data portal at EMBL-EBI emphasizes delivering rich metadata and highly accurate, reliable annotation of farmed animals, but it is not computationally linked to either of these resources.
Herein, we describe a pilot-scale project that assessed whether the current FAANG metadata standards for livestock can be used to ingest scRNA-seq datasets into Terra in a manner consistent with HCA Data Portal standards. Importantly, rich scRNA-seq metadata can now be brokered through the FAANG data portal using a semi-automated process, avoiding the need for substantial expert curation. We further extended this tool so that SC files validated and ingested within the HCA Data Portal are transferred to Terra for downstream analysis. We also verified data ingestion into Terra hosted on Azure and demonstrated a workflow that analyzes the first ingested porcine scRNA-seq dataset. Finally, we developed prototype tools for visualizing the output of scRNA-seq analyses on genome browsers, so that gene expression patterns can be compared across tissues and cell populations. The resulting JBrowse instance displays PBMC scRNA-seq data in distinct tracks alongside two bulk RNA-seq experiments.
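To make the genome-browser idea concrete, the sketch below shows one generic way single-cell counts could be summarized per cell population and written as bedGraph lines, a plain-text format that genome browsers such as JBrowse can display as a quantitative track. This is an illustrative assumption, not the authors' pipeline: the per-population mean ("pseudobulk"-style) summary, the helper names `mean_expression_by_cell_type` and `to_bedgraph`, the gene identifiers, and the coordinates are all hypothetical.

```python
from collections import defaultdict

def mean_expression_by_cell_type(counts, cell_types):
    """Hypothetical helper: average counts per gene within each cell population.

    counts: dict cell_id -> dict gene -> count (sparse; absent genes count as 0)
    cell_types: dict cell_id -> cell-population label
    Returns dict label -> dict gene -> mean count over all cells with that label.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for cell, gene_counts in counts.items():
        label = cell_types[cell]
        n_cells[label] += 1
        for gene, c in gene_counts.items():
            sums[label][gene] += c
    # Divide by the total number of cells in the population, so cells that
    # did not report a gene contribute zero to its mean.
    return {
        label: {gene: s / n_cells[label] for gene, s in genes.items()}
        for label, genes in sums.items()
    }

def to_bedgraph(means, gene_coords):
    """Hypothetical helper: one bedGraph line per gene (chrom, start, end, value).

    gene_coords: dict gene -> (chrom, start, end), assumed 0-based half-open.
    Genes with no recorded expression in this population are written as 0.
    """
    lines = []
    ordered = sorted(gene_coords.items(), key=lambda kv: (kv[1][0], kv[1][1]))
    for gene, (chrom, start, end) in ordered:
        lines.append(f"{chrom}\t{start}\t{end}\t{means.get(gene, 0.0):.3f}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Toy, made-up data purely for illustration.
    counts = {"c1": {"GENE_A": 4, "GENE_B": 0}, "c2": {"GENE_A": 2}, "c3": {"GENE_B": 5}}
    cell_types = {"c1": "T cell", "c2": "T cell", "c3": "B cell"}
    coords = {"GENE_A": ("1", 1000, 2000), "GENE_B": ("1", 5000, 6000)}
    means = mean_expression_by_cell_type(counts, cell_types)
    for label in sorted(means):
        print(f"# {label}")
        print(to_bedgraph(means[label], coords))
```

Writing one bedGraph per cell population yields one browser track per population, which mirrors the idea of placing scRNA-seq-derived tracks next to bulk RNA-seq tracks; a production setup would more likely convert such files to an indexed binary format for large genomes.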
We intend to further build upon these existing tools to construct a scientist-friendly data resource and analytical ecosystem based on Findable, Accessible, Interoperable, and Reusable (FAIR) SC principles to facilitate SC-level genomic analysis through data ingestion, storage, retrieval, re-use, visualization, and comparative annotation across agricultural species.