Standardized metadata for human pathogen/vector genomic sequences.

Author information

Dugan Vivien G, Emrich Scott J, Giraldo-Calderón Gloria I, Harb Omar S, Newman Ruchi M, Pickett Brett E, Schriml Lynn M, Stockwell Timothy B, Stoeckert Christian J, Sullivan Dan E, Singh Indresh, Ward Doyle V, Yao Alison, Zheng Jie, Barrett Tanya, Birren Bruce, Brinkac Lauren, Bruno Vincent M, Caler Elizabet, Chapman Sinéad, Collins Frank H, Cuomo Christina A, Di Francesco Valentina, Durkin Scott, Eppinger Mark, Feldgarden Michael, Fraser Claire, Fricke W Florian, Giovanni Maria, Henn Matthew R, Hine Erin, Hotopp Julie Dunning, Karsch-Mizrachi Ilene, Kissinger Jessica C, Lee Eun Mi, Mathur Punam, Mongodin Emmanuel F, Murphy Cheryl I, Myers Garry, Neafsey Daniel E, Nelson Karen E, Nierman William C, Puzak Julia, Rasko David, Roos David S, Sadzewicz Lisa, Silva Joana C, Sobral Bruno, Squires R Burke, Stevens Rick L, Tallon Luke, Tettelin Herve, Wentworth David, White Owen, Will Rebecca, Wortman Jennifer, Zhang Yun, Scheuermann Richard H

Affiliations

J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America; National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America.

University of Notre Dame, Notre Dame, Indiana, United States of America.

Publication information

PLoS One. 2014 Jun 17;9(6):e99979. doi: 10.1371/journal.pone.0099979. eCollection 2014.

DOI:10.1371/journal.pone.0099979

PMID:24936976

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4061050/

Abstract

High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
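As a rough illustration of the four groups of data fields the abstract describes (specimen source, isolation event, isolate phenotype, and project information), a sample record might be sketched as below. This is only a minimal sketch: every class name, field name, and example value here is hypothetical and chosen for illustration, and none of it is taken from the GSCID/BRC Project and Sample Application Standard itself or from the MIxS, BioSample, or OBI term lists.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Dict, List, Optional


@dataclass
class SpecimenSource:
    # Organism or environmental source of the specimen (hypothetical fields).
    host_species: Optional[str] = None
    environmental_material: Optional[str] = None


@dataclass
class IsolationEvent:
    # Spatial-temporal information about the isolation event (hypothetical fields).
    collection_date: Optional[date] = None
    country: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None


@dataclass
class IsolatePhenotype:
    # Phenotypic characteristics of the pathogen/vector isolate (hypothetical fields).
    organism_name: Optional[str] = None
    drug_resistance: Dict[str, str] = field(default_factory=dict)


@dataclass
class ProjectInfo:
    # Project leadership and support (hypothetical fields).
    project_title: Optional[str] = None
    sequencing_center: Optional[str] = None
    funding_source: Optional[str] = None


@dataclass
class SampleRecord:
    source: SpecimenSource
    isolation: IsolationEvent
    phenotype: IsolatePhenotype
    project: ProjectInfo
    # Free-form slot for project-specific fields, mirroring the extensibility
    # the abstract mentions; the key/value layout is an assumption.
    extensions: Dict[str, str] = field(default_factory=dict)


def missing_fields(record: SampleRecord) -> List[str]:
    """Return dotted names of fields still set to None in the four groups."""
    missing = []
    for group in fields(record):
        value = getattr(record, group.name)
        if isinstance(value, dict):
            continue  # the extensions slot is optional by design
        for f in fields(value):
            if getattr(value, f.name) is None:
                missing.append(f"{group.name}.{f.name}")
    return missing


# Made-up example values, for illustration only.
record = SampleRecord(
    source=SpecimenSource(host_species="Homo sapiens"),
    isolation=IsolationEvent(collection_date=date(2013, 5, 1), country="Thailand"),
    phenotype=IsolatePhenotype(organism_name="Plasmodium falciparum"),
    project=ProjectInfo(project_title="Example project"),
)
print(missing_fields(record))
```

Grouping fields this way makes it easy to report which parts of a record are still incomplete before submission. A real implementation would presumably take its field names and controlled vocabularies from the published standard rather than from this sketch.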

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d395/4061050/125a5f1800a9/pone.0099979.g001.jpg
