A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database.

Author information

Orlek Alex, Phan Hang, Sheppard Anna E, Doumith Michel, Ellington Matthew, Peto Tim, Crook Derrick, Walker A Sarah, Woodford Neil, Anjum Muna F, Stoesser Nicole

Affiliations

Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK.

NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK.

Publication information

Data Brief. 2017 Apr 23;12:423-426. doi: 10.1016/j.dib.2017.04.024. eCollection 2017 Jun.

Abstract

Thousands of plasmid sequences are now publicly available in the NCBI nucleotide database, but they are not reliably annotated to distinguish complete plasmids from plasmid fragments, such as gene or contig sequences; therefore, retrieving complete plasmids for downstream analyses is challenging. Here we present a curated dataset of complete bacterial plasmids from the clinically relevant Enterobacteriaceae family. The dataset was compiled from the NCBI nucleotide database using curation steps designed to exclude incomplete plasmid sequences and chromosomal sequences misannotated as plasmids. Over 2000 complete plasmid sequences are included in the curated plasmid dataset. Protein sequences produced by translating each complete plasmid nucleotide sequence in all 6 frames are also provided. Further analysis and discussion of the dataset are presented in an accompanying research article: "Ordering the mob: insights into replicon and MOB typing…" (Orlek et al., 2017) [1]. The curated plasmid sequences are publicly available in the Figshare repository.
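
The six-frame translation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the standard codon table is assumed, the sequence is treated as linear (codons spanning the origin of a circular plasmid are ignored), and any codon containing an ambiguous base is written as "X".

```python
from itertools import product

# Standard genetic code, listed in NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq):
    """Translate a DNA sequence in three forward and three reverse frames."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence for illustration only, not taken from the dataset.
    for frame, protein in six_frame_translate("ATGGCCTAA").items():
        print(frame, protein)
```

Stop codons appear as "*" in the output, so each frame is a raw translation and not a set of predicted open reading frames.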

Similar articles

A Curated, Comprehensive Database of Plasmid Sequences.
Microbiol Resour Announc. 2019 Jan 3;8(1). doi: 10.1128/MRA.01325-18. eCollection 2019 Jan.
