Vera Alvarez Roberto, Medeiros Vidal Newton, Garzón-Martínez Gina A, Barrero Luz S, Landsman David, Mariño-Ramírez Leonardo
Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Colombian Corporation for Agricultural Research (CORPOICA), Km 14 vía Mosquera, Bogota, Colombia.
Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax008.
The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as the Transcriptome Shotgun Assembly Sequence Database (TSA) and the Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. Both the workflow and the web application are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621.