Rideout Jai Ram, Chase John H, Bolyen Evan, Ackermann Gail, González Antonio, Knight Rob, Caporaso J Gregory
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, 86011, USA.
Department of Pediatrics, University of California San Diego, San Diego, CA, 92093, USA.
Gigascience. 2016 Jun 13;5:27. doi: 10.1186/s13742-016-0133-6.
Bioinformatics software often requires human-generated tabular text files as input and imposes specific requirements on how those data are formatted. Users frequently manage these data in spreadsheet programs, which offer a familiar interface and run on many platforms, including laptops and tablets, making them convenient for researchers compiling the requisite information. Increasingly, many different researchers are involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward cloud-based spreadsheet programs, such as Google Sheets, which support concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck before bioinformatics analysis can begin.
We present Keemei, a Google Sheets Add-on for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store and can be installed and run in any web browser supported by Google Sheets. It currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, and is designed so that support for other formats can be added easily.
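To make the idea of tabular format validation concrete, the following is a minimal sketch of the kind of check such a tool might perform on a QIIME-style sample metadata table. It is not Keemei's implementation, which runs as a Google Sheets Add-on. The function name `validate_mapping` is hypothetical, and the rules it encodes are assumptions based on common QIIME 1 mapping file conventions, not details given in this abstract: the header begins with `#SampleID`, `BarcodeSequence` and `LinkerPrimerSequence`, ends with `Description`, and sample IDs are unique and contain only letters, digits and periods.

```python
import re

# Assumed QIIME 1 mapping-file conventions (illustrative, not taken from the abstract).
REQUIRED_LEADING = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
REQUIRED_TRAILING = "Description"
SAMPLE_ID_PATTERN = re.compile(r"^[A-Za-z0-9.]+$")


def validate_mapping(text):
    """Return a list of (row, column, message) problems in a tab-separated table.

    Rows and columns are 1-based so they can be matched to spreadsheet cells.
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows:
        return [(1, 1, "table is empty")]

    problems = []
    header = rows[0]

    # Header checks: fixed leading columns and a fixed final column.
    for col, expected in enumerate(REQUIRED_LEADING):
        found = header[col] if col < len(header) else None
        if found != expected:
            problems.append((1, col + 1, f"expected header {expected!r}, found {found!r}"))
    if header[-1] != REQUIRED_TRAILING:
        problems.append((1, len(header), f"last header must be {REQUIRED_TRAILING!r}"))

    # Data checks: consistent field count, well-formed and unique sample IDs.
    first_seen = {}
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append((row_number, 1, f"expected {len(header)} fields, found {len(row)}"))
        sample_id = row[0]
        if not SAMPLE_ID_PATTERN.match(sample_id):
            problems.append((row_number, 1, f"invalid characters in sample ID {sample_id!r}"))
        if sample_id in first_seen:
            problems.append((row_number, 1, f"duplicate sample ID (first seen in row {first_seen[sample_id]})"))
        else:
            first_seen[sample_id] = row_number
    return problems
```

Reporting each problem with a row and column position, rather than stopping at the first error, reflects the spreadsheet setting described above: a person entering data can locate and fix every flagged cell in one pass.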
Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.