RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.
Author information
Scheuch Matthias, Höper Dirk, Beer Martin
Affiliation
Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany.
Publication information
BMC Bioinformatics. 2015 Mar 3;16(1):69. doi: 10.1186/s12859-015-0503-6.
BACKGROUND
Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics has become a powerful tool for the analysis of microbial communities in both research and diagnostics. The biggest challenge is extracting the relevant information from the huge sequence datasets generated in metagenomics studies. Although a plethora of tools is available, data analysis is still a bottleneck.
RESULTS
To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS (Reliable Information Extraction from Metagenomic Sequence datasets). RIEMS taxonomically assigns every individual read in a dataset by cascading different sequence analyses, run with various software applications, in order of decreasing assignment stringency. After completion of the analyses, the results are summarised in a clearly structured, taxonomically organised result protocol. The high accuracy and performance of RIEMS analyses were demonstrated in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets.
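The cascading idea described above can be illustrated with a minimal sketch. This is not the RIEMS implementation: the abstract does not name the software applications or thresholds involved, so the toy references, the ungapped identity score, the stage names, and the cut-offs below are all hypothetical stand-ins. The sketch only shows the control flow, in which reads left unassigned by a strict stage are passed on to a more permissive one.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy reference set: taxon name -> reference sequence.
REFERENCES: Dict[str, str] = {
    "taxon_A": "ACGTACGTACGTACGT",
    "taxon_B": "TTGGCCAATTGGCCAA",
}


def best_identity(read: str, ref: str) -> float:
    """Best fraction of matching bases over all ungapped placements of read on ref."""
    if not read or len(read) > len(ref):
        return 0.0
    best = 0.0
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        matches = sum(a == b for a, b in zip(read, window))
        best = max(best, matches / len(read))
    return best


def make_stage(min_identity: float) -> Callable[[str], Optional[str]]:
    """Build a stage that assigns a read only if its best hit reaches min_identity."""
    def stage(read: str) -> Optional[str]:
        score, taxon = max(
            (best_identity(read, ref), name) for name, ref in REFERENCES.items()
        )
        return taxon if score >= min_identity else None
    return stage


def classify_cascade(
    reads: List[str],
    stages: List[Tuple[str, Callable[[str], Optional[str]]]],
) -> Dict[str, Tuple[str, str]]:
    """Run stages in order; only reads still unassigned reach the next stage."""
    assignments: Dict[str, Tuple[str, str]] = {}
    remaining = list(reads)
    for stage_name, stage in stages:
        still_unassigned = []
        for read in remaining:
            taxon = stage(read)
            if taxon is None:
                still_unassigned.append(read)
            else:
                assignments[read] = (taxon, stage_name)
        remaining = still_unassigned
    for read in remaining:
        assignments[read] = ("unclassified", "none")
    return assignments


# Stages ordered from most to least stringent (illustrative thresholds only).
STAGES = [("strict", make_stage(1.0)), ("relaxed", make_stage(0.85))]

if __name__ == "__main__":
    for read, (taxon, stage_name) in classify_cascade(
        ["ACGTACGT", "TTGGCCAT", "GGGGGGGG"], STAGES
    ).items():
        print(read, taxon, stage_name)
```

Recording the stage that produced each assignment keeps the stringency of every call visible in the final summary, which is one plausible way to support a result report organised by taxon.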
CONCLUSIONS
RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets were demonstrated in 2011, when an early version of RIEMS was used to detect the orthobunyavirus sequences that led to the discovery of Schmallenberg virus.