Cantacessi Cinzia, Jex Aaron R, Hall Ross S, Young Neil D, Campbell Bronwyn E, Joachim Anja, Nolan Matthew J, Abubucker Sahar, Sternberg Paul W, Ranganathan Shoba, Mitreva Makedonka, Gasser Robin B
Department of Veterinary Science, The University of Melbourne, 250 Princes Highway, Werribee, Victoria 3030, Australia.
Nucleic Acids Res. 2010 Sep;38(17):e171. doi: 10.1093/nar/gkq667. Epub 2010 Aug 3.
Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms, to elucidating the biological events that govern the development and progression of human diseases, to exploring the mechanisms of survival, drug resistance and virulence in pathogens. Next-generation sequencing (NGS) technologies are driving a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers posed by conventional approaches. However, the bioinformatic tools needed to analyze the sequence data sets produced by these technologies can be daunting to researchers with limited or no bioinformatic expertise. Here, we constructed a semi-automated bioinformatic workflow system and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for exploring transcriptomic differences among various stages and between the sexes of an economically important parasitic worm (Oesophagostomum dentatum), and for predicting and prioritizing essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell scripts used can be readily modified or adapted to suit many different applications. The system is now used routinely to analyze data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomic data sets from any organism.