Antoniewski Christophe
Drosophila Genetics and Epigenetics, Institut Pasteur, CNRS URA 2578, Paris, France.
Methods Mol Biol. 2011;721:123-42. doi: 10.1007/978-1-61779-037-9_7.
High-throughput sequencing has emerged as a powerful approach to characterize the siRNA populations generated by hosts in response to viral infections. Here we describe visitor, an informatics pipeline for analyzing large in-house datasets generated by Illumina sequencing of Drosophila small RNA libraries. The visitor Perl script is designed to process FASTQ sequence datasets from the Illumina sequencing platform on a computer running a UNIX-compliant operating system (Mac OS X, Linux, etc.). visitor first generates a detailed report on the sequence quality of the Illumina run. Then, using the Novoalign software, the script removes reads that match the D. melanogaster genome from the sequencing dataset. The remaining reads are aligned to a viral reference library, which can contain one or several virus genomes. visitor provides a hit table of the identified viral siRNAs as well as EPS graphics files of viral siRNA profiles. Unmatched small RNAs are also made available in FASTA format for de novo assembly and new virus discovery.
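The read-partitioning logic summarized in the abstract (discard host reads, tabulate viral hits, export the rest as FASTA) can be illustrated with a minimal Python sketch. This is not the visitor Perl script and it does not call Novoalign: exact substring matching on both strands stands in for alignment, the quality report step is omitted, and every function name here is hypothetical.

```python
from collections import Counter

# Translation table for complementing A/C/G/T; other symbols (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_fastq(lines):
    """Yield (read_id, sequence) pairs from FASTQ text lines (4 lines per record)."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1]  # drop the leading '@' of the header


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_reads(reads, host_genome, viral_refs):
    """Partition reads into host (discarded), viral hits, and unmatched.

    Assumption: exact substring search replaces a real aligner, so
    mismatches and repeated occurrences within a genome are ignored.
    """
    hits = Counter()  # (virus, 1-based position, strand, read sequence) -> count
    unmatched = []
    for read_id, seq in reads:
        rc = revcomp(seq)
        if seq in host_genome or rc in host_genome:
            continue  # host-derived read, removed before the viral step
        placed = False
        for name, genome in viral_refs.items():
            for strand, query in (("+", seq), ("-", rc)):
                pos = genome.find(query)
                if pos != -1:
                    hits[(name, pos + 1, strand, seq)] += 1
                    placed = True
        if not placed:
            unmatched.append((read_id, seq))
    return hits, unmatched


def to_fasta(records):
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)
```

Calling `classify_reads(parse_fastq(open("reads.fastq")), host, {"virus": genome})` on toy strings returns a hit counter and the list of unmatched reads, which `to_fasta` turns into input for an assembler. In a real analysis the substring search would be replaced by calls to an aligner such as Novoalign.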