European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
Bioinformatics. 2014 May 1;30(9):1236-40. doi: 10.1093/bioinformatics/btu031. Epub 2014 Jan 21.
Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that can use multiprocessor machines, conventional clusters or both to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site and the open source code is hosted at Google Code.