Li Dequan, Gao Wen, Ling Charles X, Wang Xiaobiao, Sun Ruixiang, He Simin
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China.
Bioinformatics. 2006 Oct 15;22(20):2572-3. doi: 10.1093/bioinformatics/btl410. Epub 2006 Aug 31.
IndexToolkit is a software package that overcomes the inefficiency of FASTA-format databases under frequent searching by using an indexing strategy to substantially accelerate sequence queries. It includes user-friendly tools and an application programming interface (API) for indexing, storing and retrieving protein sequence databases. As open-source software, it provides a sequence-retrieval development framework that is easily extended to proteomic applications requiring high-speed requests, such as database searching or modification discovery. We applied IndexToolkit to the database search engine pFind to demonstrate its effect. Experiments show that IndexToolkit supports significantly faster searches of protein databases.
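To illustrate why an index helps with repeated lookups in a FASTA file, here is a minimal Python sketch of one common approach: record the byte offset of each header once, then seek directly to a record on each query instead of rescanning the file. It is not IndexToolkit's actual API or on-disk format, which the abstract does not describe. The function names (`build_fasta_index`, `fetch_sequence`, `write_demo_fasta`) and the demo records are hypothetical.

```python
import os
import tempfile


def build_fasta_index(path):
    """Map each record ID to the byte offset of its '>' header line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()  # position of the line about to be read
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:
                    # Assumption: the first whitespace-delimited token is the ID.
                    index[fields[0].decode("ascii")] = offset
    return index


def fetch_sequence(path, index, record_id):
    """Seek straight to one record and return its sequence as a string."""
    parts = []
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            parts.append(line.strip())
    return b"".join(parts).decode("ascii")


def write_demo_fasta():
    """Create a tiny temporary FASTA file with made-up records."""
    fd, path = tempfile.mkstemp(suffix=".fasta")
    with os.fdopen(fd, "wb") as handle:
        handle.write(
            b">P1 demo protein one\nMKTAYIAK\nQRQISFVK\n"
            b">P2 demo protein two\nGAVLIMCF\n"
        )
    return path


if __name__ == "__main__":
    demo_path = write_demo_fasta()
    try:
        demo_index = build_fasta_index(demo_path)
        print(fetch_sequence(demo_path, demo_index, "P2"))
    finally:
        os.remove(demo_path)
```

In this sketch the index is built in one linear pass and kept in memory. A tool meant for large databases would presumably persist the index to disk so that it is built only once per database.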
IndexToolkit is free to use under the open-source GNU GPL license. The source code and compiled binaries are freely available at http://pfind.jdl.ac.cn/IndexToolkit. The website also provides more detailed information, including screenshots and documentation for users and developers.