CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Campus de Vairão, 4485-661, Vairão, Portugal.
Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre s/n, 4169-007, Porto, Portugal.
BMC Res Notes. 2024 Jan 24;17(1):35. doi: 10.1186/s13104-024-06686-y.
Reliable taxonomic identification of species from molecular samples is the first step in many studies. For researchers unfamiliar with programming, running a BLAST analysis through the BLAST web interface and then filtering and organizing the results for hundreds of sequences can be difficult. Additionally, sequences deposited in GenBank can carry outdated taxonomic identifications. Using a reliable Reference Sequences Library (RSL) that contains accurately identified sequences facilitates this task. For users who have such an RSL available, we developed a tool that automates the molecular taxonomic identification of sequences.
We developed PARSID, a Python script run from the command line that automates the routine workflow of blasting an input sequence file against the user's RSL and, in five steps, retrieves the matches with the highest percentage of identity. PARSID accepts cut-off parameters and supplementary information in a .csv file for filtering the results. The final output is presented as a spreadsheet. We tested it with 10 input sequences, simulating different scenarios of molecular taxonomic identification, against an example RSL containing 25 sequences. Step-by-step instructions and test files are publicly available at https://github.com/kokinide/PARSID.git .
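To illustrate the kind of filtering described above, the following is a minimal sketch and not PARSID's actual code. It assumes the BLAST results are already available as text in the standard tabular format (`-outfmt 6`, with its default twelve columns), keeps the hit with the highest percentage of identity for each query, and applies a cut-off. The function name `best_hits`, the `EXAMPLE` data, the subject identifiers, and the 98% threshold are all hypothetical choices made for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Hypothetical example results: two query sequences blasted against an RSL.
EXAMPLE = (
    "seq1\tRSL_A\t99.5\t650\t3\t0\t1\t650\t1\t650\t0.0\t1180\n"
    "seq1\tRSL_B\t93.1\t650\t45\t0\t1\t650\t1\t650\t0.0\t950\n"
    "seq2\tRSL_C\t91.2\t640\t56\t0\t1\t640\t1\t640\t0.0\t900\n"
)


def best_hits(blast_tsv, min_identity=98.0):
    """Map each query ID to its highest-identity hit as (subject ID, pident).

    Queries whose best hit falls below min_identity (an assumed cut-off,
    in percent) are mapped to None, i.e. left unidentified.
    """
    reader = csv.DictReader(
        io.StringIO(blast_tsv), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    best = {}
    for row in reader:
        pident = float(row["pident"])
        current = best.get(row["qseqid"])
        # Keep only the hit with the highest percentage of identity so far.
        if current is None or pident > current[1]:
            best[row["qseqid"]] = (row["sseqid"], pident)
    # Apply the identity cut-off after the best hit per query is known.
    return {
        query: (hit if hit[1] >= min_identity else None)
        for query, hit in best.items()
    }


if __name__ == "__main__":
    for query, hit in best_hits(EXAMPLE).items():
        print(query, hit)
```

Applying the cut-off only after the best hit is selected keeps every query in the result, so sequences without a sufficiently close match in the RSL remain visible as unidentified rather than silently disappearing.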