Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS B3H 4R2, Canada.
Institute for Comparative Genomics, Dalhousie University, Halifax, NS B3H 4R2, Canada.
STAR Protoc. 2021 Oct 16;2(4):100888. doi: 10.1016/j.xpro.2021.100888. eCollection 2021 Dec 17.
Annotating protein-coding genes can be challenging, especially when searching for the best hits against multiple functional databases. This is partly because of "bad words" appearing as top hits, such as hypothetical or uncharacterized proteins. To help alleviate some of these issues, we designed a bioinformatics tool called NoBadWordsCombiner, which efficiently merges the hits from various databases, strengthening gene definitions by minimizing functional descriptions containing "bad words." Unlike other available tools, NoBadWordsCombiner is user friendly, but it does require users to have some general bioinformatics skills, including a basic understanding of the BLAST package and dash shell in Linux/Unix environments. For complete details on the use and execution of this protocol, please refer to Zhang et al. (2021a).
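The merging idea described above can be illustrated with a minimal Python sketch. It is not the NoBadWordsCombiner implementation: the bad-word list, the function names `is_informative` and `combine_hits`, the input layout (one top-hit description per gene per database), and the rule of consulting databases in a fixed priority order are all assumptions made for illustration only.

```python
# Hypothetical list of "bad words"; the actual tool's list may differ.
BAD_WORDS = ("hypothetical", "uncharacterized", "unknown", "unnamed")


def is_informative(description):
    """Return True if the description contains none of the bad words."""
    text = description.lower()
    return not any(word in text for word in BAD_WORDS)


def combine_hits(hits_by_database, database_order):
    """Pick one (database, description) pair per gene.

    hits_by_database maps a database name to {gene_id: top-hit description}.
    Databases are consulted in database_order (an assumed priority order),
    and the first description free of bad words wins. If every hit for a
    gene contains a bad word, the first hit seen is kept as a fallback so
    the gene is not left without any annotation.
    """
    combined = {}
    fallback = {}
    for name in database_order:
        for gene_id, description in hits_by_database.get(name, {}).items():
            # Remember the first hit seen for each gene, informative or not.
            fallback.setdefault(gene_id, (name, description))
            if gene_id not in combined and is_informative(description):
                combined[gene_id] = (name, description)
    # Genes with only bad-word hits fall back to their first hit.
    for gene_id, hit in fallback.items():
        combined.setdefault(gene_id, hit)
    return combined
```

In this sketch, a gene whose highest-priority hit is "hypothetical protein" would take its description from the next database that offers an informative hit, which is the behavior the abstract describes as minimizing "bad words" in the merged annotation.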