Laboratorio de Procesamiento de Imágenes, ICyTE-CONICET-UNMdP, Mar del Plata, Argentina.
Computational Biology and Comparative Genomics, IIB-CONICET-UNMdP, Mar del Plata, Argentina.
Genome Biol. 2024 Aug 26;25(1):230. doi: 10.1186/s13059-024-03371-y.
Seqrutinator is an objective, flexible pipeline that removes, from complex eukaryotic protein superfamilies, sequences containing sequencing and/or gene model errors as well as sequences derived from pseudogenes. When tested on the major superfamilies BAHD, CYP, and UGT, Seqrutinator removes only 1.94% of SwissProt entries and 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from the recent complete proteome of Pinus taeda. Applying Seqrutinator to crude BAHDomes, CYPomes, and UGTomes obtained from 16 plant proteomes shows that the numbers of paralogues converge. Multiple sequence alignments (MSAs), phylogenies, and particularly functional clustering improve drastically after Seqrutinator is applied, indicating good performance.