Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany.
BMC Genomics. 2022 Mar 19;23(1):220. doi: 10.1186/s12864-022-08452-5.
The MYB family is one of the largest transcription factor families in plants. Consequently, its members are involved in a plethora of processes, including development and specialized metabolism. Over the last two decades, the MYB families of many plant species have been investigated, starting with Arabidopsis thaliana. This body of knowledge and the characterized sequences provide the basis for the identification, classification, and functional annotation of candidate sequences in new genome and transcriptome assemblies.
A pipeline for the automatic identification and functional annotation of MYBs in a given sequence data set was implemented in Python. MYB candidates are identified, screened for the presence of a MYB domain and other motifs, and finally placed in a phylogenetic context with well-characterized sequences. In addition to technical benchmarking based on existing annotations, the transcriptome assembly of Croton tiglium and the annotated genome sequence of Castanea crenata were screened for MYBs. Results of both analyses are presented in this study to illustrate the potential of this application. The analysis of one species takes only a few minutes, depending on the number of predicted sequences and the size of the MYB gene family. This pipeline, the required bait sequences, and reference sequences for classification are freely available on GitHub: https://github.com/bpucker/MYB_annotator .
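The screening and classification steps described above can be illustrated with a minimal sketch. It is not the MYB_annotator implementation and rests on two loudly stated assumptions: the MYB domain check is replaced by a crude, hypothetical regular expression for three tryptophans spaced 18 to 19 residues apart (loosely modelled on the regularly spaced tryptophans of MYB repeats), and phylogenetic placement is replaced by a nearest-reference lookup over precomputed similarity scores. All function names, reference names, and toy sequences are hypothetical.

```python
from __future__ import annotations

import re

# HYPOTHETICAL stand-in for a MYB repeat check: three tryptophans (W) separated
# by 18 or 19 arbitrary residues. Real repeats are more variable (e.g. the first
# W of R3 is often replaced), so this is only an illustration, not a real screen.
MYB_REPEAT = re.compile(r"W.{18,19}W.{18,19}W")


def count_myb_repeats(protein: str) -> int:
    """Count non-overlapping matches of the simplified repeat pattern."""
    return len(MYB_REPEAT.findall(protein))


def screen_candidates(candidates: dict[str, str], min_repeats: int = 1) -> dict[str, str]:
    """Keep only candidates with at least `min_repeats` pattern matches."""
    return {
        name: seq
        for name, seq in candidates.items()
        if count_myb_repeats(seq) >= min_repeats
    }


def classify(similarity: dict[str, dict[str, float]], ref_groups: dict[str, str]) -> dict[str, str]:
    """Give each candidate the group of its most similar reference sequence.

    This nearest-reference rule is a simplified stand-in for placing
    candidates in a phylogenetic tree with well-characterized sequences.
    """
    return {
        cand: ref_groups[max(scores, key=scores.get)]
        for cand, scores in similarity.items()
    }


# Toy usage with hypothetical sequences, references, and scores.
toy = {
    "cand_1": "M" + "W" + "A" * 18 + "W" + "A" * 19 + "W" + "K",
    "cand_2": "MAKLSTPEEDQ",
}
print(sorted(screen_candidates(toy)))  # ['cand_1']

sims = {"cand_1": {"ref_a": 0.42, "ref_b": 0.87}}
groups = {"ref_a": "group_A", "ref_b": "group_B"}
print(classify(sims, groups))  # {'cand_1': 'group_B'}
```

Splitting screening and classification into separate functions mirrors the order of the steps in the text: a candidate is only classified after it passes the domain filter.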
This automatic annotation of the MYB gene family in novel assemblies makes genome-wide investigations consistent and paves the way for future comparative studies. Candidate genes for in-depth analyses are presented based on their orthology to previously characterized sequences, which allows the functional annotation of the newly identified MYBs with high confidence. The identification of orthologs can also be harnessed to detect duplication and deletion events.
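How orthology can point to duplication and deletion events is sketched below under a simplifying assumption: each new candidate has already been assigned to one reference ortholog (hypothetical input, e.g. from a phylogenetic placement step). A reference with no assigned candidate hints at a possible deletion (or an assembly gap), and a reference with several candidates hints at a possible duplication. The function and reference names are hypothetical, and a real analysis would need the underlying tree rather than simple counts.

```python
from __future__ import annotations

from collections import Counter


def copy_number_events(assignments: dict[str, str], references: list[str]) -> dict[str, list[str]]:
    """Flag references with zero or multiple assigned candidates.

    `assignments` maps each candidate to its (hypothetical) reference ortholog.
    Zero candidates -> possible deletion; more than one -> possible duplication.
    """
    counts = Counter(assignments.values())  # missing keys count as 0
    return {
        "possible_deletion": [ref for ref in references if counts[ref] == 0],
        "possible_duplication": [ref for ref in references if counts[ref] > 1],
    }


# Toy usage with hypothetical candidate and reference names.
assignments = {"cand_1": "ref_a", "cand_2": "ref_a", "cand_3": "ref_b"}
print(copy_number_events(assignments, ["ref_a", "ref_b", "ref_c"]))
# {'possible_deletion': ['ref_c'], 'possible_duplication': ['ref_a']}
```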