Department of Molecular Biology and Genetics, Necmettin Erbakan University, Meram, Konya, 42090, Turkey.
Department of Biotechnology, Necmettin Erbakan University, Meram, Konya, 42090, Turkey.
Funct Integr Genomics. 2022 Oct;22(5):879-889. doi: 10.1007/s10142-022-00866-4. Epub 2022 May 20.
Garden cress (Lepidium sativum L.) is a Brassicaceae crop recognized as a healthy vegetable and a medicinal plant. Lepidium is one of the largest genera in Brassicaceae, yet the genus has not been a focus of extensive genomic research. In the present work, the garden cress genome was sequenced using long-read high-fidelity sequencing technology. A de novo draft genome assembly spanning 336.5 Mb was produced, corresponding to 88.6% of the estimated genome size and representing 90% of the evolutionarily expected orthologous gene content. Protein-coding gene content was structurally predicted and functionally annotated, resulting in the identification of 25,668 putative genes. A total of 599 candidate disease resistance genes were identified by predicting resistance gene domains in gene structures, and 37 genes were detected as orthologs of heavy metal-associated protein-coding genes. In addition, 4289 genes were assigned as "transcription factor coding." Six different machine learning algorithms were trained and tested for their performance in classifying miRNA-coding genomic sequences. Logistic regression proved to be the best-performing algorithm and was therefore used to identify pre-miRNA-coding loci in the assembly. Repetitive DNA analysis involved the characterization of transposable element and microsatellite contents. The L. sativum chloroplast genome was also assembled and functionally annotated. The data produced in the present work are expected to constitute a foundation for genomic research in garden cress and to contribute to genomics-assisted crop improvement and genome evolution studies in the Brassicaceae family.