Castanera Raúl, Ruggieri Valentino, Pujol Marta, Garcia-Mas Jordi, Casacuberta Josep M
Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Barcelona, Spain.
Institut de Recerca i Tecnologia Agroalimentàries (IRTA), Genomics and Biotechnology Program, Barcelona, Spain.
Front Plant Sci. 2020 Jan 31;10:1815. doi: 10.3389/fpls.2019.01815. eCollection 2019.
The published melon (Cucumis melo L.) reference genome assembly (v3.6.1) still has 41.6 Mb (megabases) of sequence unassigned to pseudo-chromosomes and about 57 Mb of gaps. Although different approaches have been undertaken in recent years to improve the melon genome assembly, the high percentage of repeats (~40%) and limitations due to read length have made it difficult to resolve gaps and scaffold misassignments to pseudomolecules, especially in the heterochromatic regions. Taking advantage of PacBio single-molecule real-time (SMRT) sequencing technology, we improved the melon genome assembly. About 90% of the gaps were filled and the amount of unassigned sequence was drastically reduced. A lift-over of the latest annotation (v4.0) allowed us to relocate protein-coding genes from the unassigned sequences to the pseudomolecules. The improved annotation of the transposable element fraction provides direct evidence of the gains achieved in the new assembly. By screening the new assembly, we discovered many young (inserted less than 2 Mya), polymorphic LTR-retrotransposons that were not captured in the previous reference genome. These elements are located mostly in the pericentromeric regions, but some are inserted in the upstream regions of genes, suggesting that they may have regulatory potential. This improved reference genome will provide an invaluable tool for identifying new gene or transposon variants associated with important phenotypes.