Cui Xue, Liu Yuxin, Sun Miao, Zhao Qiyue, Huang Yicheng, Zhang Jianwei, Yao Qiulin, Yin Hang, Zhang Huixin, Mo Fulei, Zhong Hongbin, Liu Yang, Chen Xiuling, Zhang Yao, Liu Jiayin, Qiu Youwen, Feng Mingfang, Chen Xu, Ghanizadeh Hossein, Zhou Yao, Wang Aoxue
College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China.
State Key Laboratory of Forage Breeding-by-Design and Utilization, Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China.
Hortic Res. 2025 Apr 16;12(7):uhaf107. doi: 10.1093/hr/uhaf107. eCollection 2025 Jul.
Structural variations (SVs) in repetitive sequences can only be detected within a broad region because their breakpoints are imprecise, which leads to classification errors and inaccurate trait analysis. By manually inspecting 4532 variant regions identified by integrating 14 detection pipelines between two tomato genomes, we generated an SV benchmark at base-pair resolution. Evaluated against this benchmark, all pipelines yielded F1-scores below 53.77%, underscoring the urgent need for advanced detection algorithms in plant genomics. By analyzing the alignment features of the repetitive sequences in each region, we summarized four patterns of SV breakpoints and found that deviations in breakpoint identification were primarily due to copy misalignment. Based on the similarities among copies, we identified 1635 SVs with precise breakpoints: insertions (780), deletions (619), inversions (13), and substitutions (223), the last of which should be treated as a fundamental SV type. All types showed a preference for occurring within AT-repeat regions of regulatory loci. This precise resolution of complex SVs will foster genome analysis and crop improvement.
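To illustrate how an F1-score against a base-pair-resolution benchmark might be computed, here is a minimal sketch. It is not the authors' evaluation procedure: the `SV` record, the `matches` rule, and the `tol` breakpoint tolerance are all hypothetical names and assumptions introduced for illustration. A call counts as a true positive only if its chromosome, SV type, and both breakpoints agree with a benchmark SV within `tol` base pairs.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record (not taken from the paper)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "INS", "DEL", "INV", "SUB"


def matches(call: SV, truth: SV, tol: int = 0) -> bool:
    """Assumed matching rule: same chromosome and type, and both
    breakpoints within `tol` base pairs of the benchmark SV."""
    return (
        call.chrom == truth.chrom
        and call.svtype == truth.svtype
        and abs(call.start - truth.start) <= tol
        and abs(call.end - truth.end) <= tol
    )


def f1_score(calls: list[SV], benchmark: list[SV], tol: int = 0) -> float:
    """F1 = 2PR / (P + R), where each benchmark SV can be matched once."""
    unmatched = list(benchmark)
    true_positives = 0
    for call in calls:
        for i, truth in enumerate(unmatched):
            if matches(call, truth, tol):
                true_positives += 1
                del unmatched[i]  # each benchmark SV is used at most once
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(benchmark) if benchmark else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
benchmark = [SV("chr1", 1000, 1200, "DEL"), SV("chr1", 5000, 5040, "SUB")]
calls = [
    SV("chr1", 1000, 1200, "DEL"),  # exact breakpoints
    SV("chr1", 5005, 5045, "SUB"),  # breakpoints shifted by 5 bp
]

strict_f1 = f1_score(calls, benchmark, tol=0)
lenient_f1 = f1_score(calls, benchmark, tol=5)
```

In this toy case the shifted call fails under exact matching but passes with a 5 bp tolerance, which shows how breakpoint deviations of the kind described above can lower a pipeline's score against a base-pair-resolution benchmark.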