Demirci Sevgin, Peters Sander A, de Ridder Dick, van Dijk Aalt D J
Business Unit Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands.
Bioinformatics Group, Wageningen University and Research, Wageningen, the Netherlands.
Plant J. 2018 May 29. doi: 10.1111/tpj.13979.
A better understanding of genomic features influencing the location of meiotic crossovers (COs) in plant species is both of fundamental importance and of practical relevance for plant breeding. Using CO positions with sufficiently high resolution from four plant species [Arabidopsis thaliana, Solanum lycopersicum (tomato), Zea mays (maize) and Oryza sativa (rice)] we have trained machine-learning models to predict the susceptibility to CO formation. Our results show that CO occurrence within various plant genomes can be predicted by DNA sequence and shape features. Several features related to genome content and to genomic accessibility were consistently either positively or negatively related to COs in all four species. Other features were found as predictive only in specific species. Gene annotation-related features were especially predictive for maize, whereas in tomato and Arabidopsis propeller twist and helical twist (DNA shape features) and AT/TA dinucleotides were found to be the most important. In rice, high roll (another DNA shape feature) and low CA dinucleotide frequency in particular were found to be associated with CO occurrence. The accuracy of our models was sufficient for Arabidopsis and rice (area under receiver operating characteristic curve, AUROC > 0.5), and was high for tomato and maize (AUROC ≫ 0.5), demonstrating that DNA sequence and shape are predictive for meiotic COs throughout the plant kingdom.
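As a rough illustration of the two quantities the abstract relies on, the sketch below computes dinucleotide frequencies for a sequence window and an AUROC for a set of scored windows. It is not the authors' pipeline: the abstract does not describe their models or feature extraction, so the function names (`dinucleotide_freqs`, `auroc`), the toy sequences, their labels, and the choice to score a window by its combined AT + TA frequency are all assumptions made for the example. AUROC is computed here as the probability that a randomly chosen positive window scores higher than a randomly chosen negative one, which is why 0.5 corresponds to chance.

```python
from collections import Counter


def dinucleotide_freqs(seq):
    """Relative frequency of each overlapping dinucleotide in seq.

    Pairs containing anything other than A, C, G or T are ignored.
    """
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair; labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up windows and labels, for illustration only (1 = CO, 0 = no CO).
    windows = ["ATATATATGC", "TATAATCGAT", "GCGCGGCCAT", "CCGGATCCGG"]
    labels = [1, 1, 0, 0]
    # Hypothetical one-feature "model": score = AT + TA dinucleotide frequency.
    scores = []
    for w in windows:
        f = dinucleotide_freqs(w)
        scores.append(f.get("AT", 0.0) + f.get("TA", 0.0))
    print("AUROC on toy data:", auroc(scores, labels))
```

A single hand-picked feature on four invented windows says nothing about real genomes; the point is only to show how a per-window feature and the AUROC > 0.5 criterion mentioned in the abstract fit together.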