Li Zhidong, Tang Wei, You Xiong, Hou Xilin
State Key Laboratory of Crop Genetics & Germplasm Enhancement, Ministry of Agriculture and Rural Affairs of the P. R. China, College of Horticulture, Nanjing Agricultural University, Nanjing 210095, China.
Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (East China), Engineering Research Center of Germplasm Enhancement and Utilization of Horticultural Crops, Ministry of Education of the P. R. China, Nanjing Suman Plasma Engineering Research Institute, Nanjing 210095, China.
Life (Basel). 2022 Jul 21;12(7):1095. doi: 10.3390/life12071095.
Plant leaves, which convert light energy into chemical energy, are a major source of food on Earth. Premature leaf senescence reduces crop yield and quality, so identifying senescence-associated genes is important. In this study, we collected 5853 genes from a leaf senescence database and developed a prediction model for leaf-senescence-associated genes (SAGs) using the support vector machine (SVM) and XGBoost algorithms. This is the first computational approach for predicting SAGs from sequence data. The SVM-PCA-Kmer-PC-PseAAC model achieved the best performance (F1 score = 0.866, accuracy = 0.862, and area under the receiver operating characteristic curve = 0.922). Based on this model, we developed an SAG prediction tool called "SAGs_Anno". We identified a total of 1,398,277 SAGs among 3,165,746 gene sequences from 83 species, comprising 12 lower plants and 71 higher plants. Interestingly, leafy species showed a higher percentage of SAGs, whereas leafless species showed a lower percentage. Finally, we built the Leaf SAGs Annotation Platform from these datasets and the SAGs_Anno tool, which lets users easily predict, download, and search for plant leaf SAGs of all species. Our study will provide rich resources for research on plant leaf-senescence-associated genes.
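The best model's name combines k-mer ("Kmer") sequence features, PC-PseAAC features, PCA, and an SVM, and the abstract reports accuracy and F1 score. The following is a minimal, stdlib-only sketch of two of those ingredients: a k-mer composition encoding and the accuracy/F1 calculation. It assumes protein sequences over the 20 standard amino acids and k = 2. The function names `kmer_composition` and `accuracy_and_f1` are hypothetical, this is not the authors' SAGs_Anno code, and the PC-PseAAC features, PCA, and SVM training are not shown.

```python
from itertools import product

# Assumption: inputs are protein sequences over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq, k=2):
    """Return the normalized frequency of every possible k-mer in seq.

    Windows containing non-standard residues (e.g. X, B, Z) are skipped.
    The vector has len(AMINO_ACIDS) ** k entries, in lexicographic order.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
            total += 1
    # Normalize so that sequences of different lengths are comparable.
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def accuracy_and_f1(y_true, y_pred):
    """Binary accuracy and F1 score, with 1 = SAG and 0 = non-SAG."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    vec = kmer_composition("AAAC", k=2)
    print(len(vec), vec[0], vec[1])  # 400 dimensions; AA = 2/3, AC = 1/3
    print(accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```

With k = 2 each sequence already maps to 20² = 400 features, and the space grows as 20^k. That growth plausibly motivates a dimensionality-reduction step such as PCA before the SVM, although the exact settings used in the study are not given here.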