Prediction of drought-resistant genes in Arabidopsis thaliana using SVM-RFE.

Author information

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.

Publication information

PLoS One. 2011;6(7):e21750. doi: 10.1371/journal.pone.0021750. Epub 2011 Jul 15.

Abstract

BACKGROUND

Identifying genes that play essential roles in resisting environmental stress is of high agronomic importance. Although massive amounts of DNA microarray gene expression data have been generated for plants, current computational approaches underutilize these data for studying genotype-trait relationships. Some advanced gene identification methods have been explored for human diseases, but these methods typically have not been released as publicly available software tools and cannot be applied directly to plants to identify genes associated with agronomic traits.

METHODOLOGY

In this study, we used 22 sets of Arabidopsis thaliana gene expression data from GEO to predict the key genes involved in water tolerance. We applied an SVM-RFE (Support Vector Machine-Recursive Feature Elimination) feature selection method for the prediction. To address small sample sizes, we developed a modified approach for SVM-RFE by using bootstrapping and leave-one-out cross-validation. We also expanded our study to predict genes involved in water susceptibility.
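The general idea of combining SVM-RFE with bootstrapping and leave-one-out cross-validation can be illustrated with a minimal sketch. This is not the authors' RFET software: the abstract does not say how the three pieces are combined, so the rank-sum aggregation across bootstrap resamples, the subgradient-descent solver, the hyperparameters, the toy data, and all function names below are assumptions made for illustration only.

```python
import random

def train_linear_svm(X, y, feats, epochs=200, lam=0.01, lr=0.1):
    """Linear SVM (hinge loss + L2 penalty) fitted by batch subgradient descent.

    A simple stand-in solver (assumption); returns (weights for feats, bias).
    """
    w, b, n = [0.0] * len(feats), 0.0, len(X)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xi[f] for wj, f in zip(w, feats)) + b
            if yi * score < 1:  # sample violates the margin
                for j, f in enumerate(feats):
                    grad_w[j] -= yi * xi[f] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, n_features):
    """Recursive feature elimination: repeatedly drop the feature with the
    smallest squared SVM weight. Returns features from most to least important."""
    active, eliminated = list(range(n_features)), []
    while active:
        w, _ = train_linear_svm(X, y, active)
        worst = min(range(len(active)), key=lambda j: w[j] ** 2)
        eliminated.append(active.pop(worst))
    return eliminated[::-1]

def bootstrap_svm_rfe(X, y, n_rounds=20, seed=0):
    """Run SVM-RFE on bootstrap resamples and aggregate by rank sum
    (the aggregation rule is an assumption, not taken from the paper)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    rank_sum, done = [0] * p, 0
    while done < n_rounds:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains only one class; draw again
        for rank, f in enumerate(svm_rfe([X[i] for i in idx], ys, p)):
            rank_sum[f] += rank
        done += 1
    return sorted(range(p), key=lambda f: rank_sum[f])

def loo_accuracy(X, y, feats):
    """Leave-one-out cross-validation accuracy using only the given features."""
    hits = 0
    for i in range(len(X)):
        w, b = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:], feats)
        score = sum(wj * X[i][f] for wj, f in zip(w, feats)) + b
        hits += (score > 0) == (y[i] > 0)
    return hits / len(X)

# Hypothetical toy data: 12 samples, 5 "genes"; only gene 0 tracks the label.
rng = random.Random(1)
y = [1, -1] * 6
X = [[2.0 * yi + rng.uniform(-0.3, 0.3)] + [rng.uniform(-1, 1) for _ in range(4)]
     for yi in y]
ranking = bootstrap_svm_rfe(X, y)
acc = loo_accuracy(X, y, ranking[:1])
print(ranking, acc)
```

Resampling before each elimination run is one way to make the ranking less sensitive to any single sample, which matters when only a handful of arrays are available per condition. On real microarray data with thousands of genes, a dedicated SVM solver and removing features in batches would be more practical than this one-at-a-time loop.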

CONCLUSIONS

We analyzed the top 10 genes predicted to be involved in water tolerance. Seven of them are connected to known biological processes in drought resistance. We also analyzed the top 100 genes in terms of their biological functions. Our study shows that SVM-RFE is a highly promising approach for analyzing plant microarray data to study genotype-phenotype relationships. The software is freely available with source code at http://ccst.jlu.edu.cn/JCSB/RFET/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56ae/3137602/74df3a10e5a4/pone.0021750.g001.jpg
