Lu Quan, Xu Jiajun, Zhang Renyi, Liu Hangcheng, Wang Meng, Liu Xiaoshuang, Yue Zhenyu, Gao Yujia
School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China.
Research Center for Biological Breeding Technology, Advance Academy, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae702.
Abiotic stresses adversely affect rice, so the precise and rapid identification of single nucleotide polymorphisms (SNPs) associated with abiotic stress traits (ABST-SNPs) is crucial for developing resistant rice varieties. The scarcity of high-quality data on abiotic stress in rice has hindered the development of computational models and constrained research on rice improvement and breeding. Genome-wide association studies offer greater statistical power for identifying ABST-SNPs in rice. Meanwhile, deep learning methods have proven capable of predicting disease- or phenotype-associated loci, but they have focused primarily on humans. Developing predictive models for identifying ABST-SNPs in rice is therefore both urgent and valuable. In this paper, a model called RiceSNP-ABST is proposed for predicting ABST-SNPs in rice. First, six training datasets were generated using a novel strategy for negative sample construction. Second, four feature encoding methods based on DNA sequence fragments were proposed, followed by feature selection. Finally, convolutional neural networks with residual connections were used to determine whether the sequences contained rice ABST-SNPs. RiceSNP-ABST outperformed traditional machine learning and state-of-the-art methods on the benchmark dataset and generalized consistently to an independent dataset and to cross-species datasets. Notably, multi-granularity causal structure learning was employed to elucidate the relationships among DNA structural features, with the aim of identifying key genetic variants more effectively. The RiceSNP-ABST web tool is available at http://rice-snp-abst.aielab.cc.
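The abstract names neither the four encodings nor the network layout, so the following is only an illustrative sketch of the general idea of encoding a DNA fragment and passing it through a convolution with a residual (skip) connection. One-hot and k-mer frequency encoding are common choices assumed here for illustration, and every function name, kernel, and parameter is hypothetical rather than taken from RiceSNP-ABST.

```python
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Encode each base as a 4-element indicator vector (assumed encoding).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def kmer_frequencies(seq, k=2):
    """Normalized counts of every k-mer over ACGT, in lexicographic order
    (assumed encoding). K-mers containing other characters are skipped."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def conv1d_same(x, kernel):
    """Single-channel 1D cross-correlation with zero padding.

    Assumes an odd kernel length so the output has the same length as x.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, kernel1, kernel2):
    """Compute relu(x + conv(relu(conv(x)))).

    The input is added back to the convolved signal (the skip connection).
    """
    h = relu(conv1d_same(x, kernel1))
    h = conv1d_same(h, kernel2)
    return relu([a + b for a, b in zip(x, h)])


if __name__ == "__main__":
    fragment = "ACGTTGCA"  # hypothetical sequence fragment around a SNP
    print(one_hot(fragment)[:2])
    print(kmer_frequencies(fragment, k=2))
    # Toy fixed kernels; a trained network would learn these weights.
    print(residual_block([1.0, -2.0, 3.0], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]))
```

A real model would stack several such blocks over multi-channel inputs in a deep learning framework and learn the kernels from data. The pure-Python version above only makes the skip connection explicit.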