State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China.
Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
Methods. 2024 Nov;231:61-69. doi: 10.1016/j.ymeth.2024.09.010. Epub 2024 Sep 16.
Arabidopsis thaliana synthesizes various medicinal compounds and serves as a model plant for medicinal plant research. Single-cell transcriptomics technologies are essential for understanding the developmental trajectory of plant roots, and they facilitate analysis of how medicinal compounds are synthesized and accumulated in different cell subpopulations. Although methods for interpreting single-cell transcriptomics data in Arabidopsis are advancing rapidly, precisely annotating cell identity remains challenging because marker genes are lacking for certain cell types. In this work, we trained a machine learning system, AtML, on sequencing datasets from six cell subpopulations comprising a total of 6000 cells, to predict Arabidopsis root cell stages and to identify biomarkers through complete model interpretability. Performance testing on an external dataset showed that AtML achieved 96.50% accuracy and 96.51% recall. Using the interpretability provided by AtML, we identified 160 important marker genes that contribute to the understanding of cell type annotations. In conclusion, AtML efficiently identifies Arabidopsis root cell stages and provides a new tool for elucidating the mechanisms of medicinal compound accumulation in Arabidopsis roots.
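The abstract reports accuracy and recall as separate figures for a six-class prediction task. As a minimal sketch of how the two metrics can differ, the snippet below computes both from reference and predicted labels. It assumes recall is macro-averaged (the unweighted mean of per-class recall), which the abstract does not state. The class labels `c1` to `c6` and the toy predictions are placeholders, not data or code from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed averaging scheme)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(y_true)  # number of reference cells per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: six placeholder subpopulation labels with unequal class sizes.
y_true = ["c1"] * 4 + ["c2"] * 2 + ["c3"] * 2 + ["c4"] * 2 + ["c5"] * 2 + ["c6"] * 2
y_pred = list(y_true)
y_pred[0] = "c2"  # one misclassified cell in the largest class

acc = accuracy(y_true, y_pred)
rec = macro_recall(y_true, y_pred)
print(f"accuracy={acc:.4f} macro_recall={rec:.4f}")
```

Under this definition the two metrics coincide whenever every class contains the same number of cells, and they drift apart as class sizes become unequal.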