Graduate Program in Genetics, Bioinformatics and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA.
School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA.
Nucleic Acids Res. 2020 Jun 19;48(11):e62. doi: 10.1093/nar/gkaa264.
Recent advances in genomic technologies have generated data on large-scale protein-DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single-cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5-25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single-cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
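The auROC quoted in the abstract is the probability that a classifier scores a randomly chosen positive example (here, a differentially expressed gene) above a randomly chosen negative one. The sketch below shows how that metric can be computed from labels and scores by pairwise comparison. It is an illustration only, not ConSReg's code: the function name `auroc` is hypothetical, and the labels and scores are made-up toy values, not data from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney U).

    labels: 1 for positive (e.g. differentially expressed), 0 for negative.
    scores: model scores, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (not from the paper).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

toy_auroc = auroc(labels, scores)
print(round(toy_auroc, 4))
```

An auROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported average of 0.84 should be read.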