Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China.
Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China.
Int J Mol Sci. 2023 Jul 27;24(15):12023. doi: 10.3390/ijms241512023.
Deep learning tools trained on large-scale epigenomics data can predict the regulatory activity of genomic sequences, annotate non-coding genetic variants, and uncover the mechanisms behind complex traits. However, these tools are trained primarily on human or mouse data, which limits their performance on other species. Moreover, many species, livestock in particular, remain understudied, so comprehensive, high-quality epigenetic data are scarce, which makes it difficult to develop reliable deep learning models for decoding their non-coding genomes. Cross-species prediction of the regulatory genome can be achieved by leveraging publicly available data from extensively studied organisms and exploiting the conserved DNA-binding preferences of transcription factors within the same tissue. In this study, we introduce DeepSATA, a novel deep learning-based sequence analyzer that incorporates transcription factor binding affinity for the cross-species prediction of chromatin accessibility. By applying DeepSATA to the genomes of pigs, chickens, cattle, humans, and mice, we demonstrate that it improves the prediction accuracy of chromatin accessibility and achieves reliable cross-species predictions in animals. We also show its effectiveness in analyzing pig genetic variants associated with economic traits and in increasing the accuracy of genomic predictions. Overall, our study presents a valuable tool for exploring the epigenomic landscape of various species and pinpointing regulatory DNA variants associated with complex traits.
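The abstract does not specify how transcription factor binding affinity is computed or fed into the model. As a rough illustration only, one common way to derive such a feature is to scan a sequence with a log-odds position weight matrix (PWM) and keep the best window score per factor. The sketch below assumes that approach; the function names, the toy motif, and its counts are hypothetical and are not taken from DeepSATA.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log-odds PWM.

    Assumes a uniform background; the pseudocount avoids log(0).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_affinity(sequence, pwm):
    """Maximum PWM score over all windows (forward strand, ACGT only)."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def affinity_features(sequence, pwms):
    """One affinity value per transcription factor motif."""
    return [best_affinity(sequence, pwm) for pwm in pwms]


# Hypothetical toy motif preferring "TGAC"; the counts are made up.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)

with_motif = best_affinity("AATGACAA", toy_pwm)
without_motif = best_affinity("AAAAAAAA", toy_pwm)
```

In a setup like this, the per-factor affinity vector could be concatenated with sequence-derived features before the final accessibility prediction. Because binding preferences are conserved across species, the same motif set could in principle be reused when the sequence comes from a different genome.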