Chai Chew, Gibson Jesse, Li Pengyang, Pampari Anusri, Patel Aman, Kundaje Anshul, Wang Bo
Department of Bioengineering, Stanford University, Stanford, USA.
Department of Computer Science, Stanford University, Stanford, USA.
bioRxiv. 2024 Sep 6:2024.09.03.611027. doi: 10.1101/2024.09.03.611027.
Cell types evolve into a hierarchy with related types grouped into families. How cell type diversification is constrained by the stable separation between families over vast evolutionary times remains unknown. Here, integrating single-nucleus multiomic sequencing and deep learning, we show that hundreds of sequence features (motifs) divide into distinct sets associated with accessible genomes of specific cell type families. This division is conserved across highly divergent, early-branching animals including flatworms and cnidarians. While specific interactions between motifs delineate cell type relationships within families, surprisingly, these interactions are not conserved between species. Consistently, while deep learning models trained on one species can predict accessibility of other species' sequences, their predictions frequently rely on distinct, but synonymous, motif combinations. We propose that long-term stability of cell type families is maintained through genome access specified by conserved motif sets, or 'vocabularies', whereas cell types diversify through flexible use of motifs within each set.