Department of Genetics, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, United States.
Edison Family Center for Genome Sciences and Systems Biology, Washington University in St. Louis School of Medicine, Saint Louis, MO 63110, United States.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae070.
Unraveling the transcriptional programs that control how cells divide, differentiate, and respond to their environments requires a precise understanding of the DNA-binding activities of transcription factors (TFs). Calling cards (CC) technology uses transposons to capture transient TF binding events at one point in time so that they can be read out later. In its single-cell form, the method measures TF binding and mRNA expression simultaneously, and it can record and integrate TF binding events over time in any cell type of interest without the need for purification. Despite these advantages, dedicated bioinformatics tools for the detailed analysis of CC data have been lacking.
We introduce Pycallingcards, a comprehensive Python module designed for the analysis of single-cell and bulk CC data across multiple species. Pycallingcards provides two new peak callers, CCcaller and MACCs, which improve the accuracy and speed of pinpointing TF binding sites from CC data. It also offers a fully integrated environment for data visualization, motif finding, and comparative analysis with RNA-seq and ChIP-seq datasets. To illustrate its practical application, we reanalyzed previously published mouse cortex and glioblastoma datasets. This analysis revealed novel cell-type-specific binding sites and potential sex-linked TF regulators, furthering our understanding of the relationship between TF binding and gene expression. With its user-friendly design and seamless interface with the Python data science ecosystem, Pycallingcards is a critical tool for advancing the analysis of TF function via CC data.
Pycallingcards can be accessed on the GitHub repository: https://github.com/The-Mitra-Lab/pycallingcards.
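As a rough illustration of what calling peaks from CC data involves, the sketch below tests fixed genomic windows for an excess of transposon insertions relative to a background track using a Poisson model. It is not the CCcaller or MACCs algorithm and does not use the Pycallingcards API; the function names (`poisson_sf`, `call_windows`), the fixed window size, the pseudocount, and the significance cutoff are all assumptions made for this example.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):  # accumulate P(X <= k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_windows(experiment, background, window=1000, pseudocount=1.0, alpha=0.01):
    """Flag fixed windows with more insertions than the background predicts.

    `experiment` and `background` are lists of insertion positions on a
    single chromosome (hypothetical input format for this sketch).
    """
    exp_counts = Counter(pos // window for pos in experiment)
    bg_counts = Counter(pos // window for pos in background)
    # Rescale background counts to the depth of the experiment.
    scale = len(experiment) / max(len(background), 1)
    peaks = []
    for w, n_exp in sorted(exp_counts.items()):
        # Expected insertions under the null; the pseudocount keeps it above zero.
        lam = (bg_counts.get(w, 0) + pseudocount) * scale
        p = poisson_sf(n_exp, lam)
        if p < alpha:
            peaks.append((w * window, (w + 1) * window, n_exp, p))
    return peaks


if __name__ == "__main__":
    # Toy data: eight insertions clustered in the first kilobase.
    experiment = [100, 150, 200, 250, 300, 350, 400, 450, 5200, 9100]
    background = [500, 1500, 2500, 3500, 4500, 5500, 6500, 7500, 8500, 9500]
    for start, end, count, p in call_windows(experiment, background):
        print(f"{start}-{end}: {count} insertions, p = {p:.2e}")
```

Fixed windows keep the example short; a practical peak caller would typically define regions adaptively from the insertion positions and correct for multiple testing.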