Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, USA.
Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Palo Alto, CA, USA.
Nat Commun. 2022 Sep 7;13(1):5271. doi: 10.1038/s41467-022-33045-x.
A major informatic challenge in single-cell RNA-sequencing analysis is the precise annotation of datasets in which cells exhibit complex multilayered identities or transitory states. Here, we present devCellPy, a highly accurate and precise machine learning-enabled tool that automates the prediction of cell types across complex annotation hierarchies. To demonstrate the power of devCellPy, we construct a murine cardiac developmental atlas from published datasets encompassing 104,199 cells from E6.5 to E16.5 and train devCellPy to generate a cardiac prediction algorithm. Using this algorithm, we observe high prediction accuracy (>90%) across multiple layers of annotation and across de novo murine developmental data. Furthermore, we conduct a cross-species prediction of cardiomyocyte subtypes from in vitro-derived human induced pluripotent stem cells and unexpectedly uncover a predominance of left ventricular (LV) identity, which we confirm with an LV-specific TBX5 lineage tracing system. Together, our results show devCellPy to be a useful tool for automated cell prediction across complex cellular hierarchies, species, and experimental systems.
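To make the idea of prediction "across complex annotation hierarchies" concrete, the following is a minimal sketch of layered classification: one classifier is trained per internal node of the annotation tree, and a cell is assigned a label path by descending from the coarsest layer to the finest. It is not devCellPy's API or its model. The nearest-centroid classifier is a deliberately simple stand-in, and the function names, the two-feature toy "expression" vectors, and the label set are all hypothetical.

```python
from collections import defaultdict
from math import dist


class CentroidClassifier:
    """Nearest-centroid classifier (illustrative stand-in, not devCellPy's model)."""

    def fit(self, X, y):
        sums = {}
        counts = defaultdict(int)
        for x, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(x)
            sums[label] = [s + v for s, v in zip(sums[label], x)]
            counts[label] += 1
        # Mean feature vector per label.
        self.centroids = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict(self, x):
        # Label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


def train_hierarchy(X, paths):
    """Train one classifier per internal node of the annotation tree.

    paths[i] is a tuple of labels for cell i, ordered from the coarsest
    annotation layer to the finest. Paths may differ in length.
    """
    groups = defaultdict(lambda: ([], []))
    for x, path in zip(X, paths):
        for depth, label in enumerate(path):
            xs, ys = groups[path[:depth]]  # key = labels already decided
            xs.append(x)
            ys.append(label)
    return {prefix: CentroidClassifier().fit(xs, ys)
            for prefix, (xs, ys) in groups.items()}


def predict_path(models, x):
    """Descend the hierarchy until the current node has no child classifier."""
    path = ()
    while path in models:
        path += (models[path].predict(x),)
    return path


# Hypothetical toy data: two made-up features per cell.
X_train = [[9, 1], [8, 2], [9, 8], [8, 9], [1, 1], [2, 1]]
paths_train = [
    ("Cardiomyocyte", "Ventricular"),
    ("Cardiomyocyte", "Ventricular"),
    ("Cardiomyocyte", "Atrial"),
    ("Cardiomyocyte", "Atrial"),
    ("Endothelial",),
    ("Endothelial",),
]

models = train_hierarchy(X_train, paths_train)
print(predict_path(models, [8.5, 1.0]))
print(predict_path(models, [1.0, 2.0]))
```

Training a separate model at each node means a fine-grained decision (for example, ventricular versus atrial) is only made among cells already assigned to the parent type, which is one common way to handle multilayered identities. A real pipeline would use a stronger classifier and normalized expression matrices rather than two hand-picked features.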