Allen Institute for Brain Science, Seattle, Washington, 98103, USA.
Department of Clinical Sciences, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, USA.
BMC Bioinformatics. 2017 Dec 21;18(Suppl 17):559. doi: 10.1186/s12859-017-1977-1.
A fundamental characteristic of multicellular organisms is the specialization of functional cell types through differentiation. These specialized cell types not only characterize the normal functioning of different organs and tissues but can also serve as cellular biomarkers of disease states and of therapeutic and vaccine responses. The Cell Ontology was developed as a reference for cell type representation, providing a standard nomenclature of defined cell types for comparative analysis and biomarker discovery. Historically, these cell types have been defined by their unique cellular shapes and structures, anatomic locations, and marker protein expression. However, new high-throughput, high-content cytometry and sequencing technologies are now driving a revolution in cellular characterization. The resulting explosion in the number of distinct cell types being identified is challenging the current paradigm for cell type definition in the Cell Ontology.
In this paper, we provide examples of state-of-the-art cellular biomarker characterization using high-content cytometry and single cell RNA sequencing, and we present strategies for standardized cell type representations based on the data outputs of these technologies. These representations include "context annotations", in the form of standardized experiment metadata about the specimen source analyzed, together with the marker genes that serve as the most useful features in machine learning-based cell type classification models. We also propose a statistical strategy for comparing new experiment data to these standardized cell type representations.
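As a rough illustration of what such a standardized representation might hold, the sketch below pairs specimen-source metadata with a ranked list of marker genes. The class name, field names, placeholder gene symbols, and the overlap helper are all hypothetical and are not taken from the paper; in particular, the set-overlap score is only a simple stand-in and is not the statistical comparison strategy the authors propose.

```python
from dataclasses import dataclass, field


@dataclass
class CellTypeRepresentation:
    """Hypothetical record for one data-derived cell type."""

    label: str  # provisional name for the cell type
    # "Context annotations": experiment metadata about the specimen source.
    context: dict = field(default_factory=dict)
    # Marker genes judged most useful by a classifier, in rank order.
    marker_genes: list = field(default_factory=list)

    def marker_overlap(self, other: "CellTypeRepresentation") -> float:
        """Jaccard overlap of the two marker gene sets (illustrative only)."""
        mine, theirs = set(self.marker_genes), set(other.marker_genes)
        union = mine | theirs
        return len(mine & theirs) / len(union) if union else 0.0


# Placeholder values only; these are not results from any experiment.
reference = CellTypeRepresentation(
    label="example cell type",
    context={"species": "placeholder", "specimen_source": "placeholder"},
    marker_genes=["GENE_A", "GENE_B", "GENE_C"],
)
candidate = CellTypeRepresentation(
    label="new cluster",
    context={"species": "placeholder", "specimen_source": "placeholder"},
    marker_genes=["GENE_B", "GENE_C", "GENE_D"],
)

overlap = reference.marker_overlap(candidate)
same_context = reference.context == candidate.context
```

Keeping the context annotations separate from the marker genes lets two representations be compared only when they come from comparable specimen sources, which is one plausible reason to record both.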
The advent of high-throughput/high-content single cell technologies is leading to an explosion in the number of distinct cell types being identified. It will be critical for the bioinformatics community to develop and adopt data standards that are compatible with these new technologies and that support the data representation needs of the research community. The proposals enumerated here are intended as a useful starting point for addressing these challenges.