Department of Anatomy & Neurosciences, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Netherlands.
Department of Biomedical Sciences of Cells & Systems, section Molecular Neurobiology, University of Groningen, University Medical Center Groningen, Groningen, Netherlands.
Glia. 2021 Dec;69(12):2933-2946. doi: 10.1002/glia.24078. Epub 2021 Aug 18.
The advent of RNA-sequencing techniques has made it possible to generate large, unbiased gene expression datasets of tissues and cell types. Several studies describing gene expression data of microglia from Alzheimer's disease or multiple sclerosis tissue have been published, aiming to provide more insight into the role of microglia in these neurological diseases. Though the raw sequencing data are often deposited in open access databases, the most accessible source of data for scientists is what is reported in published manuscripts. We observed a relatively limited overlap in reported differentially expressed genes between various microglia RNA-sequencing studies of multiple sclerosis or Alzheimer's disease. Differences in experimental setup clearly influenced the number of overlapping reported genes. However, even when the experimental setup was very similar, the overlap in reported genes could be low. We found that papers reporting large numbers of differentially expressed microglial genes generally showed higher overlap with other papers. In addition, although the pathology present within the tissue used for sequencing can greatly influence microglial gene expression, the pathology present in sequenced samples was often underreported, making the data difficult to assess. Whereas reanalyzing every raw dataset could reduce the variation that contributes to the limited overlap in reported genes, this is not feasible for labs without (access to) bioinformatic expertise. In this study, we therefore provide an overview of the data present in manuscripts and their supplementary files, and of how these data can be interpreted.
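The abstract does not state how overlap between reported gene lists was quantified. The sketch below is one simple, hypothetical way to compare published differentially expressed gene (DEG) lists: it assumes overlap is measured as the number of shared gene symbols and as the Jaccard index (shared genes divided by the union of both lists). The study names (`study_A` and so on) and gene identifiers (`G1`, `G2`, ...) are placeholders, not real data. With this toy input, the study reporting the longest list also accumulates the most shared genes, which illustrates one way list length can contribute to higher overlap.

```python
from itertools import combinations

# Hypothetical reported DEG lists per study (placeholder identifiers, not real data).
reported_degs = {
    "study_A": {"G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"},
    "study_B": {"G1", "G2", "G9"},
    "study_C": {"G3", "G4", "G5", "G10", "G11"},
}


def pairwise_overlap(gene_sets):
    """Return {(study1, study2): (shared_count, jaccard_index)} for every study pair."""
    results = {}
    # Sort by study name so each pair appears once, in a stable order.
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(gene_sets.items()), 2):
        shared = genes_a & genes_b
        union = genes_a | genes_b
        jaccard = len(shared) / len(union) if union else 0.0
        results[(name_a, name_b)] = (len(shared), jaccard)
    return results


def total_shared_per_study(gene_sets):
    """For each study, sum the genes it shares with every other study."""
    totals = {name: 0 for name in gene_sets}
    for (name_a, name_b), (n_shared, _) in pairwise_overlap(gene_sets).items():
        totals[name_a] += n_shared
        totals[name_b] += n_shared
    return totals


if __name__ == "__main__":
    for pair, (n_shared, jaccard) in pairwise_overlap(reported_degs).items():
        print(pair, n_shared, round(jaccard, 2))
    print(total_shared_per_study(reported_degs))
```

The sketch compares gene symbols as exact strings. It does not resolve symbol aliases or map genes across species, so real lists would need harmonized identifiers before such a comparison.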