McCall Matthew N, Illei Peter B, Halushka Marc K
Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester NY, 14642, USA.
Department of Pathology, School of Medicine, Johns Hopkins University, Baltimore MD, 21287, USA.
Am J Hum Genet. 2016 Sep 1;99(3):624-635. doi: 10.1016/j.ajhg.2016.07.007.
The sources of gene expression variability in human tissues are thought to be a complex interplay of technical, compositional, and disease-related factors. To better understand these contributions, we investigated expression variability in a relatively homogeneous tissue expression dataset from the Genotype-Tissue Expression (GTEx) resource. In addition to identifying technical sources, such as sequencing date and post-mortem interval, we also identified several biological sources of variation. An in-depth analysis of the 175 genes with the greatest variation among 133 lung tissue samples identified five distinct clusters of highly correlated genes. One large cluster included surfactant genes (SFTPA1, SFTPA2, and SFTPC), which are expressed exclusively in type II pneumocytes, cells that proliferate in ventilator-associated lung injury. High surfactant expression was strongly associated with death on a ventilator and with type II pneumocyte hyperplasia. A second large cluster included dynein (DNAH9 and DNAH12) and mucin (MUC5B and MUC16) genes, which are exclusive to the respiratory epithelium and goblet cells of bronchial structures. This cluster indicates heterogeneous bronchiole sampling due to the harvesting location in the lung. A small cluster included acute-phase reactant genes (SAA1, SAA2, and SAA2-SAA4). The final two small clusters were technical and gender-related. To summarize, in a collection of normal lung samples, we found that tissue heterogeneity caused by harvesting location (medial or lateral lung) and late therapeutic intervention (mechanical ventilation) were major contributors to expression variation. These unexpected sources of variation were the result of altered cell ratios in the tissue samples, an underappreciated source of expression variation.
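The two-step analysis described in the abstract (rank genes by variation across samples, then group the most variable genes into clusters of highly correlated genes) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, variance measure, or correlation cutoff, so everything below is an assumption made for illustration: the function names `top_variable_genes` and `correlated_clusters`, the 0.8 Pearson threshold, the connected-component grouping, and the synthetic expression values. None of it is the authors' code or GTEx data.

```python
import math
import random
from statistics import mean, pvariance


def top_variable_genes(expr, k):
    """Return the k gene names with the largest variance across samples."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:k]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_clusters(expr, genes, threshold=0.8):
    """Group genes into connected components of the graph with edges r >= threshold.

    This is a simple stand-in for whatever clustering the study used.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(expr[a], expr[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values(), key=len, reverse=True)


# Synthetic toy data (hypothetical values): two uncorrelated latent signals
# stand in for varying cell-type proportions across 40 samples.
rng = random.Random(0)
n_samples = 40
signal_a = [math.sin(2 * math.pi * i / n_samples) for i in range(n_samples)]
signal_b = [math.cos(2 * math.pi * i / n_samples) for i in range(n_samples)]

expr = {}
for name in ("SFTPA1", "SFTPA2", "SFTPC"):
    expr[name] = [5 * a + rng.gauss(0, 0.3) for a in signal_a]
for name in ("DNAH9", "MUC5B"):
    expr[name] = [5 * b + rng.gauss(0, 0.3) for b in signal_b]
for i in range(20):
    # Low-variance background genes that should not be selected.
    expr[f"BG{i}"] = [rng.gauss(0, 0.5) for _ in range(n_samples)]

top = top_variable_genes(expr, 5)
clusters = correlated_clusters(expr, top, threshold=0.8)
for cluster in clusters:
    print(sorted(cluster))
```

In this toy setting, genes driven by the same latent signal end up in the same cluster, which mirrors the abstract's point that co-varying genes can reflect shifts in cell-type proportions rather than changes within cells. A real analysis would more likely use hierarchical clustering on a normalized expression matrix.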