Division of Environmental Health Sciences, Genes and Environment Laboratory, School of Public Health, University of California, Berkeley, California 94720, USA.
Environ Mol Mutagen. 2013 Aug;54(7):500-17. doi: 10.1002/em.21798. Epub 2013 Aug 1.
The human transcriptome is complex, comprising multiple transcript types, mostly in the form of non-coding RNA (ncRNA). The majority of ncRNA is of the long form (lncRNA, ≥ 200 bp), which plays an important role in gene regulation through multiple mechanisms, including epigenetics, chromatin modification, control of transcription factor binding, and regulation of alternative splicing. Both mRNA and ncRNA exhibit additional variability in the form of alternative splicing and RNA editing. All aspects of the human transcriptome can potentially be dysregulated by environmental exposures. Next-generation RNA sequencing (RNA-Seq) is the best available methodology to measure this variability, although it has limitations, including experimental bias. The third phase of the MicroArray Quality Control Consortium project (MAQC-III), also called Sequencing Quality Control (SeQC), aims to address these limitations through standardization of experimental and bioinformatic methodologies. To date, only a limited number of toxicogenomic studies have used RNA-Seq. This review describes the complexity of the human transcriptome, the application of transcriptomics by RNA-Seq or microarray in molecular epidemiology studies, and the limitations of these approaches, including the type of cell or tissue analyzed, experimental variation, and confounding. By using good study designs with precise, individual exposure measurements, sufficient statistical power, and incorporation of phenotypic anchors, studies in human populations can identify biomarkers of exposure and/or early effect and elucidate mechanisms of action underlying associated diseases, even at low doses. Analysis of datasets at the pathway level can compensate for some of the limitations of RNA-Seq and, as more datasets become available, will increasingly elucidate the exposure-disease continuum.