Epigenetics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia.
South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales, Liverpool, NSW, Australia.
Bioinformatics. 2019 Feb 15;35(4):560-570. doi: 10.1093/bioinformatics/bty675.
A synoptic view of the human genome benefits chiefly from the application of nucleic acid sequencing and microarray technologies. These platforms allow interrogation of patterns such as gene expression and DNA methylation at the vast majority of canonical loci, allowing granular insights and opportunities for validation of original findings. However, problems arise when validating against a "gold standard" measurement, since this immediately biases all subsequent measurements towards that particular technology or protocol. Since all genomic measurements are estimates, in the absence of a "gold standard" we instead empirically assess the measurement precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. Both cross-platform and cross-locus comparisons can be made across all common loci, allowing identification of technology- and locus-specific tendencies.
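To make the idea concrete, the following is a minimal sketch of a Mandel-style row-linear fit for a single locus, not the implementation in the consensus package. It assumes that each platform's measurements across samples are regressed on the centred per-sample consensus (the mean across platforms), that the slope is read as that platform's sensitivity, and that the residual standard deviation is read as a proxy for its precision. The function name `row_linear_fit`, the dictionary layout and the toy values are all hypothetical.

```python
from statistics import mean


def row_linear_fit(measurements):
    """Fit one line per platform against the per-sample consensus.

    measurements: dict mapping platform name -> list of values for one locus,
    one value per sample, with samples in the same order for every platform.
    (Hypothetical layout, chosen for illustration only.)
    """
    platforms = list(measurements)
    n_samples = len(measurements[platforms[0]])

    # Consensus for each sample: mean across all platforms.
    consensus = [
        mean(measurements[p][j] for p in platforms) for j in range(n_samples)
    ]
    grand_mean = mean(consensus)
    centred = [x - grand_mean for x in consensus]
    sxx = sum(c * c for c in centred)

    fits = {}
    for p in platforms:
        y = measurements[p]
        average = mean(y)  # intercept at the centred consensus
        slope = sum(c * v for c, v in zip(centred, y)) / sxx
        residuals = [v - (average + slope * c) for v, c in zip(y, centred)]
        # Residual standard deviation with n - 2 degrees of freedom.
        residual_sd = (sum(r * r for r in residuals) / (n_samples - 2)) ** 0.5
        fits[p] = {
            "average": average,
            "sensitivity": slope,
            "residual_sd": residual_sd,
        }
    return fits


# Toy values for one locus across five samples (made up for illustration).
example = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [3.0, 5.0, 7.0, 9.0, 11.0],
    "C": [0.5, 1.0, 1.5, 2.0, 2.5],
}
fits = row_linear_fit(example)
for name, fit in fits.items():
    print(name, fit)
```

Because the consensus is the mean across platforms, the fitted slopes average to one by construction, so a slope above one suggests a platform that responds more strongly than the consensus at that locus, and a slope below one suggests a weaker response.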
We assess technologies including the Infinium MethylationEPIC BeadChip, whole genome bisulfite sequencing (WGBS), two different RNA-Seq protocols (PolyA+ and Ribo-Zero) and five different gene expression array platforms. Each technology is thus characterised relative to the consensus. We showcase a number of applications of the row-linear model, including correlation with known interfering traits. We demonstrate a clear effect of cross-hybridisation on the sensitivity of Infinium methylation arrays. Additionally, we perform a true interlaboratory test on a set of samples interrogated on the same platform across twenty-one separate testing laboratories.
A full implementation of the row-linear model, plus extra functions for visualisation, is available in the R package consensus at https://github.com/timpeters82/consensus.
Supplementary data are available at Bioinformatics online.