Jia Gaoxiang, Wang Xinlei, Li Qiwei, Lu Wei, Tang Ximing, Wistuba Ignacio, Xie Yang
Department of Statistical Science, Southern Methodist University, 3225 Daniel Avenue, P O Box 750332, Dallas, Texas 75275.
Quantitative Biomedical Research Center, Department of Clinical Sciences, The University of Texas Southwestern Medical Center, Dallas, Texas 75390.
Ann Appl Stat. 2019 Sep;13(3):1617-1647. doi: 10.1214/19-aoas1249. Epub 2019 Oct 17.
Formalin-fixed paraffin-embedded (FFPE) samples have great potential for biomarker discovery, retrospective studies, and the diagnosis or prognosis of diseases. Their application, however, is hindered by the unsatisfactory performance of traditional gene expression profiling techniques on damaged RNA. The NanoString nCounter platform is well suited to profiling FFPE samples and measures gene expression with high sensitivity, which may greatly help realize the scientific and clinical value of FFPE samples. However, methodological development for normalization, a critical step in analyzing this type of data, lags far behind. Existing methods designed for the platform use information from the different types of internal controls separately and, for global scaling, rely on the oversimplified assumption that the expression of housekeeping genes is constant across samples. These methods are therefore not optimized for the nCounter system, not to mention that they were not developed for FFPE samples. We construct an integrated system of random-coefficient hierarchical regression models to capture the main patterns and characteristics observed in NanoString data from FFPE samples, and develop a Bayesian approach to estimate parameters and normalize gene expression across samples. Our method, labeled RCRnorm, incorporates information from all aspects of the experimental design and simultaneously removes biases from various sources. It eliminates the unrealistic assumption about housekeeping genes and offers great interpretability. Furthermore, it is applicable to fresh frozen or similar samples, which can generally be viewed as a reduced case of FFPE samples. Simulations and applications demonstrate the superior performance of RCRnorm.
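To make concrete the assumption that the abstract criticizes, the following is a minimal sketch of conventional housekeeping-based global scaling. It is not RCRnorm and is not taken from the paper. The function names, the use of a geometric mean, and the toy counts are all assumptions chosen for illustration.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def housekeeping_scale_factors(hk_counts):
    """Per-sample scaling factors from housekeeping gene counts.

    hk_counts maps a sample name to its list of housekeeping counts.
    Assumption (illustrative): each sample is rescaled so that its
    housekeeping geometric mean matches the average across samples.
    This is exactly where constant housekeeping expression is assumed.
    """
    per_sample = {s: geometric_mean(c) for s, c in hk_counts.items()}
    reference = sum(per_sample.values()) / len(per_sample)
    return {s: reference / g for s, g in per_sample.items()}


def apply_scaling(counts, factors):
    """Multiply every count in a sample by that sample's factor."""
    return {s: [c * factors[s] for c in genes] for s, genes in counts.items()}


# Hypothetical toy data: sample B has twice the housekeeping signal of A.
hk = {"A": [100.0, 400.0], "B": [200.0, 800.0]}
factors = housekeeping_scale_factors(hk)
scaled = apply_scaling(hk, factors)
```

After scaling, both toy samples have identical housekeeping counts by construction. Any genuine biological difference in housekeeping expression between the samples would be removed along with the technical one, which is the limitation the abstract describes.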