Karp Natasha A, Heller Ruth, Yaacoby Shay, White Jacqueline K, Benjamini Yoav
Mouse Informatics Group, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, United Kingdom
Department of Statistics and Operations Research, School of Mathematical Sciences, Tel Aviv University, Israel.
Genetics. 2017 Feb;205(2):491-501. doi: 10.1534/genetics.116.195388. Epub 2016 Dec 7.
Biological research frequently involves the study of phenotyping data. Many of these studies focus on rare-event categorical data, and functional genomics studies typically record the presence or absence of an abnormal phenotype. With the growing interest in the role of sex, there is a need to assess phenotypes for sexual dimorphism. Identifying abnormal phenotypes for downstream research is challenged by small sample sizes, the rare-event nature of the data, and the multiple testing problem, as many variables are monitored simultaneously. Here, we develop a statistical pipeline to assess statistical and biological significance while managing the multiple testing problem. We propose a two-step pipeline that first tests for a treatment effect (in our case example, genotype) and then tests for an interaction with sex. We compare multiple statistical methods and use simulations to investigate control of the type I error rate and power. To maximize power while addressing the multiple testing issue, we implement filters to remove data sets in which the hypotheses to be tested cannot achieve significance. A motivating case study using a large-scale, high-throughput mouse phenotyping data set from the Wellcome Trust Sanger Institute Mouse Genetics Project, in which the treatment is a gene ablation, demonstrates the benefits of the new pipeline for the downstream biological calls.
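The filtering idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which test or filter rule the authors use, so everything here is an assumption: the sketch takes a two-sided Fisher's exact test on a 2x2 table (treated vs. control, abnormal vs. normal), computes the smallest p-value any table with the observed margins could produce, and drops a data set from the multiple testing family when that minimum exceeds the threshold. The function names (`min_achievable_p`, `can_reach_significance`) and the default `alpha` are hypothetical.

```python
from math import comb


def hypergeom_pmf(x, n_treated, n_control, n_abnormal):
    """Null probability that x of the n_abnormal animals fall in the treated group,
    with all table margins held fixed."""
    total = n_treated + n_control
    return comb(n_treated, x) * comb(n_control, n_abnormal - x) / comb(total, n_abnormal)


def fisher_two_sided_p(x, n_treated, n_control, n_abnormal):
    """Two-sided Fisher's exact p-value: sum the probabilities of every table
    that is no more likely than the observed one."""
    lo = max(0, n_abnormal - n_control)
    hi = min(n_treated, n_abnormal)
    p_obs = hypergeom_pmf(x, n_treated, n_control, n_abnormal)
    probs = (hypergeom_pmf(i, n_treated, n_control, n_abnormal) for i in range(lo, hi + 1))
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def min_achievable_p(n_treated, n_control, n_abnormal):
    """Smallest p-value attainable by any table sharing the observed margins."""
    lo = max(0, n_abnormal - n_control)
    hi = min(n_treated, n_abnormal)
    return min(fisher_two_sided_p(x, n_treated, n_control, n_abnormal)
               for x in range(lo, hi + 1))


def can_reach_significance(n_treated, n_control, n_abnormal, alpha=0.05):
    """Hypothetical filter rule: keep a data set only if significance is attainable."""
    return min_achievable_p(n_treated, n_control, n_abnormal) <= alpha


# Illustrative margins only (not taken from the study).
for margins in [(3, 3, 3), (14, 1000, 1), (14, 1000, 0)]:
    print(margins, min_achievable_p(*margins), can_reach_significance(*margins))
```

Because the filter in this sketch depends only on the table margins and not on how the abnormal calls split between groups, removing data sets this way shrinks the number of hypotheses entering the multiple testing correction without using the test outcome itself.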