Mary-Huard Tristan, Das Sarmistha, Mukhopadhyay Indranil, Robin Stéphane
Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France.
Génétique Quantitative et Evolution (GQE)-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Gif-sur-Yvette 91190, France.
Bioinformatics. 2021 Dec 22;38(1):141-148. doi: 10.1093/bioinformatics/btab592.
Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis is often a set of P-values from previous analyses, which need to be combined flexibly to explore complex hypotheses while guaranteeing a low proportion of false discoveries.
We introduce the generic concept of a composed hypothesis, which corresponds to an arbitrarily complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task and show that finding the items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classifying the items according to their posterior probabilities. We show that inference can be performed efficiently and provide a classification rule that controls the type I error. The performance and usefulness of the approach are illustrated in simulations and on two different applications. The method is scalable, does not require any parameter tuning, and provides valuable biological insight in the application cases considered.
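The classification step described above can be illustrated with a minimal Python sketch. It is not the qch implementation. It assumes that a mixture model has already been fitted and has returned, for each item, posterior probabilities over all 2^Q configurations of Q simple hypotheses. The function names, the "at least `min_alt` alternatives" form of the composed hypothesis, and the running-mean rejection rule (which bounds the estimated proportion of false discoveries) are all illustrative assumptions.

```python
import itertools

import numpy as np


def composed_null_posterior(config_post, configs, min_alt):
    """Posterior probability that each item belongs to the composed null.

    config_post: (n_items, n_configs) posteriors from a fitted mixture model.
    configs:     (n_configs, Q) 0/1 matrix, 1 meaning "simple H1 holds".
    min_alt:     assumed composed H1 = at least `min_alt` simple H1 hold.
    """
    # Configurations with fewer than min_alt alternatives form the composed null
    in_null = configs.sum(axis=1) < min_alt
    return config_post[:, in_null].sum(axis=1)


def reject_with_error_control(null_post, alpha=0.05):
    """Reject the largest set of items whose mean null posterior is <= alpha."""
    order = np.argsort(null_post)  # most convincing items first
    ranks = np.arange(1, len(null_post) + 1)
    # Running mean of sorted posteriors: estimated false discovery proportion
    running_mean = np.cumsum(null_post[order]) / ranks
    # The running mean of ascending values never decreases, so counting works
    n_reject = int(np.sum(running_mean <= alpha))
    rejected = np.zeros(len(null_post), dtype=bool)
    rejected[order[:n_reject]] = True
    return rejected


# Hypothetical posteriors for 4 items and Q = 2 simple hypotheses
configs = np.array(list(itertools.product([0, 1], repeat=2)))
config_post = np.array([
    [0.01, 0.01, 0.01, 0.97],
    [0.90, 0.05, 0.04, 0.01],
    [0.02, 0.03, 0.01, 0.94],
    [0.10, 0.20, 0.20, 0.50],
])
null_post = composed_null_posterior(config_post, configs, min_alt=2)
rejected = reject_with_error_control(null_post, alpha=0.05)
```

With these made-up posteriors, only the two items whose probability mass sits on the configuration where both simple alternatives hold are rejected. Ranking items by their posterior probability of the composed null is what turns the testing problem into a classification problem.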
The QCH methodology is available in the qch package hosted on CRAN. Additionally, R code to reproduce the Einkorn example is available on the personal webpage of the first author: https://www6.inrae.fr/mia-paris/Equipes/Membres/Tristan-Mary-Huard.
Supplementary data are available at Bioinformatics online.