Dealing with Confounders in Omics Analysis.

Author information

School of Biological Sciences, Nanyang Technological University, 637551, Singapore.

Department of Computer Science, National University of Singapore, 117417, Singapore; Department of Pathology, National University of Singapore, 119074, Singapore.

Publication information

Trends Biotechnol. 2018 May;36(5):488-498. doi: 10.1016/j.tibtech.2018.01.013. Epub 2018 Feb 20.

DOI:10.1016/j.tibtech.2018.01.013

PMID:29475622

Abstract

The Anna Karenina effect is a manifestation of the theory-practice gap that exists when theoretical statistics are applied to real-world data. When biological data are analyzed for differential features such as genes or proteins, the effect arises when the null hypothesis is rejected for extraneous reasons (confounders) rather than because the alternative hypothesis is relevant to the disease phenotype. The mechanics of applying statistical tests must therefore address and resolve confounders; simply manipulating the P-value is inadequate. We discuss three mechanistic elements (hypothesis statement construction, null distribution appropriateness, and test-statistic construction) and suggest how they can be designed to foil the Anna Karenina effect and select phenotypically relevant biological features.
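As a rough illustration of the second element (null distribution appropriateness), the sketch below contrasts two permutation nulls for one feature. It is not taken from the paper: the toy data, the batch variable, and the helper names `mean_diff` and `permutation_p` are all assumptions made for this example. The feature is constructed to depend only on batch, and the disease class is unevenly spread across the two batches, so batch acts as the confounder.

```python
import random
import statistics

def mean_diff(values, labels):
    """Difference in mean feature value: class 1 minus class 0."""
    cases = [v for v, l in zip(values, labels) if l == 1]
    controls = [v for v, l in zip(values, labels) if l == 0]
    return statistics.fmean(cases) - statistics.fmean(controls)

def permutation_p(values, labels, batches=None, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for |mean_diff|.

    batches=None shuffles labels across all samples (ignores the confounder).
    Otherwise labels are shuffled only within each batch, so every permuted
    data set keeps the same class-by-batch imbalance as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean_diff(values, labels))
    hits = 0
    for _ in range(n_perm):
        perm = list(labels)
        if batches is None:
            rng.shuffle(perm)
        else:
            for b in sorted(set(batches)):
                idx = [i for i, x in enumerate(batches) if x == b]
                sub = [perm[i] for i in idx]
                rng.shuffle(sub)
                for i, l in zip(idx, sub):
                    perm[i] = l
        if abs(mean_diff(values, perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 40 samples in two batches of 20.
# The feature depends on batch only (batch 1 is shifted by 3.0).
batches = [0] * 20 + [1] * 20
values = [0.1 * i for i in range(20)] + [3.0 + 0.1 * i for i in range(20)]
# Batch 0 is mostly controls, batch 1 is mostly cases (class 1 = disease).
labels = [0] * 9 + [1] * 2 + [0] * 9 + [1] * 9 + [0] * 2 + [1] * 9

p_naive = permutation_p(values, labels)
p_strat = permutation_p(values, labels, batches)
print(f"naive null:      p = {p_naive:.4f}")
print(f"stratified null: p = {p_strat:.4f}")
```

With these toy values the naive null should yield a small p-value even though the feature carries no class signal within either batch, while the batch-stratified null should not. The test statistic is identical in both runs; only the null distribution changes, which is the design choice the example is meant to isolate.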
