Li Jingyi Jessica, Tong Xin
Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA.
Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA.
Patterns (N Y). 2020 Oct 9;1(7):100115. doi: 10.1016/j.patter.2020.100115.
Making binary decisions is a common data-analytic task in scientific research and industrial applications. Data science offers two related but distinct strategies for this task: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and confusing. Here, we summarize key distinctions between the two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of these guidelines in a cancer driver gene prediction example.