School of Biological Sciences, Nanyang Technological University, Singapore.
Department of Computer Science, National University of Singapore, Singapore; Department of Pathology, National University of Singapore, Singapore.
Drug Discov Today. 2019 Jan;24(1):31-36. doi: 10.1016/j.drudis.2018.08.002. Epub 2018 Aug 4.
Reproducible and generalizable gene signatures are essential for clinical deployment, but they are hard to come by. The primary issue is insufficient mitigation of confounders, which includes ensuring that hypotheses, test statistics and null distributions are appropriate. To further improve robustness, additional good analytical practices (GAPs) are needed, namely: leveraging existing data and knowledge; careful and systematic evaluation of gene sets, even those that overlap with known sources of confounding; and rigorous testing of inferred signatures against as many published data sets as possible. Here, by re-examining a breast cancer data set and 48 published signatures, we illustrate the value of adopting these GAPs.
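The point about appropriate test statistics and null distributions can be illustrated with a small sketch. It compares a signature's score against an empirical null built from random gene sets of the same size. This is a generic illustration, not the procedure used in the paper. The functions `welch_t`, `set_score` and `empirical_p`, the mean-|t| score, and the synthetic data are all hypothetical choices made here for demonstration. A random-gene-set null asks whether a signature beats arbitrary genes. That is a different hypothesis from whether it is associated with outcome at all, so which null is "appropriate" depends on the question being asked.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t-statistic comparing two groups of samples (hypothetical helper)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def set_score(expr: Dict[str, List[float]], labels: List[int], genes: Sequence[str]) -> float:
    """Mean absolute Welch t over the genes in a set (an assumed scoring rule)."""
    scores = []
    for g in genes:
        group1 = [v for v, lab in zip(expr[g], labels) if lab == 1]
        group0 = [v for v, lab in zip(expr[g], labels) if lab == 0]
        scores.append(abs(welch_t(group1, group0)))
    return statistics.mean(scores)


def empirical_p(expr: Dict[str, List[float]], labels: List[int], signature: Sequence[str],
                n_random: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Score a signature, then estimate how often random same-size gene sets score as high."""
    rng = random.Random(seed)
    universe = sorted(expr)
    observed = set_score(expr, labels, signature)
    hits = 0
    for _ in range(n_random):
        random_set = rng.sample(universe, len(signature))
        if set_score(expr, labels, random_set) >= observed:
            hits += 1
    # The +1 correction keeps the empirical p-value from ever being exactly zero.
    return observed, (hits + 1) / (n_random + 1)


if __name__ == "__main__":
    # Synthetic data: 200 genes, 10 samples per group; only g0..g9 truly differ.
    rng = random.Random(1)
    labels = [1] * 10 + [0] * 10
    expr = {
        f"g{i}": [rng.gauss(0, 1) + (3.0 if (i < 10 and lab == 1) else 0.0) for lab in labels]
        for i in range(200)
    }
    obs, p = empirical_p(expr, labels, [f"g{i}" for i in range(10)], n_random=200)
    print(f"observed score={obs:.2f}, empirical p={p:.4f}")
```

Run against real data, the same comparison can show that a published signature scores no better than random gene sets. Such a signature may be capturing a confounder rather than specific biology.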