Quantitative and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA.
Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston-School of Public Health, Houston, TX, 77030, USA.
BMC Res Notes. 2021 Nov 27;14(1):436. doi: 10.1186/s13104-021-05851-x.
Allelic imbalance (AI) is the differential expression of the two alleles in a diploid. AI can vary between tissues, treatments, and environments. Methods for testing AI exist, but methods are also needed to estimate type I error and power for detecting AI and differences in AI between conditions. As the cost of the technology plummets, which matters more: reads or replicates?
We find that a minimum of 2400, 480, and 240 allele-specific reads divided equally among 12, 5, and 3 replicates is needed to detect a 10%, 20%, and 30% deviation from allelic balance, respectively, in a condition with power > 80%. A minimum of 960 and 240 allele-specific reads divided equally among 8 replicates is needed to detect a 20% or 30% difference in AI between conditions, respectively, with comparable power. Adding replicates increases power more than adding coverage, without affecting type I error. We provide a Python package that simulates AI scenarios and lets users estimate type I error and power for detecting AI and differences in AI between conditions.
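The reads-versus-replicates trade-off described above can be explored with a small Monte Carlo simulation. The following is a minimal sketch only, not the authors' package or their test: it assumes beta-binomial read counts, a one-sample t-test on the per-replicate allelic proportions, and a hypothetical `concentration` parameter that controls replicate-to-replicate variability. The function name `rejection_rate`, the value `theta = 0.6`, and all defaults are illustrative assumptions, so the numbers it produces are not expected to reproduce those reported in the abstract.

```python
import numpy as np
from scipy import stats


def rejection_rate(total_reads, n_reps, theta, concentration=50.0,
                   n_sims=2000, alpha=0.05, seed=0):
    """Estimate how often a one-sample t-test rejects theta = 0.5.

    theta         : assumed true proportion of reads from allele 1
    concentration : hypothetical beta-binomial concentration; smaller
                    values mean more variability between replicates
    """
    rng = np.random.default_rng(seed)
    # Reads are divided equally among replicates.
    reads_per_rep = total_reads // n_reps

    # Each replicate draws its own allelic proportion around theta.
    p = rng.beta(theta * concentration, (1 - theta) * concentration,
                 size=(n_sims, n_reps))
    # Allele-1 read counts given each replicate's proportion.
    counts = rng.binomial(reads_per_rep, p)
    props = counts / reads_per_rep

    # Two-sided t-test per simulated experiment (one row = one experiment).
    pvals = stats.ttest_1samp(props, 0.5, axis=1).pvalue
    return float(np.mean(pvals < alpha))


if __name__ == "__main__":
    # Under allelic balance, the rejection rate estimates type I error.
    print("type I error:", rejection_rate(960, 5, theta=0.5))
    # Under imbalance, the rejection rate estimates power.
    print("power, 5 replicates: ", rejection_rate(960, 5, theta=0.6))
    print("power, 10 replicates:", rejection_rate(960, 10, theta=0.6))
```

Holding the total read count fixed while changing `n_reps`, or holding `n_reps` fixed while changing `total_reads`, gives a rough picture of which resource buys more power under these assumptions.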