Genetic association tests in family samples for multi-category phenotypes.
Author information
Pfizer Inc, Global Product Development, Groton, CT, 06340, USA.
Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA.
Publication information
BMC Genomics. 2021 Dec 4;22(1):873. doi: 10.1186/s12864-021-08107-x.
BACKGROUND
Advancements in statistical methods and sequencing technology have led to numerous novel discoveries in human genetics in the past two decades. Among phenotypes of interest, most attention has been given to studying genetic associations with continuous or binary traits. Efficient statistical methods have been proposed and are available for both types of traits under different study designs. However, for multinomial categorical traits in related samples, there is a lack of efficient statistical methods and software.
RESULTS
We propose an efficient score test for analyzing a multinomial trait in family samples in the context of genome-wide association and sequencing studies. We also propose an alternative Wald statistic and extend the methodology to ordinal traits. In extensive simulation studies, we evaluated the type-I error rate of the score and Wald tests, compared with multinomial logistic regression designed for unrelated samples, under different allele frequencies and study designs. We also evaluated the power of these methods. Both the score and Wald tests have a well-controlled type-I error rate, whereas multinomial logistic regression has an inflated type-I error rate when applied to family samples. We illustrate the score test with an application to the Framingham Heart Study to uncover genetic variants associated with diabesity, a multi-category phenotype.
CONCLUSION
Both proposed tests have a correct type-I error rate and similar power. However, because the Wald statistic relies on computationally intensive estimation, it is less efficient than the score test for large-scale genetic association studies. We provide software implementations for both multinomial and ordinal traits.
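To make the idea of a score test for a multinomial trait concrete, the following is a minimal Python sketch of the textbook special case for unrelated individuals with an intercept-only null model. It is not the family-based test proposed here: the abstract does not give the form of that statistic, and this sketch ignores relatedness entirely. The function name `multinomial_score_test` and its arguments are hypothetical. Under these assumptions the score statistic reduces to n times the between-category sum of squares of the genotype divided by its total sum of squares, and it is asymptotically chi-square with K - 1 degrees of freedom under the null.

```python
from collections import defaultdict


def multinomial_score_test(genotypes, categories):
    """Score statistic for H0: genotype is unrelated to a K-category trait.

    Illustrative assumptions (not the family-based method of the paper):
    unrelated individuals and an intercept-only null model.
    Returns (statistic, degrees_of_freedom).
    """
    if len(genotypes) != len(categories):
        raise ValueError("genotypes and categories must have equal length")
    n = len(genotypes)
    if n == 0:
        raise ValueError("empty sample")

    # Total sum of squares of the genotype around its overall mean.
    g_bar = sum(genotypes) / n
    s_gg = sum((g - g_bar) ** 2 for g in genotypes)
    if s_gg == 0:
        raise ValueError("variant is monomorphic in this sample")

    # Per-category sample sizes and genotype sums.
    count = defaultdict(int)
    g_sum = defaultdict(float)
    for g, k in zip(genotypes, categories):
        count[k] += 1
        g_sum[k] += g

    # Between-category sum of squares of the genotype.
    between = sum(
        n_k * (g_sum[k] / n_k - g_bar) ** 2 for k, n_k in count.items()
    )
    return n * between / s_gg, len(count) - 1


if __name__ == "__main__":
    stat, df = multinomial_score_test([0, 1, 2, 0, 1, 2], [0, 0, 1, 1, 2, 2])
    print(stat, df)
```

The statistic needs only the fitted null model (here, the category proportions), with no per-variant estimation. This is the general property that makes score tests attractive for genome-wide scans, in contrast to a Wald test, which requires fitting the alternative model for each variant. Applying this unrelated-sample version to family data would not be appropriate, since it does not account for correlation among relatives.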