Gebski Val, Silva S Sandun M, Byth Karen, Jenkins Alicia, Keech Anthony
NHMRC Clinical Trials Centre, University of Sydney, Camperdown, NSW 1450, Australia.
Bioinform Adv. 2023 Oct 13;3(1):vbad148. doi: 10.1093/bioadv/vbad148. eCollection 2023.
Technologies identifying single nucleotide polymorphisms (SNPs) in DNA sequencing yield an avalanche of data requiring analysis and interpretation. Standard methods may require many weeks of processing time. Statistical methods that require data sorting, high-dimensional matrix inversion, and replication in data subsets across multiple outcomes lengthen these times further.

A method is proposed that reduces computational time in problems with time-to-event outcomes and hundreds of thousands to millions of SNPs. It uses Cox-Snell residuals obtained after fitting the Cox proportional hazards model (CPHM) to a fixed set of concomitant variables. This yields SNP-effect coefficients from a Cox-Snell-adjusted Poisson model that show high concordance with those from the adjusted CPHM.

The method is illustrated with a sample of 10 000 SNPs from a genome-wide association study in a diabetic population. The proposed Poisson-based method improves processing efficiency by up to 62%, which could save over three weeks of processing time if 5 million SNPs require analysis. Because the method involves only a single predictor variable (the SNP), it offers a simpler, computationally more stable approach to examining and identifying SNP patterns associated with the outcome(s), allowing faster development of genetic signatures. Screening SNPs with deviance residuals from the CPHM instead produces a large discordance rate at a 0.2% concordance threshold. This rate is 15 times that obtained with Cox-Snell residuals from the Cox-Snell-adjusted Poisson model.
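The standard full-likelihood argument shows why a Poisson model with the Cox-Snell residual as exposure can track the adjusted CPHM. This sketch is not taken from the abstract and may differ from the authors' derivation. It assumes the baseline cumulative hazard $\Lambda_0$ and the covariate coefficients $\boldsymbol\beta$ are held fixed at their covariate-only estimates. The symbols $\mathbf{x}_i$ (concomitant variables), $g_i$ (SNP genotype code), $\gamma$ (SNP effect) and $\delta_i$ (event indicator at follow-up time $t_i$) are introduced here for illustration.

```latex
\begin{aligned}
\lambda_i(t) &= \lambda_0(t)\exp(\mathbf{x}_i^\top\boldsymbol\beta + \gamma g_i),
\qquad
\Lambda_i(t) = \Lambda_0(t)\exp(\mathbf{x}_i^\top\boldsymbol\beta + \gamma g_i) \\
\ell(\gamma) &= \sum_{i=1}^{n}\bigl[\delta_i \log\lambda_i(t_i) - \Lambda_i(t_i)\bigr] \\
r_i &= \hat\Lambda_0(t_i)\exp(\mathbf{x}_i^\top\hat{\boldsymbol\beta})
\quad\Longrightarrow\quad
\ell(\gamma) = \sum_{i=1}^{n}\bigl[\delta_i\,\gamma g_i - r_i e^{\gamma g_i}\bigr] + \text{const} \\
&\text{which is the kernel of } \delta_i \sim \operatorname{Poisson}(\mu_i),
\qquad \log\mu_i = \log r_i + \gamma g_i .
\end{aligned}
```

Here $r_i$ is the Cox-Snell residual. The fixed covariates enter only through the offset $\log r_i$, so each SNP fit involves a single predictor. Adding an intercept $\alpha$ to $\log\mu_i$, as is usual in a Poisson fit, lets the model absorb any overall rescaling of the residuals.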
The method is simple to implement, as the required procedures are available in most statistical packages. It involves obtaining Cox-Snell residuals from a CPHM, fitted to a binary time-to-event outcome, for the factors that need to be held common when assessing each SNP. Each SNP is then fitted as a predictor of the outcome of interest using a Poisson model with the Cox-Snell residual as the exposure variable.
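The two steps just described can be sketched in dependency-free Python under stated assumptions:

- The covariate-only Cox model has already been fitted elsewhere, and its linear predictor $\mathbf{x}_i^\top\hat{\boldsymbol\beta}$ is passed in.
- Residuals use the Breslow estimator of the baseline cumulative hazard. This is our choice; the abstract does not name an estimator.
- Each SNP is coded additively as 0/1/2 with an intercept.
- The function names `cox_snell_residuals` and `poisson_snp_fit` and all demo values are hypothetical.

Each per-SNP fit needs only a 2×2 information matrix and no re-sorting of the data, which is in line with the abstract's point that only a single predictor is involved.

```python
import math


def cox_snell_residuals(times, events, lin_pred):
    """Cox-Snell residuals r_i = H0(t_i) * exp(eta_i), Breslow baseline hazard.

    lin_pred[i] is eta_i = x_i' beta_hat from a Cox model ALREADY fitted to the
    fixed covariates (assumed computed elsewhere). O(n^2) for clarity only.
    """
    risk = [math.exp(eta) for eta in lin_pred]
    event_times = sorted({t for t, d in zip(times, events) if d})
    # Breslow increment at each distinct event time: d_j / sum of exp(eta) at risk
    increments = []
    for tj in event_times:
        d_j = sum(1 for t, d in zip(times, events) if d and t == tj)
        at_risk = sum(w for t, w in zip(times, risk) if t >= tj)
        increments.append((tj, d_j / at_risk))
    return [sum(inc for tj, inc in increments if tj <= t) * w
            for t, w in zip(times, risk)]


def _loglik(y, g, r, a, b):
    """Poisson log-likelihood (constants dropped) for mu_i = r_i * exp(a + b*g_i)."""
    ll = 0.0
    for yi, gi, ri in zip(y, g, r):
        lin = a + b * gi
        if yi:  # events always have r_i > 0, so log(r_i) is defined
            ll += yi * (math.log(ri) + lin)
        ll -= ri * math.exp(lin)
    return ll


def _information(g, mu):
    """Entries of the 2x2 Fisher information for (a, b)."""
    i00 = sum(mu)
    i01 = sum(gi * m for gi, m in zip(g, mu))
    i11 = sum(gi * gi * m for gi, m in zip(g, mu))
    return i00, i01, i11


def poisson_snp_fit(y, g, r, tol=1e-10, max_iter=100):
    """Fit log E[y_i] = log(r_i) + a + b*g_i (log r_i as offset) by Newton-Raphson.

    y: event indicators; g: genotype codes (hypothetical 0/1/2 coding);
    r: Cox-Snell residuals used as the exposure. Returns (a, b, se_b).
    Assumes g is not constant (a monomorphic SNP has no estimable effect).
    """
    a = b = 0.0
    for _ in range(max_iter):
        mu = [ri * math.exp(a + b * gi) for ri, gi in zip(r, g)]
        u0 = sum(yi - m for yi, m in zip(y, mu))              # score for a
        u1 = sum(gi * (yi - m) for yi, gi, m in zip(y, g, mu))  # score for b
        i00, i01, i11 = _information(g, mu)
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        # Step-halving so the concave log-likelihood never decreases
        step, base = 1.0, _loglik(y, g, r, a, b)
        while _loglik(y, g, r, a + step * da, b + step * db) < base and step > 1e-8:
            step /= 2
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < tol:
            break
    mu = [ri * math.exp(a + b * gi) for ri, gi in zip(r, g)]
    i00, i01, i11 = _information(g, mu)
    se_b = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # [I^{-1}]_{bb}
    return a, b, se_b


if __name__ == "__main__":
    times = [2, 3, 3, 5, 7, 8, 10, 12, 13, 15]
    events = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
    covariate = [0.5, 1.2, -0.3, 0.8, 0.0, -1.0, 0.4, -0.6, 1.5, -0.2]
    beta_hat = 0.7  # hypothetical coefficient from the covariate-only Cox fit
    r = cox_snell_residuals(times, events, [beta_hat * x for x in covariate])
    snps = {"snp_hypothetical_1": [0, 1, 2, 1, 0, 2, 1, 0, 1, 2]}
    for name, g in snps.items():  # in practice: a loop over every SNP
        a, b, se = poisson_snp_fit(events, g, r)
        print(name, round(b, 3), round(se, 3))
```

The residuals are computed once and reused for every SNP. That reuse is the source of the saving the abstract describes: the per-SNP loop only repeats a small Poisson fit.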