Liu Jianyu, Wang Haodong, Sun Wei, Liu Yufeng
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill.
Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington.
J Am Stat Assoc. 2022;117(537):38-51. doi: 10.1080/01621459.2021.1933495. Epub 2021 Jul 21.
Hundreds of autism risk genes have been reported recently, mainly from genetic studies in which these genes carry more de novo mutations in autism subjects than in healthy controls. However, as a complex disease, autism is likely associated with more risk genes, many of which may not be identifiable through de novo mutations. We hypothesize that more autism risk genes can be identified through their connections with known autism risk genes in personalized gene-gene interaction graphs. We estimate such personalized graphs from single-cell RNA sequencing (scRNA-seq) data while appropriately modeling the cell dependence and possible zero-inflation in those data. The sample size, which is the number of cells per individual, ranges from 891 to 1,241 in our case study using scRNA-seq data from autism subjects and controls. We consider 1,500 genes in our analysis. Since the number of genes is larger than or comparable to the sample size, we perform penalized estimation. We score each gene's relevance by applying a simple graph kernel smoothing method to each personalized graph. The molecular functions of the top-scored genes are related to autism. For example, the candidate gene RYR2, which encodes ryanodine receptor 2, is involved in neurotransmission, a process that is impaired in ASD patients. While our method provides a systematic and unbiased approach to prioritizing autism risk genes, the relevance of these genes needs to be further validated in functional studies.
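The scoring step described in the abstract, smoothing known risk-gene labels over a gene-gene graph, can be illustrated with a small sketch. The abstract does not state which kernel is used, so the regularized Laplacian kernel K = (I + alpha L)^{-1} below is an assumption chosen for illustration, not the authors' method. The function name `smooth_scores`, the parameter `alpha`, and the four-gene toy graph are likewise hypothetical.

```python
import numpy as np


def smooth_scores(adjacency, seeds, alpha=1.0):
    """Spread seed labels over a graph with a regularized Laplacian kernel.

    ASSUMPTION: the kernel (I + alpha * L)^{-1} is an illustrative choice;
    the abstract only says "a simple graph kernel smoothing method".

    adjacency : (p, p) symmetric array, nonzero where two genes are connected
    seeds     : (p,) array, 1.0 for known risk genes and 0.0 otherwise
    alpha     : smoothing strength; larger values spread scores further
    """
    adjacency = np.asarray(adjacency, dtype=float)
    seeds = np.asarray(seeds, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    identity = np.eye(adjacency.shape[0])
    # Solve (I + alpha * L) s = seeds instead of forming the inverse explicitly.
    return np.linalg.solve(identity + alpha * laplacian, seeds)


# Hypothetical toy graph: genes 0-1-2 form a path, gene 3 is isolated.
toy_adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
toy_seeds = np.array([1.0, 0.0, 0.0, 0.0])  # gene 0 is the known risk gene
toy_scores = smooth_scores(toy_adjacency, toy_seeds)
```

In this toy graph the score decays with graph distance from the seed gene, and the isolated gene receives a score of zero. In the setting of the abstract, one such score vector would be computed per individual, since each individual has their own estimated graph.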