Department of Biomedical Engineering, University of California, Irvine, USA.
Sci Rep. 2021 Sep 22;11(1):18866. doi: 10.1038/s41598-021-97983-0.
Studies indicate that schizophrenia has a genetic component; however, it cannot be traced to a single gene. We aimed to determine how well a person's germline genetics can predict whether they will develop schizophrenia. We compared 1129 people from the UK Biobank dataset who had a diagnosis of schizophrenia with an equal number of age-matched people drawn from the general UK Biobank population. For each person, we constructed a numerical profile in which each number characterized the length of a chromosome segment. We tested several machine learning algorithms to determine which was most effective at predicting schizophrenia, and whether breaking the chromosomes into smaller chunks improved prediction. The stacked ensemble performed best, with an area under the receiver operating characteristic curve (AUC) of 0.545 (95% CI 0.539-0.550). We observed an increase in the AUC when the chromosomes were broken into smaller chunks for analysis. Using SHAP values, we identified the X chromosome as the most important contributor to the predictive model. We conclude that germline chromosomal scale length variation data could provide an effective genetic risk score for schizophrenia that performs better than chance.
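For readers unfamiliar with the headline metric: the AUC equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, so 0.5 is what random guessing achieves and is the baseline behind "better than chance." The abstract does not say how its 95% CI was computed. The sketch below assumes a percentile bootstrap, which is one common choice and not necessarily the authors' procedure. The function names `auc` and `bootstrap_auc_ci` are hypothetical illustrations.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random case > score of a random control).

    Labels are 1 for cases and 0 for controls; tied scores count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        # Resample people with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        # Skip resamples that contain only cases or only controls.
        if 0 < sum(y) < n:
            stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

On toy data, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: three of the four case-control pairs are ranked correctly. The pairwise loop favors clarity over speed. With roughly 1129 cases and 1129 controls, a bootstrap would be slow in pure Python, so a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is the practical choice.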