Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal.
Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal.
J Integr Bioinform. 2024 Jul 15;21(2). doi: 10.1515/jib-2023-0042. eCollection 2024 Jun 1.
This study examines the genetic and clinical aspects of schizophrenia (SCZ), a complex mental disorder of uncertain etiology. Deep Learning (DL) holds promise for analyzing large genomic datasets to uncover new risk factors. However, given reports of non-negligible misdiagnosis rates for SCZ, case-control cohorts may contain outlying genetic profiles that hinder the performance of classification models. The research used a case-control dataset drawn from the Swedish population. A DL architecture based on gene annotations was developed and applied in two stages. First, the model was trained on the entire dataset to highlight differences between cases and controls. Then, samples likely to be misclassified were excluded, and the model was retrained on the refined dataset for performance evaluation. The results indicate that SCZ prevalence and misdiagnosis rates can affect case-control cohorts, potentially compromising future studies that rely on such datasets. However, by detecting and filtering outliers, the study demonstrates the feasibility of adapting DL methodologies to large-scale biological problems, producing results more closely aligned with existing heritability estimates for SCZ. This approach not only advances understanding of the genetic background of SCZ but also opens the door to adapting DL techniques to complex research in precision medicine for mental health.
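The two-stage procedure described above (train on all samples, exclude samples likely to be misclassified, retrain on the refined set) can be sketched as follows. This is a minimal, hypothetical illustration only. A plain logistic regression stands in for the gene-annotation-based DL architecture, which the abstract does not detail. The exclusion rule, which drops samples whose stage-one predicted probability for their own label falls below a hypothetical `threshold` of 0.5, is an assumption and not the authors' stated criterion. The names `train_logreg`, `predict_proba`, and `two_stage_filter`, and the toy data, are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 epochs: int = 200, lr: float = 0.1) -> Model:
    """Batch gradient-descent logistic regression.

    Hypothetical stand-in for the paper's gene-annotation-based DL model.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model: Model, xi: Sequence[float]) -> float:
    # Predicted probability that the sample is a case (label 1).
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)


def two_stage_filter(X, y, threshold: float = 0.5) -> Tuple[Model, List[int]]:
    # Stage 1: fit on every sample to learn the case/control contrast.
    stage1 = train_logreg(X, y)
    kept = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        p_case = predict_proba(stage1, xi)
        p_own_label = p_case if yi == 1 else 1.0 - p_case
        # Assumed rule: samples given low probability for their own label
        # are treated as likely misclassified (outlying) and excluded.
        if p_own_label >= threshold:
            kept.append(i)
    # Stage 2: retrain on the refined cohort.
    stage2 = train_logreg([X[i] for i in kept], [y[i] for i in kept])
    return stage2, kept


# Toy cohort: one feature; the last "control" looks like a case.
X = [[2.0], [1.5], [1.8], [-2.0], [-1.5], [-1.8], [2.2]]
y = [1, 1, 1, 0, 0, 0, 0]
refined, kept = two_stage_filter(X, y)
print("retained sample indices:", kept)
```

In this toy cohort, the last sample is labelled a control but sits among the cases, so stage one gives it a low probability for its own label and it is excluded before retraining. A real pipeline would likely also need held-out data for evaluation, since filtering on in-sample predictions can inflate measured performance.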