Tarozzi M, Bartoletti-Stella A, Dall'Olio D, Matteuzzi T, Baiardi S, Parchi P, Castellani G, Capellari S
Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy.
Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Bologna, Italy.
BMC Med Genomics. 2022 Feb 10;15(1):26. doi: 10.1186/s12920-022-01173-4.
Targeted next-generation sequencing is a common and powerful approach used in both clinical and research settings. At present, however, a large fraction of the acquired genetic information goes unused because pathogenicity cannot be assessed for most variants. Further complicating this scenario is the increasingly frequent description of poly/oligogenic patterns of inheritance, in which multiple variants contribute to increased disease risk. We present an approach in which the entire genetic information provided by targeted sequencing is transformed into binary data, on which we performed statistical, machine learning, and network analyses to extract all valuable information from the entire genetic profile. To test this approach and explore without bias the presence of recurrent genetic patterns, we studied a cohort of 112 patients affected either by genetic Creutzfeldt-Jakob disease (CJD) caused by either of two mutations in the PRNP gene (p.E200K and p.V210I) with different penetrance, or by sporadic Alzheimer disease (sAD).
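The binary transformation described above can be illustrated with a minimal sketch. It assumes a presence/absence encoding with one row per patient and one column per variant; the abstract does not specify the exact encoding, and the function name, patient labels, and toy variant calls below are hypothetical.

```python
def binarize_profiles(profiles):
    """Turn per-patient variant sets into a binary presence/absence matrix.

    `profiles` maps a patient ID to the set of variant labels called in that
    patient. This encoding is an assumption made for illustration only.
    """
    # Column order: every variant seen in any patient, sorted for stability.
    variants = sorted({v for calls in profiles.values() for v in calls})
    # Row order: patient IDs, sorted for stability.
    patients = sorted(profiles)
    matrix = [
        [1 if v in profiles[p] else 0 for v in variants]
        for p in patients
    ]
    return patients, variants, matrix


# Hypothetical toy input, not data from the study.
toy = {
    "patient_A": {"PRNP:p.E200K", "NOTCH3:rs11670823"},
    "patient_B": {"PRNP:p.V210I"},
    "patient_C": {"NOTCH3:rs11670823"},
}
patients, variants, matrix = binarize_profiles(toy)
for patient, row in zip(patients, matrix):
    print(patient, row)
```

A matrix of this shape can be passed directly to standard clustering or classification routines, which is one plausible reason for choosing a binary representation.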
Unsupervised methods can identify functionally relevant sources of variation in the data, such as haplogroups and polymorphisms that do not follow Hardy-Weinberg equilibrium, for example NOTCH3 rs11670823 (c.3837+21T>A). Supervised classifiers can recognize clinical phenotypes with high accuracy based on the mutational profile of patients. In addition, relative to the European population, we found a similar alteration of allele frequencies in sporadic patients and in V210I-CJD (associated with a poorly penetrant PRNP mutation) and sAD, suggesting shared oligogenic patterns in different types of dementia. Pathway enrichment and protein-protein interaction network analyses revealed different altered pathways for the two PRNP mutations.
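As an illustration of how a departure from Hardy-Weinberg equilibrium can be checked for a single biallelic polymorphism, the sketch below uses the textbook chi-square goodness-of-fit test with one degree of freedom. This is an assumption: the abstract does not state which test the authors used. The function names and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    Arguments are observed genotype counts: homozygous reference,
    heterozygous, and homozygous alternate.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the reference allele
    q = 1.0 - p                      # frequency of the alternate allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_p_value(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: a complete absence of heterozygotes.
chi2 = hwe_chi_square(50, 0, 50)
print(chi2, hwe_p_value(chi2))
```

For small cohorts or rare alleles, an exact test is generally preferred over the chi-square approximation.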
We propose this workflow as a possible approach to gain deeper insights into the genetic information derived from targeted sequencing, to identify recurrent genetic patterns, and to improve the understanding of complex diseases. This work could also represent a possible starting point for a predictive tool for personalized medicine and advanced diagnostic applications.