Telethon Kids Institute, The University of Western Australia, PO Box 855, West Perth, WA, 6872, Australia.
Office of Population Health Genomics, Department of Health, PO Box 8172, Perth Business Centre, Perth, WA, 6849, Australia.
Nat Commun. 2019 Nov 21;10(1):5274. doi: 10.1038/s41467-019-13345-5.
Whole genome and exome sequencing are standard tools for diagnosing patients with rare and other genetic disorders. Interpreting the tens of thousands of variants returned by such tests remains a major challenge. Here we focus on the problem of prioritising variants with respect to the observed disease phenotype. We hypothesise that linking patterns of gene expression across multiple tissues to phenotypes will aid in discovering disease-causing variants. To test this, we construct classifiers that learn associations between tissue-specific gene expression and disease phenotypes. We find that using Genotype-Tissue Expression project (GTEx) expression data in conjunction with disease-agnostic variant prioritisation methods (CADD or MetaSVM) results in consistent improvements in classification accuracy. Our method represents a previously overlooked avenue for utilising existing expression data in clinical diagnostics, and also opens the door to using other functional genomic data sets in the same manner.
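As a rough illustration of the idea of combining tissue expression with a disease-agnostic score, the sketch below ranks candidate variants by a weighted mix of a pathogenicity score and the similarity between a gene's tissue expression profile and that of genes already linked to the phenotype. It is not the classifiers described in the abstract: a simple centroid-plus-cosine similarity is swapped in for a trained model, and the gene names, expression values, the `weight` parameter, and the assumption that the pathogenicity score has been rescaled to the range 0 to 1 are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(profiles):
    """Per-tissue mean of several expression profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def rank_variants(variants, expression, phenotype_genes, weight=0.5):
    """Rank variant ids, best first, by a weighted mix of two scores.

    Assumption: 'pathogenicity' is a disease-agnostic score already
    rescaled to [0, 1]; 'weight' is a hypothetical mixing parameter.
    """
    ref = centroid([expression[g] for g in phenotype_genes])
    scored = []
    for v in variants:
        similarity = cosine(expression[v["gene"]], ref)
        combined = (1 - weight) * v["pathogenicity"] + weight * similarity
        scored.append((combined, v["id"]))
    return [vid for _, vid in sorted(scored, reverse=True)]


# Toy expression values across three tissues (made up for illustration).
expression = {
    "GENE_A": [9.0, 1.0, 0.5],
    "GENE_B": [8.0, 2.0, 1.0],
    "GENE_C": [0.5, 1.0, 9.0],
    "GENE_D": [7.5, 1.5, 0.8],
}
variants = [
    {"id": "var1", "gene": "GENE_C", "pathogenicity": 0.80},
    {"id": "var2", "gene": "GENE_D", "pathogenicity": 0.75},
]

# GENE_A and GENE_B stand in for genes already linked to the phenotype.
ranking = rank_variants(variants, expression, ["GENE_A", "GENE_B"])
print(ranking)  # ['var2', 'var1']
```

In this toy case the pathogenicity score alone would place `var1` first; adding the expression term promotes `var2`, whose gene is expressed in the same tissues as the phenotype-linked genes.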