Graduate School of Biomedical Sciences.
Icahn Institute for Genomics and Multiscale Biology.
Bioinformatics. 2019 Oct 15;35(20):3906-3912. doi: 10.1093/bioinformatics/btz202.
Non-coding rare variants (RVs) may contribute to Mendelian disorders but have been challenging to study due to small sample sizes, genetic heterogeneity and uncertainty about relevant non-coding features. Previous studies identified RVs associated with expression outliers, but varying outlier definitions were employed and no comprehensive open-source software was developed.
We developed Outlier-RV Enrichment (ORE) to identify biologically meaningful non-coding RVs. We applied ORE to combined whole-genome sequencing and cardiac RNAseq data from congenital heart defect patients in the Pediatric Cardiac Genomics Consortium and from deceased adults in the Genotype-Tissue Expression (GTEx) project. Rank-based outlier definitions maximized sensitivity, whereas a most-extreme-outlier approach maximized specificity. Rarer variants showed stronger associations with expression outliers, suggesting that they are under negative selective pressure and providing a basis for investigating their contribution to Mendelian disorders.
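To make the contrast between the two outlier definitions concrete, the sketch below flags expression outliers for a single gene in two ways and then computes a simple odds ratio for RV carriers among the outliers. It is an illustration only, not ORE's implementation: the function names, the rank fraction, the z-score cutoff, the toy expression values and the continuity-corrected odds ratio are all assumptions made for this example.

```python
from statistics import mean, stdev


def rank_outliers(expr, fraction=0.1):
    """Flag the lowest and highest `fraction` of samples by expression rank.

    `expr` maps sample ID -> expression value for one gene.
    The fraction is a hypothetical choice, not a value taken from ORE.
    """
    ordered = sorted(expr, key=expr.get)
    k = max(1, int(len(ordered) * fraction))
    return set(ordered[:k]) | set(ordered[-k:])


def most_extreme_outlier(expr, z_cutoff=2.0):
    """Flag at most one sample: the largest |z-score|, if it passes the cutoff."""
    samples = list(expr)
    values = [expr[s] for s in samples]
    mu, sd = mean(values), stdev(values)
    z = [(v - mu) / sd for v in values]
    top = max(range(len(samples)), key=lambda i: abs(z[i]))
    return {samples[top]} if abs(z[top]) >= z_cutoff else set()


def odds_ratio(outliers, rv_carriers, all_samples):
    """Odds ratio of carrying an RV given outlier status (2x2 table).

    Adds 0.5 to every cell (Haldane-Anscombe correction) so that empty
    cells do not cause a division by zero.
    """
    a = len(outliers & rv_carriers)                 # outlier, carrier
    b = len(outliers - rv_carriers)                 # outlier, non-carrier
    c = len(rv_carriers - outliers)                 # non-outlier, carrier
    d = len(all_samples - outliers - rv_carriers)   # non-outlier, non-carrier
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Toy data: ten samples, one of which (s10) has strongly elevated expression.
expr = {"s1": 1.0, "s2": 1.1, "s3": 0.9, "s4": 1.2, "s5": 0.8,
        "s6": 1.0, "s7": 1.1, "s8": 0.9, "s9": 1.0, "s10": 9.0}
rv_carriers = {"s10"}          # hypothetical carriers of a nearby rare variant
all_samples = set(expr)

by_rank = rank_outliers(expr)
by_extreme = most_extreme_outlier(expr)

print("rank-based outliers:", sorted(by_rank))
print("most extreme outlier:", sorted(by_extreme))
print("odds ratio (rank-based):", odds_ratio(by_rank, rv_carriers, all_samples))
print("odds ratio (most extreme):", odds_ratio(by_extreme, rv_carriers, all_samples))
```

In this toy example the rank-based definition flags more samples, including one with only mildly low expression, while the most-extreme definition flags a single sample. This mirrors the sensitivity versus specificity trade-off described above, although a real analysis would aggregate over many genes and variants.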
ORE, source code, and documentation are available at https://pypi.python.org/pypi/ore under the MIT license.
Supplementary data are available at Bioinformatics online.