Momenta Pharmaceuticals, 301 Binney Street, Cambridge, Massachusetts, United States of America.
PLoS Comput Biol. 2020 Feb 14;16(2):e1007684. doi: 10.1371/journal.pcbi.1007684. eCollection 2020 Feb.
Identification of differentially expressed genes (DEGs) is well recognized to be variable across independent replications of genome-wide transcriptional studies. Such studies are often employed early in the discovery process to characterize disease state and to prioritize novel targets aimed at addressing unmet medical need. Increasing the reproducibility of biological findings from these studies could potentially improve the success rate of new clinical interventions. This work demonstrates that a statistically sound combination of gene expression data with prior biological knowledge, in the form of large protein interaction networks, can yield quantitatively more reproducible observations from studies characterizing human disease. The novel concept of Well-Associated Proteins (WAPs) introduced herein (gene products significantly associated on protein interaction networks with the differences in transcript levels between control and disease) does not require choosing a differential expression threshold and can be computed efficiently enough to enable false discovery rate estimation via permutation. Reproducibility of WAPs is shown to be on average superior to that of DEGs under easily quantifiable conditions, suggesting that they can yield a significantly more robust description of disease. Enhanced reproducibility of WAPs versus DEGs is first demonstrated with four independent data sets focused on systemic sclerosis. This finding is then validated over thousands of pairs of data sets obtained by random partitions of large studies in several other diseases. Conditions that individual data sets must satisfy to yield robust WAP scores are examined. Reproducible identification of WAPs can potentially benefit drug target selection and precision medicine studies.
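The permutation-based false discovery rate estimation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the WAP score, so the code below substitutes a placeholder per-gene statistic (absolute difference of group means); the function names `mean_difference` and `permutation_fdr`, the threshold, and the number of permutations are all hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def mean_difference(expr, labels):
    """Placeholder per-gene score (an assumption, NOT the WAP score):
    absolute difference between disease (1) and control (0) means.
    expr has shape (genes, samples); labels has shape (samples,)."""
    disease = expr[:, labels == 1].mean(axis=1)
    control = expr[:, labels == 0].mean(axis=1)
    return np.abs(disease - control)

def permutation_fdr(expr, labels, threshold, n_perm=200, seed=0):
    """Estimate the FDR of calling genes with score >= threshold by
    shuffling the control/disease labels to build a null distribution."""
    rng = np.random.default_rng(seed)
    observed = mean_difference(expr, labels)
    n_called = int((observed >= threshold).sum())
    if n_called == 0:
        return 0, 0.0
    null_counts = []
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)  # breaks the label/expression link
        null_scores = mean_difference(expr, shuffled)
        null_counts.append((null_scores >= threshold).sum())
    # Expected number of false calls divided by the observed number of calls
    expected_false = float(np.mean(null_counts))
    return n_called, min(1.0, expected_false / n_called)
```

Any score that is cheap to recompute can be dropped in place of `mean_difference`; the cost of the whole procedure is `n_perm` times the cost of one scoring pass, which is why computational efficiency of the score matters for this kind of estimate.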