Zhang Yiding, Chen Lyujie, Li Shao
IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):819-829. doi: 10.1109/TCBB.2020.3017547. Epub 2022 Apr 1.
Inferring disease-gene associations helps unravel disease pathogenesis and informs treatment. Although many machine learning-based methods have been developed to predict causative genes, accurate association inference remains challenging. One major reason is inaccurate feature selection and the accumulation of errors introduced by the commonly used multi-stage training architecture. In addition, existing methods do not incorporate cell-type-specific information and therefore cannot study gene functions at higher resolution. We therefore introduce single-cell transcriptome data and construct a context-aware network that integrates all data sources in an unbiased manner. We then develop a graph convolution-based approach, CIPHER-SC, that realizes a complete end-to-end learning architecture. In five-fold cross-validation on three distinct test sets, our approach outperforms four state-of-the-art approaches with a best AUC of 0.9501, demonstrating a stable ability both to predict novel genes and to predict with a genetic basis. The ablation study shows that the complete end-to-end design and unbiased data integration raise the AUC from 0.8727 to 0.9443. Adding single-cell data further improves prediction accuracy and enriches our results for cell-type-specific genes. These results confirm the ability of CIPHER-SC to discover reliable disease genes. Our implementation is available at http://github.com/YidingZhang117/CIPHER-SC.
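The abstract does not spell out the model internals. As a rough illustration of the ingredients it names (graph convolution over one integrated network, scoring of disease-gene pairs, and AUC evaluation), here is a minimal NumPy sketch. It assumes a generic Kipf-Welling-style propagation rule and dot-product pair scoring; these choices and all function names are illustrative assumptions, not the CIPHER-SC code, which is in the linked repository. Training is omitted: the weights below are random, whereas an end-to-end model would learn them by backpropagating a link-prediction loss through the whole computation.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization (assumed rule): D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops so every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj_norm: np.ndarray, features: np.ndarray, weights) -> np.ndarray:
    """Apply stacked graph convolution layers: H <- ReLU(A_norm @ H @ W)."""
    h = features
    for w in weights:
        h = np.maximum(adj_norm @ h @ w, 0.0)
    return h


def score_pairs(embeddings: np.ndarray, pairs) -> np.ndarray:
    """Association probability per (disease, gene) node pair: sigmoid of the dot product."""
    logits = np.array([embeddings[d] @ embeddings[g] for d, g in pairs])
    return 1.0 / (1.0 + np.exp(-logits))


def auc(labels, scores) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores count as 0.5."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


if __name__ == "__main__":
    # Hypothetical toy network: nodes 0-1 are diseases, nodes 2-5 are genes.
    n = 6
    adj = np.zeros((n, n))
    for i, j in [(0, 2), (1, 4), (2, 3), (3, 4), (4, 5)]:
        adj[i, j] = adj[j, i] = 1.0

    rng = np.random.default_rng(0)
    features = np.eye(n)                    # one-hot node features
    weights = [rng.normal(size=(n, 8)), rng.normal(size=(8, 4))]

    emb = gcn_forward(normalize_adjacency(adj), features, weights)
    pairs = [(0, 2), (0, 3), (1, 4), (1, 5)]
    labels = [1, 0, 1, 0]
    print("toy AUC:", auc(labels, score_pairs(emb, pairs)))
```

Because the node embeddings and the pair scores come from one differentiable computation, a trainable version of this sketch could fit all layers directly to the association labels. That single objective is the contrast the abstract draws with multi-stage pipelines, where features are extracted in one stage and classified in another.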