Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China.
School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China.
BMC Bioinformatics. 2021 Mar 24;22(1):152. doi: 10.1186/s12859-021-04007-9.
Recent studies have confirmed that N7-methylguanosine (m7G) modification plays an important role in regulating various biological processes and is associated with multiple diseases. Wet-lab experiments for identifying disease-associated m7G sites are costly and time-consuming. To date, tens of thousands of m7G sites have been identified by high-throughput sequencing approaches, and this information is publicly available in bioinformatics databases, where it can be leveraged to predict potential disease-associated m7G sites from a computational perspective. Computational methods for m7G-disease association prediction are therefore urgently needed, but none is currently available.
To fill this gap, we collected association information between m7G sites and diseases, genomic information of m7G sites, and phenotypic information of diseases from different databases to build an m7G-disease association dataset. To infer potential disease-associated m7G sites, we then proposed a heterogeneous network-based model, the m7G Sites and Diseases Associations Inference (m7GDisAI) model. m7GDisAI predicts potential disease-associated m7G sites by applying a matrix decomposition method to heterogeneous networks that integrate comprehensive similarity information of m7G sites and diseases. To evaluate prediction performance, 10 runs of tenfold cross validation were first conducted, in which m7GDisAI achieved the highest AUC of 0.740 (± 0.0024). Global and local leave-one-out cross validation (LOOCV) experiments were then performed to evaluate the model's accuracy in the global and local settings, respectively; an AUC of 0.769 was achieved in global LOOCV and 0.635 in local LOOCV. Finally, a case study was conducted to identify the most promising ovarian cancer-related m7G sites for further functional analysis. Gene Ontology (GO) enrichment analysis was performed to explore the complex associations between the host genes of m7G sites and GO terms. The results showed that the disease-associated m7G sites identified by m7GDisAI, and their host genes, are consistently related to the pathogenesis of ovarian cancer, which may provide clues to the pathogenesis of diseases.
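The abstract names a matrix decomposition over similarity-integrated heterogeneous networks and global/local LOOCV, but gives no equations. As a hedged illustration only, the sketch below uses a generic Laplacian-regularized matrix factorization (a swapped-in technique, not necessarily the authors' formulation) to score site-disease pairs, plus a simplified LOOCV: global compares a held-out association against all unobserved pairs, local only against unobserved sites of the same disease. All function names, hyperparameters, and the toy matrices are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical sketch: a generic similarity-regularized matrix factorization,
# not the published m7GDisAI algorithm.


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix."""
    return np.diag(S.sum(axis=1)) - S


def factorize(Y, S_site, S_dis, rank=3, lam=0.1, alpha=0.1,
              lr=0.01, n_iter=2000, seed=0):
    """Gradient descent on
        ||U V^T - Y||_F^2 + lam (||U||_F^2 + ||V||_F^2)
        + alpha (tr(U^T L_site U) + tr(V^T L_dis V)),
    so similar sites (or diseases) receive similar latent vectors.
    Returns the dense score matrix U V^T (sites x diseases)."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    L_site, L_dis = laplacian(S_site), laplacian(S_dis)
    for _ in range(n_iter):
        R = U @ V.T - Y  # residual on every entry (unknown pairs treated as 0)
        grad_U = 2 * (R @ V + lam * U + alpha * L_site @ U)
        grad_V = 2 * (R.T @ U + lam * V + alpha * L_dis @ V)
        U, V = U - lr * grad_U, V - lr * grad_V  # simultaneous update
    return U @ V.T


def auc(pos, neg):
    """Mann-Whitney AUC: chance a positive outscores a negative (ties = 0.5)."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())


def loocv(Y, S_site, S_dis, local=False, **kw):
    """Hide one known association, refit, and compare its score with
    unobserved pairs: all of them (global) or only those of the same
    disease (local). Returns the mean per-association AUC."""
    results = []
    for i, j in zip(*np.nonzero(Y)):
        Y_train = Y.copy()
        Y_train[i, j] = 0
        P = factorize(Y_train, S_site, S_dis, **kw)
        neg = P[Y[:, j] == 0, j] if local else P[Y == 0]
        results.append(auc([P[i, j]], neg))
    return float(np.mean(results))


# Toy data (hypothetical): 5 sites x 3 diseases, constant off-diagonal similarity.
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
S_site = np.full((5, 5), 0.2)
np.fill_diagonal(S_site, 1.0)
S_dis = np.full((3, 3), 0.2)
np.fill_diagonal(S_dis, 1.0)

print("global LOOCV AUC:", loocv(Y, S_site, S_dis))
print("local LOOCV AUC:", loocv(Y, S_site, S_dis, local=True))
```

The Laplacian penalty is one common way to inject site and disease similarity into a factorization; the real model may integrate similarity differently. The mean per-association AUC is also a simplification of the pooled ROC analysis usually reported for LOOCV.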
The m7GDisAI web server (http://180.208.58.66/m7GDisAI/) provides a user-friendly interface for querying disease-associated m7G sites. A list of the top 20 m7G sites predicted to be associated with 177 diseases can be retrieved. Detailed information about specific m7G sites and diseases is also shown.