Ren Shumin, Jin Yanwen, Chen Yalan, Shen Bairong
Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 610212, China.
Center for Systems Biology, Soochow University, Suzhou 215006, China.
Bioinformatics. 2022 Mar 4;38(6):1669-1676. doi: 10.1093/bioinformatics/btab850.
In the era of big data and precision medicine, accurate risk assessment is a prerequisite for risk screening and preventive treatment. Many studies have focused on cancer risk and have constructed risk prediction models, but these resources have not been effectively integrated for systematic comparison and personalized application. Therefore, establishing and analyzing the cancer risk prediction model knowledge base (CRPMKB) is of great significance.
The knowledge base currently contains data on 802 models. Model comparison indicates that the accuracy of cancer risk prediction is strongly affected by regional differences, cancer type and model type. We divided the model variables into four categories (environment, behavioral lifestyle, biological genetics and clinical examination) and found that the distribution of these variables differs among cancer types. As an example, we performed pathway enrichment analyses on 50 genes involved in the lung cancer risk prediction models; the results showed that these genes were significantly enriched in the p53 Signaling and Aryl Hydrocarbon Receptor Signaling pathways, which are associated with cancer and specific diseases. In addition, we verified the biological significance of overlapping lung cancer genes using the STRING database. CRPMKB was established to provide researchers with an online tool for future personalized model application and development. This study of CRPMKB suggests that developing more targeted models based on specific demographic characteristics and cancer types will further improve the accuracy of cancer risk model predictions.
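The abstract does not state which statistical test was used for the pathway enrichment analysis. As an illustration only, a common way to score enrichment is a one-sided hypergeometric (over-representation) test, sketched below. The function name and all numbers in the usage example are hypothetical and are not taken from the study.

```python
from math import comb


def enrichment_p_value(universe: int, pathway: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    universe: number of background genes (assumed annotation universe)
    pathway:  number of background genes annotated to the pathway
    selected: number of genes in the list being tested
    overlap:  number of tested genes that fall in the pathway
    """
    max_overlap = min(pathway, selected)
    total = comb(universe, selected)
    # Sum the probability of observing `overlap` or more pathway genes by chance.
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 50 tested genes, 5 of which fall in a
    # 100-gene pathway, against a 20,000-gene background.
    p = enrichment_p_value(universe=20000, pathway=100, selected=50, overlap=5)
    print(f"enrichment p-value: {p:.3g}")
```

In practice the p-values from many pathways would also need a multiple-testing correction before any pathway is called significantly enriched.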
CRPMKB is freely available at http://www.sysbio.org.cn/CRPMKB/. The data underlying this article are available in the article and in its online supplementary material.
Supplementary data are available at Bioinformatics online.