Ning Li, Huixin He
Business School of Huaqiao University, Quanzhou, China.
Management Science and Engineering Department, Management School, Xiamen University, Xiamen, China.
Front Cell Dev Biol. 2021 Apr 7;9:631011. doi: 10.3389/fcell.2021.631011. eCollection 2021.
One of the vital challenges in cancer research is that efficient biomarkers for monitoring cancer formation and development are limited. Omics data integration plays a crucial role in mining biomarkers of human disease. As the link between omics studies of biomarker discovery and cancer deepens, identifying the principal technologies applied in the field is essential, not only for the present but also for the future. We use topic modeling to extract topics (or themes) from the dataset as probabilistic distributions over latent topics. To predict future trends in the related research, we use the Prophet neural network to build a prediction-correction model for the existing topics. A total of 2,318 publications (from 2006 to 2020) were retrieved from MEDLINE with a query on "omics" and "cancer." We identified 20 topics covering current research types. The topic extraction results indicate that, with the rapid development of omics data integration research, multi-omics analysis (Topic 11) and genomics of colorectal cancer (Topic 10) have had more studies reported over the last 15 years. From the topic prediction perspective, research on multi-omics data processing and novel biomarker discovery for cancer prediction (Topics 2, 3, 10, and 11) will receive heavy attention in the future. From the topic visualization and evolution trends, metabolomics of breast cancer (Topic 9), pharmacogenomics (Topic 15), genome-guided therapy regimens (Topic 16), and microRNA target genes (Topic 17) could develop more rapidly in studies of cancer treatment effect and recurrence prediction.
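The abstract does not spell out its pipeline, so the following is only a minimal sketch of the two steps it names, under stated assumptions. It assumes latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling as the topic model, since the abstract says only "topic modeling". For the forecasting step it swaps Prophet for a plain least-squares linear trend on each topic's yearly share; Prophet itself fits a piecewise-linear or logistic trend with changepoints plus seasonal terms, so this is a deliberate simplification. The function names `lda_gibbs`, `yearly_topic_share`, and `linear_trend_forecast`, the hyperparameters, and the toy corpus are all hypothetical.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Assumed topic model: LDA via collapsed Gibbs sampling (symmetric priors).

    Returns (theta, phi, vocab): theta[d][k] is document d's topic mixture,
    phi[k][w] is topic k's distribution over vocabulary index w.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                         # total tokens per topic
    z = []                                      # topic assignment of each token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)         # random initial assignment
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | rest), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(n_topics)]
    return theta, phi, vocab


def yearly_topic_share(theta, years, k):
    """Mean proportion of topic k among the documents published in each year."""
    by_year = defaultdict(list)
    for doc_theta, year in zip(theta, years):
        by_year[year].append(doc_theta[k])
    return {y: sum(v) / len(v) for y, v in sorted(by_year.items())}


def linear_trend_forecast(series, horizon):
    """Stand-in for Prophet: least-squares line through (year, share), extrapolated."""
    xs, ys = list(series), list(series.values())
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    last = max(xs)
    return {last + h: intercept + slope * (last + h) for h in range(1, horizon + 1)}


if __name__ == "__main__":
    # Hypothetical toy corpus: tokenized abstracts with their publication years.
    docs = [
        ["metabolomics", "breast", "metabolite", "tumor"],
        ["metabolomics", "breast", "metabolite", "serum"],
        ["genomics", "colorectal", "mutation", "tumor"],
        ["genomics", "colorectal", "mutation", "sequencing"],
        ["metabolomics", "metabolite", "serum", "breast"],
        ["genomics", "mutation", "sequencing", "colorectal"],
    ]
    years = [2018, 2019, 2019, 2020, 2020, 2020]
    theta, phi, vocab = lda_gibbs(docs, n_topics=2, n_iter=100)
    for k, dist in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda w: -dist[w])[:3]
        print(f"topic {k}:", [vocab[w] for w in top])
    share = yearly_topic_share(theta, years, k=0)
    print("topic 0 share by year:", share)
    print("forecast:", linear_trend_forecast(share, horizon=2))
```

Averaging per-document topic proportions by year turns the fitted model into one time series per topic, which is the kind of input a forecaster such as Prophet expects; the linear fit here only illustrates that hand-off.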