Jurca Gabriela, Addam Omar, Aksac Alper, Gao Shang, Özyer Tansel, Demetrick Douglas, Alhajj Reda
Department of Computer Science, University of Calgary, Calgary, AB, Canada.
College of Computer Science and Technology, Jilin University, Changchun, China.
BMC Res Notes. 2016 Apr 26;9:236. doi: 10.1186/s13104-016-2023-5.
Breast cancer is a serious disease that affects many women and may lead to death. It has received considerable attention from the research community, and biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature. However, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework that investigates existing literature data for informative discoveries. It integrates text mining and social network analysis to identify new potential biomarkers for breast cancer.
We used PubMed for testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how the discoveries varied over time and how much the discoveries and interests of research groups in different countries overlap or diverge.
Interesting trends were identified and discussed; for example, different genes are highlighted in relation to different countries, although these genes were found to share functionality. Some of the text-analysis-based results were validated against results from other tools that predict gene-gene relations and gene functions.
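The gene-gene and gene-year relations described above can be illustrated with a minimal co-occurrence sketch. The abstract does not specify how the framework extracts or weights relations, so everything here is an assumption made for illustration: the toy records, the gene lexicon, the dictionary-based matching, and the helper names `genes_in`, `build_relations`, and `weighted_degree` are all hypothetical and are not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: (publication year, abstract text). Not data from the paper.
RECORDS = [
    (2010, "BRCA1 and TP53 mutations were observed in breast tumours."),
    (2012, "ERBB2 amplification co-occurred with TP53 loss."),
    (2012, "BRCA1 interacts with BRCA2 and TP53 in DNA repair."),
]

# Hypothetical gene lexicon; a real pipeline would need a curated dictionary
# or a named-entity recognizer rather than exact token matching.
GENES = {"BRCA1", "BRCA2", "TP53", "ERBB2"}


def genes_in(text):
    """Return the sorted lexicon genes that appear as tokens in the text."""
    tokens = {tok.strip(".,;()") for tok in text.split()}
    return sorted(tokens & GENES)


def build_relations(records):
    """Count gene-gene co-occurrences and gene-year mentions per abstract."""
    gene_gene = Counter()
    gene_year = Counter()
    for year, text in records:
        found = genes_in(text)
        # Two genes named in the same abstract form one undirected edge.
        for pair in combinations(found, 2):
            gene_gene[pair] += 1
        for gene in found:
            gene_year[(gene, year)] += 1
    return gene_gene, gene_year


def weighted_degree(gene_gene):
    """Sum edge weights per gene, a simple network centrality measure."""
    degree = Counter()
    for (a, b), weight in gene_gene.items():
        degree[a] += weight
        degree[b] += weight
    return degree


gene_gene, gene_year = build_relations(RECORDS)
degree = weighted_degree(gene_gene)
print(gene_gene.most_common(3))
print(degree.most_common(1))
```

Gene-country and abstract-country relations could be counted the same way if each record also carried an affiliation country. Co-occurrence in an abstract is only a proxy for a biological interaction, which is why validation against dedicated gene-relation tools, as the abstract reports, matters.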