Wu Chengkun, Schwartz Jean-Marc, Brabant Georg, Nenadic Goran
BMC Med Genomics. 2014;7 Suppl 3(Suppl 3):S3. doi: 10.1186/1755-8794-7-S3-S3. Epub 2014 Dec 8.
Thyroid cancer is the most common endocrine tumor, and its incidence is steadily increasing. It is classified into multiple histopathological subtypes with potentially distinct molecular mechanisms. Identifying the most relevant genes and biological pathways reported in the thyroid cancer literature is vital for understanding the disease and developing targeted therapeutics.
We developed a large-scale text mining system to generate a molecular profile of thyroid cancer subtypes. The system first applies a subtype classification method to the thyroid cancer literature, using a scoring scheme to assign one or more subtypes to each article. We evaluated the classification method on a gold standard derived from the PubMed Supplementary Concept annotations, achieving a micro-average F1-score of 85.9% for primary subtypes. We then used the subtype classification results to extract genes and pathways associated with different thyroid cancer subtypes. This analysis identified important genes and pathways, including some that are missing from current manually annotated databases or the most recent review articles.
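The scoring-based subtype assignment and the micro-averaged F1 evaluation described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the term lists in `SUBTYPE_TERMS`, the `title_weight` and `threshold` parameters, and the function names are all hypothetical, chosen only to show the general shape of a mention-scoring classifier and how a micro-average pools counts across articles.

```python
import re
from collections import Counter

# Hypothetical term lists; the lexicon used in the study is not given in this abstract.
SUBTYPE_TERMS = {
    "papillary": ["papillary thyroid carcinoma", "ptc"],
    "follicular": ["follicular thyroid carcinoma", "ftc"],
    "medullary": ["medullary thyroid carcinoma", "mtc"],
    "anaplastic": ["anaplastic thyroid carcinoma", "atc"],
}


def count_term(term, text):
    """Count whole-word, case-insensitive occurrences of a term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def score_subtypes(title, abstract, title_weight=2.0):
    """Score each subtype by its mentions; title mentions count more (assumed weight)."""
    scores = Counter()
    for subtype, terms in SUBTYPE_TERMS.items():
        for term in terms:
            scores[subtype] += title_weight * count_term(term, title)
            scores[subtype] += count_term(term, abstract)
    return scores


def assign_subtypes(title, abstract, threshold=1.0):
    """Assign every subtype whose score reaches the (assumed) threshold."""
    scores = score_subtypes(title, abstract)
    return {subtype for subtype, score in scores.items() if score >= threshold}


def micro_f1(gold, predicted):
    """Micro-averaged F1: pool true/false positives and false negatives over all articles."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    labels = assign_subtypes(
        "BRAF mutation in papillary thyroid carcinoma",
        "Papillary thyroid carcinoma (PTC) is common. BRAF is frequently mutated in PTC.",
    )
    print(labels)

    gold = [{"papillary"}, {"follicular"}, {"medullary", "papillary"}]
    pred = [{"papillary"}, {"papillary"}, {"medullary"}]
    print(round(micro_f1(gold, pred), 4))
```

Because the micro-average pools counts before computing precision and recall, subtypes with many articles influence the score more than rare ones. A macro-average would instead weight each subtype equally.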
Identifying key genes and pathways is central to understanding the molecular biology of thyroid cancer. Integrating subtype context allows prioritized screening for diagnostic biomarkers and novel molecularly targeted therapeutics. The source code used for this study is freely available online at https://github.com/chengkun-wu/GenesThyCan.