Li Dingcheng, Okamoto Janet, Liu Hongfang, Leischow Scott
Department of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN USA.
Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ USA.
BioData Min. 2015 Mar 21;8:11. doi: 10.1186/s13040-015-0043-7. eCollection 2015.
To facilitate implementation of the Family Smoking Prevention and Tobacco Control Act of 2009, the Food and Drug Administration (FDA) Center for Tobacco Products (CTP) has identified research priorities under the umbrella of tobacco regulatory science (TRS). Because TRS is a newly integrated field, its current boundaries and landscape still need to be defined. In this work, we conducted a bibliometric study of TRS research by applying author topic modeling (ATM) to MEDLINE citations published by currently funded TRS principal investigators (PIs).
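The abstract does not say how ATM is fitted. The sketch below is one minimal approach. It assumes that each citation has been tokenized into integer word ids and that its author list has been mapped to integer author ids. It uses the collapsed Gibbs sampler commonly used for the author-topic model, which jointly resamples an (author, topic) pair for every word token. The function name `author_topic_gibbs` and the values of `alpha`, `beta`, and `n_iter` are illustrative assumptions, not settings reported by the study.

```python
import numpy as np


def author_topic_gibbs(docs, doc_authors, n_authors, n_topics, vocab_size,
                       alpha=0.5, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for the author-topic model (illustrative sketch).

    docs: list of documents, each a list of integer word ids.
    doc_authors: list of author-id lists, one per document (no duplicates).
    Returns (theta, phi): author-topic (n_authors x n_topics) and
    word-topic (vocab_size x n_topics) probability estimates.
    """
    rng = np.random.default_rng(seed)
    wt = np.zeros((vocab_size, n_topics))  # word-topic counts
    at = np.zeros((n_authors, n_topics))   # author-topic counts
    t_tot = np.zeros(n_topics)             # tokens assigned to each topic
    a_tot = np.zeros(n_authors)            # tokens assigned to each author
    assign = []                            # (topic, author) for every token

    # Random initialization: each token gets a random topic and one of the document's authors.
    for words, authors in zip(docs, doc_authors):
        doc_assign = []
        for w in words:
            z = int(rng.integers(n_topics))
            x = int(authors[rng.integers(len(authors))])
            wt[w, z] += 1; at[x, z] += 1; t_tot[z] += 1; a_tot[x] += 1
            doc_assign.append((z, x))
        assign.append(doc_assign)

    for _ in range(n_iter):
        for d, (words, authors) in enumerate(zip(docs, doc_authors)):
            auth = np.asarray(authors)
            for i, w in enumerate(words):
                z, x = assign[d][i]
                # Remove the current token from all counts.
                wt[w, z] -= 1; at[x, z] -= 1; t_tot[z] -= 1; a_tot[x] -= 1
                # Word-given-topic term, shape (n_topics,).
                p_word = (wt[w] + beta) / (t_tot + vocab_size * beta)
                # Topic-given-author term for each author of d, shape (len(auth), n_topics).
                p_auth = (at[auth] + alpha) / (a_tot[auth][:, None] + n_topics * alpha)
                p = (p_auth * p_word).ravel()
                idx = int(rng.choice(p.size, p=p / p.sum()))
                x, z = int(auth[idx // n_topics]), idx % n_topics
                # Add the token back under its newly sampled (topic, author) pair.
                wt[w, z] += 1; at[x, z] += 1; t_tot[z] += 1; a_tot[x] += 1
                assign[d][i] = (z, x)

    theta = (at + alpha) / (a_tot[:, None] + n_topics * alpha)
    phi = (wt + beta) / (t_tot + vocab_size * beta)
    return theta, phi
```

Each row of `theta` is an estimated topic mix for one author, and each column of `phi` is an estimated word distribution for one topic. These are the two outputs that support the PI-topic relationships described in the abstract.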
We compared the topics that ATM generated on a dataset collected from TRS PIs with those it generated on a dataset collected using a TRS keyword list. All of these topics align well with the FDA's funding protocols. More interestingly, the results reveal clear interactions among PIs and between PIs and topics. From these interactions we can see how diverse and how productive each PI is, which topics are most popular, and what main components each topic involves. Temporal trend analysis of keywords shows significant evolution in four prime TRS areas.
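The abstract does not detail how keyword trends were measured. The sketch below shows one simple way to quantify such a trend. It assumes each citation is reduced to a publication year and a set of keywords. For each keyword, it computes the share of citations per year that mention that keyword and fits an ordinary least-squares slope over the years. A positive slope suggests growing attention. `keyword_trends` is a hypothetical helper, not the study's procedure.

```python
from collections import Counter, defaultdict


def keyword_trends(records, keywords):
    """Least-squares slope of each keyword's yearly citation share.

    records: iterable of (year, set_of_keywords), one per citation.
    Returns {keyword: slope}; a positive slope means the keyword's
    share of citations grows over time.
    """
    per_year_total = Counter()              # citations per year
    per_year_kw = defaultdict(Counter)      # keyword -> year -> citations mentioning it
    for year, kws in records:
        per_year_total[year] += 1
        for k in kws:
            per_year_kw[k][year] += 1

    years = sorted(per_year_total)          # years with no citations are skipped
    if len(years) < 2:
        raise ValueError("need citations from at least two years")
    x_mean = sum(years) / len(years)
    sxx = sum((y - x_mean) ** 2 for y in years)

    slopes = {}
    for k in keywords:
        shares = [per_year_kw[k][y] / per_year_total[y] for y in years]
        s_mean = sum(shares) / len(shares)
        sxy = sum((y - x_mean) * (s - s_mean) for y, s in zip(years, shares))
        slopes[k] = sxy / sxx
    return slopes
```

Using shares instead of raw counts keeps overall growth in publication volume from making every keyword look like it is trending upward.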
The results show that ATM can efficiently group articles into discriminative categories without any supervision. This suggests that ATM could be incorporated into author identification systems, using the topics generated by the model to infer the author of an article. It could also help grantees and funding administrators suggest potential collaborators or identify researchers who share common research interests, for data harmonization or other purposes. Temporal analysis can further be used to assess how TRS changes over time as new projects are funded, and the extent to which new research reflects the FDA's funding priorities.
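The sketch below illustrates the collaborator-suggestion idea. It assumes each PI is represented by a topic-proportion vector, such as an author's topic distribution estimated by ATM. Other PIs are then ranked by cosine similarity to a target PI. `suggest_collaborators` and the choice of cosine similarity are assumptions for illustration. The study does not state which similarity measure, if any, it would use.

```python
import math


def suggest_collaborators(author_topics, target, top_n=3):
    """Rank other authors by cosine similarity of their topic proportions.

    author_topics: {author: list of topic proportions} (hypothetical input format).
    Returns up to top_n (author, similarity) pairs, most similar first.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    t = author_topics[target]
    scores = [(other, cosine(t, vec))
              for other, vec in author_topics.items() if other != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]
```

Cosine similarity ignores vector magnitude, so in this sketch only the topic mix affects the ranking. How prolific a PI is does not.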