Jacaruso Lucas
University of Southern California, Los Angeles, CA, United States of America.
PeerJ Comput Sci. 2024 Mar 20;10:e1940. doi: 10.7717/peerj-cs.1940. eCollection 2024.
Topic modeling and text mining are subsets of natural language processing (NLP) with relevance for conducting meta-analysis (MA) and systematic review (SR). For evidence synthesis, these NLP methods are conventionally used for topic-specific literature searches or for extracting values from reports to automate essential phases of SR and MA. This work instead proposes a comparative topic modeling approach to analyze reports of contradictory results on the same general research question. Specifically, the objective is to identify topics exhibiting distinct associations with significant results for an outcome of interest by ranking them according to their proportional occurrence in (and consistency of distribution across) reports of significant effects. Macular degeneration (MD) is a disease that affects millions of people annually, causing vision loss. Augmenting evidence synthesis to provide insight into MD prevention is therefore of central interest in this article. The proposed method was tested on broad-scope studies addressing whether supplemental nutritional compounds have a significant beneficial effect on macular degeneration. Six compounds were identified as having a particular association with reports of significant beneficial results for MD. A follow-up literature search for validation further supported the effectiveness of four of these (omega-3 fatty acids, copper, zeaxanthin, and nitrates). The two not supported by the follow-up literature search (niacin and molybdenum) also had scores in the lowest range under the proposed scoring system. Results therefore suggest that the proposed method's score for a given topic may be a viable proxy for its degree of association with the outcome of interest, and can be helpful in the systematic search for potentially causal relationships.
Further, the compounds identified by the proposed method were not simultaneously captured as salient topics by state-of-the-art topic models that leverage document and word embeddings (Top2Vec) and transformer models (BERTopic). These results underpin the proposed method's potential to add specificity in understanding effects from broad-scope reports, elucidate topics of interest for future research, and guide evidence synthesis in a scalable way. All of this is accomplished while yielding valuable and actionable insights into the prevention of MD.
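The ranking idea described in the abstract (proportional occurrence in, and consistency of distribution across, reports of significant effects) can be illustrated with a minimal sketch. This is not the article's scoring system, which the abstract does not specify. The function name `score_topics`, the whitespace tokenization, the placeholder compound names, and the product of the two factors are all assumptions made for illustration only.

```python
def score_topics(significant, nonsignificant, topics):
    """Rank topics by an illustrative (assumed) score:
    share of a topic's mentions that fall in significant reports,
    multiplied by the fraction of significant reports mentioning it."""
    scores = {}
    for topic in topics:
        # Count exact token matches per report (naive whitespace tokenization).
        sig_counts = [doc.lower().split().count(topic) for doc in significant]
        non_counts = [doc.lower().split().count(topic) for doc in nonsignificant]
        total = sum(sig_counts) + sum(non_counts)
        if total == 0 or not significant:
            scores[topic] = 0.0
            continue
        # Proportional occurrence in reports of significant effects.
        proportion = sum(sig_counts) / total
        # Consistency proxy: how many significant reports mention the topic.
        coverage = sum(1 for c in sig_counts if c > 0) / len(significant)
        scores[topic] = proportion * coverage
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up report snippets with placeholder compound names.
significant_reports = [
    "compound_a improved outcomes",
    "compound_a and compound_b helped",
    "compound_b showed benefit",
]
nonsignificant_reports = [
    "compound_c showed no effect",
    "compound_b showed no effect",
]

for topic, score in score_topics(
    significant_reports,
    nonsignificant_reports,
    ["compound_a", "compound_b", "compound_c"],
):
    print(f"{topic}: {score:.3f}")
```

A topic that appears only in reports of significant effects, and in most of them, ranks highest under this toy score. A topic confined to non-significant reports scores zero. A real pipeline would derive topics from a topic model rather than from a fixed term list.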