Leveraging word embeddings to enhance co-occurrence networks: A statistical analysis.

Authors

Amancio Diego R, Machicao Jeaneth, Quispe Laura V C

Affiliations

Institute of Mathematics and Computer Science - USP, Avenida Trabalhador São-carlense, no 400, CEP 13566-590, São Carlos, SP, Brazil.

Escola Politécnica da Universidade de São Paulo (EPUSP), São Paulo, Brazil.

Publication information

PLoS One. 2025 Jul 11;20(7):e0327421. doi: 10.1371/journal.pone.0327421. eCollection 2025.

DOI:10.1371/journal.pone.0327421
PMID:40644417
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12250493/
Abstract

Recent studies have explored the addition of virtual edges to word co-occurrence networks using word embeddings to enhance graph representations, particularly for short texts. While these enriched networks have demonstrated some success, the impact of incorporating semantic edges into traditional co-occurrence networks remains uncertain. In this study, we investigate two key statistical properties of text-based network models. First, we assess whether network metrics can effectively distinguish between meaningless and meaningful texts. Second, we analyze whether these metrics are more sensitive to syntactic or semantic aspects of the text. Our results show that incorporating virtual edges can have both positive and negative effects, depending on the specific network metric. For instance, the informativeness of the average shortest path and closeness centrality improves in short texts, while the clustering coefficient's informativeness decreases as more virtual edges are added. Additionally, we found that including stopwords affects the statistical properties of enriched networks. Our results, derived from enriching networks with FastText embeddings, offer a guideline for identifying the most appropriate network metrics for specific applications, based on typical text length and the nature of the task.
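
To make the idea of "virtual edges" concrete, the following is a minimal sketch rather than the authors' procedure. It builds a co-occurrence network from adjacent words, adds an edge between any two unlinked words whose embedding cosine similarity reaches a threshold, and compares two of the metrics named in the abstract (average shortest path and clustering coefficient) before and after enrichment. The two-dimensional vectors, the 0.9 threshold, the example sentence, and all function names are assumptions chosen for illustration; the study itself uses FastText embeddings, and its exact edge-selection rule is not given in the abstract.

```python
import math
from collections import deque
from itertools import combinations

def cooccurrence_graph(tokens):
    """Undirected graph linking each pair of adjacent tokens."""
    graph = {w: set() for w in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def add_virtual_edges(graph, vectors, threshold):
    """Copy the graph and link unlinked word pairs with similarity >= threshold."""
    enriched = {w: set(nbrs) for w, nbrs in graph.items()}
    for a, b in combinations(sorted(graph), 2):
        if b not in enriched[a] and cosine(vectors[a], vectors[b]) >= threshold:
            enriched[a].add(b)
            enriched[b].add(a)
    return enriched

def edge_count(graph):
    return sum(len(nbrs) for nbrs in graph.values()) // 2

def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)

def average_shortest_path(graph):
    """Mean breadth-first-search distance over ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Hypothetical short text and toy embeddings (NOT FastText vectors).
tokens = "doctor treats patient nurse helps patient".split()
vectors = {
    "doctor": (1.0, 0.1), "nurse": (0.9, 0.2),
    "treats": (0.1, 1.0), "helps": (0.2, 0.9),
    "patient": (0.7, 0.7),
}

graph = cooccurrence_graph(tokens)
enriched = add_virtual_edges(graph, vectors, threshold=0.9)  # assumed threshold

for name, g in (("co-occurrence", graph), ("enriched", enriched)):
    print(name, edge_count(g),
          round(average_shortest_path(g), 3),
          round(average_clustering(g), 3))
```

In this toy case the two virtual edges (doctor-nurse and treats-helps) shorten the average path and lower the average clustering coefficient. That only illustrates how enrichment can move different metrics in different directions; it says nothing about the informativeness results reported in the study.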


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b426/12250493/b687acceadc4/pone.0327421.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b426/12250493/00946a53aee6/pone.0327421.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b426/12250493/eab5035cee45/pone.0327421.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b426/12250493/de26d7b4dad4/pone.0327421.g004.jpg
