Readability and topics of the German Health Web: Exploratory study and text analysis.

Affiliations

Department of Medical Informatics, Heilbronn University, Heilbronn, Germany.

Center for Machine Learning, Heilbronn University, Heilbronn, Germany.

Publication information

PLoS One. 2023 Feb 10;18(2):e0281582. doi: 10.1371/journal.pone.0281582. eCollection 2023.

DOI:10.1371/journal.pone.0281582
PMID:36763573
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9916670/
Abstract

BACKGROUND

The internet has become an increasingly important resource for health information, especially for lay people. However, the information found does not necessarily match the user's health literacy level. Therefore, it is vital to (1) identify prominent information providers, (2) quantify the readability of written health information, and (3) analyze how well different types of information sources suit people with differing health literacy levels.

OBJECTIVE

In previous work, we showed how a focused crawler can be used to "capture" and describe a large sample of the "German Health Web", which we call the "Sampled German Health Web" (sGHW). It includes health-related web content from the three predominantly German-speaking countries Germany, Austria, and Switzerland, i.e., the country-code top-level domains (ccTLDs) ".de", ".at", and ".ch". Based on the crawled data, we now provide a fully automated readability and vocabulary analysis of a subsample of the sGHW, as well as an analysis of the sGHW's graph structure covering its size, its content providers, and the ratio of public to private stakeholders. In addition, we apply Latent Dirichlet Allocation (LDA) to identify topics and themes within the sGHW.
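
A host-aggregated link graph of the kind analyzed here can be sketched as follows. This is a minimal illustration, not the authors' crawler: it assumes page-level links arrive as (source URL, target URL) pairs, keeps only hosts under the three ccTLDs, collapses links to host pairs, and drops intra-host self-loops and duplicate edges. The function names and example hosts are hypothetical.

```python
from urllib.parse import urlsplit

# Assumed scope: only hosts under these ccTLDs become graph nodes.
CCTLDS = (".de", ".at", ".ch")


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL ('' if none)."""
    return urlsplit(url).hostname or ""


def in_scope(host: str) -> bool:
    return host.endswith(CCTLDS)


def aggregate_hosts(page_links):
    """Collapse page-level links into a directed host graph (hypothetical helper)."""
    nodes, edges = set(), set()
    for src_url, dst_url in page_links:
        src, dst = host_of(src_url), host_of(dst_url)
        if not (in_scope(src) and in_scope(dst)):
            continue  # skip links leaving the .de/.at/.ch scope
        nodes.update((src, dst))
        if src != dst:  # intra-host links are not kept as edges
            edges.add((src, dst))  # the set removes duplicate host pairs
    return nodes, edges


def average_degree(nodes, edges) -> float:
    """Edges per node, i.e. the mean out-degree of a directed graph."""
    return len(edges) / len(nodes) if nodes else 0.0
```

Under this convention the average degree is simply edges divided by nodes. The figures reported in the results are consistent with it, since 429,530 / 231,733 ≈ 1.854; counting each edge at both endpoints, as is common for undirected graphs, would give roughly twice that value.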

METHODS

Important web sites were identified by applying PageRank to the sGHW's graph representation. LDA was used to discover topics within the top-ranked web sites. Next, a computer-based readability and vocabulary analysis was performed on each health-related web page. Readability was assessed with the Flesch Reading Ease (FRE) and the fourth Vienna formula (Wiener Sachtextformel, WSTF). Vocabulary was assessed by a specifically trained Support Vector Machine (SVM) classifier.
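
PageRank, used above to rank web sites, can be computed by power iteration over a host graph. The sketch below is a generic textbook version under stated assumptions (damping factor 0.85, rank of dangling hosts spread uniformly, a fixed convergence tolerance); the abstract does not say which implementation or parameters the study used, and the example hosts are hypothetical.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over directed (src, dst) edges (assumed parameters)."""
    unique_edges = sorted(set(edges))  # deterministic order, no duplicate links
    nodes = sorted({n for edge in unique_edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in unique_edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Hosts without out-links spread their rank evenly over all hosts.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # L1 change small enough: treat as converged
            break
    return rank
```

Each iteration keeps the scores summing to 1, so they can be compared across hosts; selecting the highest-scoring hosts per ccTLD would then yield a ranked sample like the 3000 top-ranked pages described in the results.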

RESULTS

In total, n = 14,193,743 health-related web pages were collected during the study period of 370 days. The resulting host-aggregated web graph comprises 231,733 nodes connected via 429,530 edges (network diameter = 25; average path length = 6.804; average degree = 1.854; modularity = 0.723). Among the 3000 top-ranked pages (1000 per ccTLD according to PageRank), 18.50% (555/3000) belong to web sites of governmental or public institutions, 18.03% (541/3000) to nonprofit organizations, 54.03% (1621/3000) to private organizations, 4.07% (122/3000) to news agencies, 3.87% (116/3000) to pharmaceutical companies, 0.90% (27/3000) to private bloggers, and 0.60% (18/3000) to others. LDA identified 50 topics, which we grouped into 11 themes: "Research & Science", "Illness & Injury", "The State", "Healthcare structures", "Diet & Food", "Medical Specialities", "Economy", "Food production", "Health communication", "Family" and "Other". The most prevalent themes were "Research & Science" and "Illness & Injury", accounting for 21.04% and 17.92%, respectively, of all topics across all ccTLDs and provider types. Our readability analysis reveals that the majority of the collected web sites are structurally difficult or very difficult to read: 84.63% (2539/3000) scored a WSTF ≥ 12 and 89.70% (2691/3000) scored an FRE ≤ 49. Moreover, our vocabulary analysis shows that 44.00% (1320/3000) of the web sites use vocabulary that is well suited for a lay audience.
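
The two readability cut-offs reported above can be illustrated with a small scorer. This sketch assumes the German adaptation of the Flesch Reading Ease by Amstad (FRE = 180 − ASL − 58.5 × ASW) and the fourth Wiener Sachtextformel (WSTF = 0.2656 × SL + 0.2744 × MS − 1.693), where ASL and SL are the mean sentence length in words, ASW is the mean number of syllables per word, and MS is the percentage of words with three or more syllables. The study's tokenizer and syllable counter are not described here, so the vowel-group syllable heuristic below is only an approximation, and all function names are hypothetical.

```python
import re

# Assumed tokenization rules; real readability tools differ in the details.
WORD = re.compile(r"[A-Za-zÄÖÜäöüß]+")
SENTENCE_END = re.compile(r"[.!?]+")
VOWEL_GROUP = re.compile(r"[aeiouyäöü]+")


def count_syllables(word: str) -> int:
    """Approximate German syllables as runs of consecutive vowels (heuristic)."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def text_stats(text: str) -> tuple[float, float, float]:
    """Return (mean sentence length, mean syllables per word, % words with >= 3 syllables)."""
    sentences = [s for s in SENTENCE_END.split(text) if WORD.search(s)]
    words = WORD.findall(text)
    if not sentences or not words:
        raise ValueError("text contains no words")
    syllables = [count_syllables(w) for w in words]
    asl = len(words) / len(sentences)
    asw = sum(syllables) / len(words)
    ms = 100 * sum(1 for s in syllables if s >= 3) / len(words)
    return asl, asw, ms


def fre_amstad(asl: float, asw: float) -> float:
    """German Flesch Reading Ease (Amstad): higher means easier."""
    return 180 - asl - 58.5 * asw


def wstf4(sl: float, ms: float) -> float:
    """Fourth Wiener Sachtextformel: roughly a school grade; higher means harder."""
    return 0.2656 * sl + 0.2744 * ms - 1.693


def is_difficult(text: str) -> dict[str, bool]:
    """Apply the cut-offs reported in the abstract (WSTF >= 12, FRE <= 49)."""
    asl, asw, ms = text_stats(text)
    return {"wstf": wstf4(asl, ms) >= 12, "fre": fre_amstad(asl, asw) <= 49}
```

A short everyday sentence pair stays well below both cut-offs, whereas a single sentence packed with long compound nouns typical of medical German crosses both at once.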

CONCLUSIONS

We were able to identify major information hubs as well as topics and themes within the sGHW. Results indicate that the readability within the sGHW is low. As a consequence, patients may face barriers, even though the vocabulary used seems appropriate from a medical perspective. In future work, the authors intend to extend their analyses to identify trustworthy health information web sites.
