Identifying unproven cancer treatments on the health web: addressing accuracy, generalizability and scalability.

Authors

Aphinyanaphongs Yin, Fu Lawrence D, Aliferis Constantin F

Affiliation

Center for Health Informatics and Bioinformatics, NYU Langone Medical Center, NY, NY, USA.

Publication

Stud Health Technol Inform. 2013;192:667-71.

PMID:23920640
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4162393/
Abstract

Building machine learning models that identify unproven cancer treatments on the Health Web is a promising approach to countering the dissemination of false and dangerous information to vulnerable health consumers. Aside from the obvious requirement of accuracy, two issues are of practical importance when deploying these models in real-world applications. (a) Generalizability: the models must generalize to all treatments, not just the ones used to train them. (b) Scalability: the models must apply efficiently to billions of documents on the Health Web. First, we provide methods and related empirical data demonstrating strong accuracy and generalizability. Second, by combining the MapReduce distributed architecture with high-dimensionality compression via Markov Boundary feature selection, we show how to scale the application of the models to WWW-scale corpora. The present work provides evidence that (a) a very small subset of unproven cancer treatments is sufficient to build a model that identifies unproven treatments on the web; (b) unproven treatments use distinct language to market their claims, and this language is learnable; (c) through distributed parallelization and state-of-the-art feature selection, it is possible to prepare the corpora and to build and apply models at large scale.
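The scalability idea in the abstract (a compact feature set plus a map/reduce application pattern) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the terms, weights, bias, and the names `score`, `map_doc`, and `run` are hypothetical and do not come from the paper, and the feature subset is simply given rather than derived by Markov Boundary feature selection. The sketch runs in a single process with Python's built-in `map` and `functools.reduce`; it is not a Hadoop job.

```python
import math
import re
from collections import Counter
from functools import reduce

# Hypothetical weights for a handful of selected terms (illustrative only,
# not taken from the paper). A feature selection step is assumed to have
# already reduced the vocabulary to this small set.
SELECTED_WEIGHTS = {
    "miracle": 1.5,
    "cure": 1.0,
    "secret": 0.8,
    "randomized": -1.2,
    "trial": -0.9,
}
BIAS = -0.5


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score(text):
    """Return a logistic score using only the selected terms."""
    counts = Counter(t for t in tokenize(text) if t in SELECTED_WEIGHTS)
    z = BIAS + sum(SELECTED_WEIGHTS[t] * n for t, n in counts.items())
    return 1.0 / (1.0 + math.exp(-z))


def map_doc(doc):
    """Map step: classify one (doc_id, text) pair independently of the others."""
    _doc_id, text = doc
    label = "flagged" if score(text) >= 0.5 else "not_flagged"
    return Counter({label: 1})


def reduce_counts(a, b):
    """Reduce step: merge two partial label tallies."""
    return a + b


def run(docs):
    """Apply the map step to every document and fold the results together."""
    return reduce(reduce_counts, map(map_doc, docs), Counter())


if __name__ == "__main__":
    sample = [
        ("a", "Miracle cure! The secret doctors hide."),
        ("b", "A randomized trial found no benefit."),
    ]
    print(run(sample))
```

Because each document is scored independently and the tallies merge associatively, the same two functions could in principle be distributed across many workers. Restricting the model to a few selected terms keeps the per-document work and the model size small.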
