POCASUM: Policy Categorizer and Summarizer Based on Text Mining and Machine Learning

Authors

Deotale Rushikesh, Rawat Shreyash, Vijayarajan V, Prasath V B Surya

Affiliations

School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA. Departments of Pediatrics, Biomedical Informatics, Electrical Engineering and Computer Science, University of Cincinnati College of Medicine, Cincinnati, OH, USA.

Publication information

Soft Comput. 2021 Jul;25(14):9365-9375. doi: 10.1007/s00500-021-05916-w. Epub 2021 Jun 11.

Abstract

Having control over one's data is both a right and a duty of every citizen in our digital society. Users often skip the entire policy of an application or website to save time and energy, without realizing the potential sticking points in these policies. Because of obscure language and verbose explanations, the majority of hypermedia users do not bother to read them. Further, digital media companies sometimes do not put enough effort into stating their policies clearly, and the policies are often incomplete. A summarized version of these privacy policies, categorized into useful information, can help users. To solve this problem, in this work we propose machine learning based models for a policy categorizer that classifies policy paragraphs under the proposed attributes, such as security and contact. By benchmarking different machine learning based classifier models, we show that an artificial neural network model achieves higher accuracy on a challenging dataset of textual privacy policies. We thus show that machine learning can help summarize the relevant paragraphs under the various attributes so that the user can get the gist of each topic within a few lines.
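
The categorize-then-summarize idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract reports that an artificial neural network performed best, whereas the sketch below swaps in a multinomial naive Bayes classifier and a word-frequency extractive summarizer so that it runs with the Python standard library alone. The class `NaiveBayesCategorizer`, the function `summarize`, and the four training paragraphs are all hypothetical; only the attribute names "security" and "contact" come from the abstract.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one smoothing.

    Assumption: a simple stand-in for the paper's classifiers, not the ANN
    that the abstract reports as most accurate.
    """

    def fit(self, paragraphs, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(paragraphs, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        toks = tokenize(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return [sentences[i] for i in keep]


# Hypothetical toy training set; a real system needs a labeled policy corpus.
train = [
    ("We encrypt your data and protect it with security safeguards.", "security"),
    ("Our servers use encryption and firewalls to secure information.", "security"),
    ("You can contact us by email or phone with any questions.", "contact"),
    ("Write to our support team at the postal address below to contact us.", "contact"),
]
clf = NaiveBayesCategorizer().fit([t for t, _ in train], [l for _, l in train])

paragraph = "All data is stored with encryption for security."
print(clf.predict(paragraph), summarize(paragraph))
```

In this arrangement each policy paragraph is first assigned to an attribute, and the paragraphs collected under one attribute can then be concatenated and passed to `summarize` to obtain a few representative lines.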
