
A semi supervised framework for human and machine collaboration in computer assisted text refinement.

Authors

Sun Yicheng, Wang Yi, Yang Hanbo, Suen Richard

Affiliations

School of Mechanical and Precision Instrument Engineering, Xi'an University of Technology, Xi'an, China.

Faculty of Management, Shenzhen MSU-BIT University, Shenzhen, China.

Publication information

Sci Rep. 2025 Jul 7;15(1):24312. doi: 10.1038/s41598-025-10085-z.

DOI: 10.1038/s41598-025-10085-z
PMID: 40624092
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12234744/
Abstract

Human writing often exhibits a range of styles and levels of sophistication. However, automated text generation systems typically lack the nuanced understanding required to produce refined and elegant prose. Due to the inherent one-to-many relationship between inputs and outputs in natural language generation tasks, achieving annotator consistency is challenging. This complexity makes the annotation process considerably more difficult compared to tasks focused on natural language understanding. Our study focuses on the typical task of text refinement, which faces annotation difficulties, aiming to generate sentences with more elegant expressions while preserving the original semantics of the input sentence. This paper proposes a semi-automatic data construction method that combines auto-generation with human judgment. Initially, this method translates collected sentences containing elegant expressions into ordinary expressions through back translation. Subsequently, in an iterative quality control process, data filtering and human judgment are introduced to screen the auto-generated data based on quality standards, resulting in a large-scale text refinement dataset. By replacing manual annotation with human judgment and involving only a small amount of data for human judgment in each iteration, this method significantly reduces annotation difficulty and workload. With minimal human effort, it acquires a substantial amount of labeled data for text refinement, laying a foundation for further research in the field.
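
The loop described in the abstract (auto-generate ordinary sentences by back translation, filter them automatically, then have a human judge only a small sample per iteration) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the filtering criteria, thresholds, or sample sizes, so `jaccard`, the score window, `sample_size`, `target_rate`, and `toy_back_translate` are all hypothetical stand-ins.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (ordinary sentence, elegant sentence)


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap; a crude, assumed proxy for pair quality."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def toy_back_translate(sentence: str) -> str:
    """Hypothetical stand-in for a real back-translation system.

    A real pipeline would round-trip the sentence through a translation
    model; here a tiny word map simulates the loss of elegant wording.
    """
    plain = {"commence": "start", "endeavor": "effort"}
    return " ".join(plain.get(w, w) for w in sentence.split())


def build_dataset(
    elegant: List[str],
    back_translate: Callable[[str], str],
    human_accepts: Callable[[Pair], bool],
    sample_size: int = 2,       # assumed: pairs judged by a human per round
    target_rate: float = 0.9,   # assumed: acceptance rate needed to stop
    max_rounds: int = 5,
    seed: int = 0,
) -> List[Pair]:
    rng = random.Random(seed)
    # Step 1: auto-generate an ordinary counterpart for each elegant sentence.
    pairs = [(back_translate(s), s) for s in elegant]
    low = 0.2  # assumed lower bound on overlap (meaning roughly preserved)
    for _ in range(max_rounds):
        # Step 2: automatic filtering. Drop pairs that drifted too far
        # (score < low) or did not change at all (score == 1.0).
        kept = [p for p in pairs if low <= jaccard(*p) < 1.0]
        if not kept:
            return []
        # Step 3: a human judges only a small random sample of the survivors.
        sample = rng.sample(kept, min(sample_size, len(kept)))
        rate = sum(human_accepts(p) for p in sample) / len(sample)
        if rate >= target_rate:
            return kept  # quality standard met: accept the filtered set
        low += 0.1  # otherwise tighten the filter and iterate
    return []


if __name__ == "__main__":
    sentences = ["we commence the endeavor at dawn", "the cat sat"]
    # An always-accepting judge stands in for the human annotator.
    print(build_dataset(sentences, toy_back_translate, lambda pair: True))
```

The point of the design is that the human only answers accept/reject on a handful of pairs per round instead of writing refined sentences, which is how the abstract frames the reduction in annotation effort.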

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41f6/12234744/de2eb7591631/41598_2025_10085_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41f6/12234744/f194ea1d6e2c/41598_2025_10085_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41f6/12234744/a6ee73350e79/41598_2025_10085_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41f6/12234744/7cce8d3ac43e/41598_2025_10085_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41f6/12234744/1a4167e9ccf0/41598_2025_10085_Fig5_HTML.jpg

Similar articles

1. A semi supervised framework for human and machine collaboration in computer assisted text refinement.
Sci Rep. 2025 Jul 7;15(1):24312. doi: 10.1038/s41598-025-10085-z.
2. Effectiveness and cost-effectiveness of computer and other electronic aids for smoking cessation: a systematic review and network meta-analysis.
Health Technol Assess. 2012;16(38):1-205, iii-v. doi: 10.3310/hta16380.
3. PDF Entity Annotation Tool (PEAT).
J Open Source Softw. 2025 Apr 8;10(108):5336. doi: 10.21105/joss.05336.
4. Home treatment for mental health problems: a systematic review.
Health Technol Assess. 2001;5(15):1-139. doi: 10.3310/hta5150.
5. The Black Book of Psychotropic Dosing and Monitoring.
Psychopharmacol Bull. 2024 Jul 8;54(3):8-59.
6. Participation in environmental enhancement and conservation activities for health and well-being in adults: a review of quantitative and qualitative evidence.
Cochrane Database Syst Rev. 2016 May 21;2016(5):CD010351. doi: 10.1002/14651858.CD010351.pub2.
7. Radiology report generation using automatic keyword adaptation, frequency-based multi-label classification and text-to-text large language models.
Comput Biol Med. 2025 Jul 3;196(Pt A):110625. doi: 10.1016/j.compbiomed.2025.110625.
8. A systematic review of speech, language and communication interventions for children with Down syndrome from 0 to 6 years.
Int J Lang Commun Disord. 2022 Mar;57(2):441-463. doi: 10.1111/1460-6984.12699. Epub 2022 Feb 22.
9. A systematic review and economic evaluation of epoetin alpha, epoetin beta and darbepoetin alpha in anaemia associated with cancer, especially that attributable to cancer treatment.
Health Technol Assess. 2007 Apr;11(13):1-202, iii-iv. doi: 10.3310/hta11130.
10. Accreditation through the eyes of nurse managers: an infinite staircase or a phenomenon that evaporates like water.
J Health Organ Manag. 2025 Jun 30. doi: 10.1108/JHOM-01-2025-0029.