
The impact of differences in text segmentation on the automated quantitative evaluation of song-lyrics.

Affiliations

School of Humanities, Massey University, Palmerston North, New Zealand.

School of Fundamental Sciences, Massey University, Palmerston North, New Zealand.

Publication information

PLoS One. 2020 Nov 9;15(11):e0241979. doi: 10.1371/journal.pone.0241979. eCollection 2020.

DOI: 10.1371/journal.pone.0241979
PMID: 33166329
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7652311/
Abstract

The text-evaluation application Coh-Metrix and natural language processing rely on the sentence for text segmentation and analysis and frequently detect sentence limits by means of punctuation. Problems arise when target texts such as pop song lyrics do not follow formal standards of written text composition and lack punctuation in the original. In such cases it is common for human transcribers to prepare texts for analysis, often following unspecified or at least unreported rules of text normalization and relying potentially on an assumed shared understanding of the sentence as a text-structural unit. This study investigated whether the use of different transcribers to insert typographical symbols into song lyrics during the pre-processing of textual data can result in significant differences in sentence delineation. Results indicate that different transcribers (following commonly agreed-upon rules of punctuation based on their extensive experience with language and writing as language professionals) can produce differences in sentence segmentation. This has implications for the analysis results for at least some Coh-Metrix measures and highlights the problem of transcription, with potential consequences for quantification at and above sentence level. It is argued that when analyzing non-traditional written texts or transcripts of spoken language it is not possible to assume uniform text interpretation and segmentation during pre-processing. It is advisable to provide clear rules for text normalization at the pre-processing stage, and to make these explicit in documentation and publication.
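
The effect described in the abstract can be illustrated with a small sketch. It is not Coh-Metrix's actual segmentation algorithm; it assumes a simple rule that a sentence ends at `.`, `!` or `?`. The lyric line and the two "transcriber" versions are invented for illustration, and the helper names `split_sentences` and `mean_sentence_length` are hypothetical. The words are identical in both versions; only the inserted punctuation differs.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text after sentence-final punctuation (. ! ?) followed by whitespace.

    Assumption: a naive punctuation rule, used here only to illustrate the
    general idea of punctuation-based sentence detection.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def mean_sentence_length(text: str) -> float:
    """Average number of words per detected sentence."""
    sentences = split_sentences(text)
    word_counts = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(word_counts) / len(word_counts)


# Invented lyric, punctuated differently by two hypothetical transcribers.
transcriber_a = "I walk alone tonight, the city sleeps, and I keep moving on."
transcriber_b = "I walk alone tonight. The city sleeps. And I keep moving on."

if __name__ == "__main__":
    for name, text in [("A", transcriber_a), ("B", transcriber_b)]:
        print(name, len(split_sentences(text)), mean_sentence_length(text))
```

Under this rule, version A yields one 12-word sentence while version B yields three sentences averaging 4 words each, so any measure computed per sentence changes even though the lyric itself does not.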

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/defc/7652311/7045442d6ae7/pone.0241979.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/defc/7652311/bef0231fdde5/pone.0241979.g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/defc/7652311/71179d0a954a/pone.0241979.g003.jpg
