Can text be made to sound more than just its words? (2022)

原始链接: https://arxiv.org/abs/2202.10631

This study explores how text captions can convey *how* something is said, its vocal prosody (loudness, pitch, and duration), whereas conventional captions typically convey only *what* is said. The authors propose a model that translates these vocal nuances into typographic variation: font weight encodes loudness, baseline shift encodes pitch, and letter spacing encodes duration. They then tested whether viewers could identify the original audio from text alone when it displayed these visual "speech modulations". Results from 117 participants showed an average accuracy of 65% in matching the typography to its source audio, whether the text was shown as static or animated. The study highlights the potential of visually enriched captions to improve comprehension, but it also found that interpretations of these speech-modulated typographic cues varied widely, suggesting further refinement is needed. Ultimately, the work aims to make captions a more complete representation of spoken communication.
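As a rough illustration of the mapping described above, the sketch below converts normalized per-word loudness, pitch, and duration values into CSS-style typographic properties (baseline shift is expressed via `vertical-align`). The value ranges and linear scaling are illustrative assumptions, not the parameters used in the paper.

```python
# Minimal sketch of the prosody -> typography mapping described above.
# The value ranges and the linear scaling are illustrative assumptions,
# not the parameters reported in the paper.

def lerp(lo: float, hi: float, t: float) -> float:
    """Linearly interpolate between lo and hi, clamping t to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return lo + (hi - lo) * t

def modulate_word(loudness: float, pitch: float, duration: float) -> dict:
    """Map normalized prosody features (0.0-1.0) to CSS-style typographic values."""
    return {
        "font-weight": round(lerp(300, 800, loudness)),            # louder -> heavier
        "vertical-align": f"{lerp(-0.2, 0.2, pitch):.2f}em",       # higher pitch -> raised baseline
        "letter-spacing": f"{lerp(-0.02, 0.25, duration):.2f}em",  # longer -> wider tracking
    }

if __name__ == "__main__":
    # A quiet, low, clipped word versus a loud, high, drawn-out one.
    print(modulate_word(loudness=0.1, pitch=0.2, duration=0.1))
    print(modulate_word(loudness=0.9, pitch=0.8, duration=0.9))
```

Rendering a whole caption could then amount to wrapping each word in a span that carries these values as inline styles, one plausible way to embed the modulation directly into the text's typographic form as the paper describes.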


Original article

[Submitted on 22 Feb 2022]

Hidden bawls, whispers, and yelps: can text be made to sound more than just its words?, by Caluã de Lacerda Pataca and Paula Dornhofer Paro Costa

Abstract: Whether a word was bawled, whispered, or yelped, captions will typically represent it in the same way. If they are your only way to access what is being said, subjective nuances expressed in the voice will be lost. Since so much of communication is carried by these nuances, we posit that if captions are to be used as an accurate representation of speech, embedding visual representations of paralinguistic qualities into captions could help readers use them to better understand speech beyond its mere textual content. This paper presents a model for processing vocal prosody (its loudness, pitch, and duration) and mapping it into visual dimensions of typography (respectively, font-weight, baseline shift, and letter-spacing), creating a visual representation of these lost vocal subtleties that can be embedded directly into the typographical form of text. An evaluation was carried out where participants were exposed to this speech-modulated typography and asked to match it to its originating audio, presented between similar alternatives. Participants (n=117) were able to correctly identify the original audios with an average accuracy of 65%, with no significant difference when showing them modulations as animated or static text. Additionally, participants' comments showed their mental models of speech-modulated typography varied widely.
From: Caluã De Lacerda Pataca
[v1] Tue, 22 Feb 2022 02:35:25 UTC (1,948 KB)
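For context on the kind of measurements such a model consumes, the following is a generic sketch of extracting loudness and pitch contours from an audio file with librosa. It is not the authors' pipeline: the file name is a placeholder, and attaching per-word duration would additionally require forced alignment, which is not shown.

```python
# Generic sketch of measuring two of the three prosodic dimensions the paper
# maps to typography (loudness and pitch). Not the authors' pipeline:
# "speech.wav" is a placeholder, and per-word duration would additionally need
# a forced aligner to segment the audio, which is not shown here.

import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=None)

# Loudness proxy: frame-level RMS energy.
rms = librosa.feature.rms(y=y)[0]

# Pitch: fundamental-frequency contour estimated with pYIN (NaN where unvoiced).
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Min-max normalize to 0-1 so the contours can feed a mapping like the one
# sketched earlier on this page.
loudness = (rms - rms.min()) / (rms.max() - rms.min() + 1e-9)
f0_filled = np.nan_to_num(f0, nan=np.nanmedian(f0))
pitch = (f0_filled - f0_filled.min()) / (f0_filled.max() - f0_filled.min() + 1e-9)

print(f"{len(loudness)} loudness frames, {len(pitch)} pitch frames")
print(f"mean loudness {loudness.mean():.2f}, mean pitch {pitch.mean():.2f}")
```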