Papalampidi Pinelopi, Keller Frank, Lapata Mirella
IEEE Trans Pattern Anal Mach Intell. 2024 Jan;46(1):292-304. doi: 10.1109/TPAMI.2023.3323030. Epub 2023 Dec 5.
Movie trailers perform multiple functions: they introduce viewers to the story, convey the mood and artistic style of the film, and encourage audiences to see the movie. These diverse functions make trailer creation a challenging endeavor. In this work, we focus on finding trailer moments in a movie, i.e., shots that could be potentially included in a trailer. We decompose this task into two subtasks: narrative structure identification and sentiment prediction. We model movies as graphs, where nodes are shots and edges denote semantic relations between them. We learn these relations using joint contrastive training which distills rich textual information (e.g., characters, actions, situations) from screenplays. An unsupervised algorithm then traverses the graph and selects trailer moments from the movie that human judges prefer to ones selected by competitive supervised approaches. A main advantage of our algorithm is that it uses interpretable criteria, which allows us to deploy it in an interactive tool for trailer creation with a human in the loop. Our tool allows users to select trailer shots in under 30 minutes that are superior to fully automatic methods and comparable to (exclusive) manual selection by experts.
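The graph-based selection step described above can be illustrated with a minimal sketch. The abstract does not specify the traversal algorithm, so everything here is an assumption made for illustration: the function name `select_trailer_shots`, the greedy forward walk, the additive scoring rule, and the `alpha` weight are hypothetical and are not the authors' method. The sketch only shows how interpretable criteria (edge similarity, a narrative-structure score, and sentiment intensity) could jointly drive an unsupervised walk over a shot graph.

```python
from typing import Dict, List


def select_trailer_shots(
    neighbors: Dict[int, Dict[int, float]],
    key_event_score: List[float],
    sentiment: List[float],
    start: int,
    budget: int,
    alpha: float = 0.5,
) -> List[int]:
    """Greedy forward walk over a shot graph (illustrative assumption only).

    neighbors[i][j] is the semantic similarity of the edge from shot i to shot j.
    key_event_score[j] is a hypothetical narrative-structure score for shot j.
    sentiment[j] is a hypothetical signed sentiment value for shot j.
    """
    path = [start]
    current = start
    while len(path) < budget:
        # Only move forward in time, so the selection keeps the movie's order.
        candidates = {
            j: w for j, w in neighbors.get(current, {}).items() if j > current
        }
        if not candidates:
            break

        def score(j: int) -> float:
            # Interpretable criteria: edge similarity, narrative importance,
            # and sentiment intensity (absolute value of the signed score).
            return (
                candidates[j]
                + alpha * key_event_score[j]
                + (1.0 - alpha) * abs(sentiment[j])
            )

        current = max(candidates, key=score)
        path.append(current)
    return path
```

Because each term in the score is separately inspectable, a human in the loop could in principle adjust `alpha` or override a single step of the walk, which is the kind of interaction the abstract attributes to interpretable criteria.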