Research & Development

Posted by Michael Armstrong, Matthew Brooks

This is the second of three blog posts, each featuring a short paper on a topic related to subtitling.

The three papers were presented as part of the industrial track of the TVX 2014 conference on interactive experiences for television and online video, which took place in Newcastle in June this year.

This paper explores the issues around improving subtitle placement and describes some early user research work which investigated the impact of dynamic subtitle placement on the eye gaze of the viewer.


Enhancing Subtitles

Authors: Matthew Brooks and Mike Armstrong

Abstract

Television programmes are increasingly consumed on small devices like tablets and mobile phones, and ever-larger television sets. Different subtitle layouts may be needed for different screens, and the graphical capabilities of modern devices create opportunities for enhancing subtitles by integrating them with the moving image, and enabling choice in subtitle size and style.

We have used browser technology to enable the dynamic positioning of subtitles and have carried out proof-of-concept user research using eye gaze tracking equipment, which suggests that placing subtitles where people look in a scene can improve the viewing experience and reduce the number of eye movements. We will be using a similar approach to investigate the impact of different sizes and layouts of subtitles on tablets and mobile devices.

Dynamic subtitle positioning and variable size subtitles can only work if the subtitles are controlled to prevent them obscuring important parts of the image. Our work will address these issues and look for automatic processes that can help find the best locations to place subtitles in a television programme. 

Introduction

Subtitles for television are often seen as an afterthought: produced at the end of the production chain, often just before broadcast, they are mostly detached from the creative process, and their format is still bound by the display technology of the 1980s [1,2]. However, programme makers and academics are using subtitles in a creative fashion [3], producing burnt-in subtitles that work with the picture, using positioning, transitions and movement to create empathy between subject matter and subtitle.

The BBC television documentary “Human Planet” (Figure 1) used positioned subtitles for translation, and similar techniques are also being applied to storytelling - the BBC television drama “Sherlock” (Figure 2) used positioned on-screen text to reveal inner thought processes and to display text messages on phones in a highly creative and engaging manner.

This creative use of subtitling comes at a time when television is increasingly being consumed on smaller displays like tablets and mobile phones, and ever-larger television sets. The graphical capabilities of modern devices create opportunities for enhanced subtitle presentation, but the proliferation of screen sizes creates problems: subtitles on small screens can suffer readability issues, and subtitles on large screens require more eye movement when moving between a traditionally positioned subtitle and the area of interest in the image.

 

Figure 1. Positioned Subtitles in “Human Planet”

 

Figure 2. On-screen text with narrative content in “Sherlock”

 

Producing positioned subtitles requires an understanding of where best to place them relative to where viewers are likely to be looking. There are also areas within the image which mustn’t be obscured - for example, faces, news captions, and regions carrying a visual narrative. We need to understand where these areas are in order to avoid obscuring them with dynamically placed subtitles.

Research Focus

We are currently looking at three main research areas: Dynamic Subtitle Placement, Feature Avoidance and Screen Size.

Dynamic Subtitle Placement

We would like to understand where to place subtitles relative to the primary area of interest in the image, in order to minimise both the eye travel distance from the area of interest to the subtitle, and the number of repeat visits to it. Our hypothesis is that by reducing distance and number of visits, the viewer will read the subtitle with less effort.

An individual subtitle position optimised in this way would improve the viewing experience - but a continuous sequence of such subtitles has no spatial coherence, unlike traditional subtitles, which appear in a predictable location. We therefore need to understand how best to compromise between the ideal position for each individual subtitle and a predictable temporal-spatial flow from one subtitle to the next.
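As a rough illustration of this trade-off - not our prototype code, and with all names and the weighting chosen purely for the example - a placement cost in JavaScript might score each candidate position on eye travel from the estimated area of interest, plus a penalty for jumping away from wherever the previous subtitle appeared:

    // Illustrative sketch only; names and the coherence weight are assumptions.
    function distance(a, b) {
      return Math.hypot(a.x - b.x, a.y - b.y);
    }

    function placementCost(candidate, areaOfInterest, previous, coherenceWeight) {
      const travel = distance(candidate, areaOfInterest);         // eye travel from the action
      const jump = previous ? distance(candidate, previous) : 0;  // spatial jump from the last subtitle
      return travel + coherenceWeight * jump;
    }

    // Choose the cheapest of a set of candidate positions for one subtitle.
    function choosePosition(candidates, areaOfInterest, previous, coherenceWeight = 0.5) {
      return candidates.reduce((best, c) =>
        placementCost(c, areaOfInterest, previous, coherenceWeight) <
        placementCost(best, areaOfInterest, previous, coherenceWeight) ? c : best);
    }

Setting the coherence weight to zero gives the purely local optimum for each subtitle, while a larger weight pulls successive subtitles back towards a more predictable location.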

 

Figure 3. Eye gaze visualisation showing fixations for the same subtitle
in traditional and positioned locations.

 

Feature Avoidance

We envisage a system that could guide a subtitler producing dynamic subtitles, automatically marking up areas to be avoided, or prime candidates for placement. With sufficient analysis capability, it is possible that such a system could automatically produce a first pass of subtitle positions - for example, combining speaker recognition technology with subtitle and script data could enable us to match subtitles to face positions.
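At its simplest, the avoidance step might just discard candidate subtitle boxes that overlap any region flagged by the analysis. The sketch below is illustrative JavaScript, not our editor's code; the box format and function names are assumptions:

    // Axis-aligned bounding boxes {x, y, w, h}; returns true if two boxes overlap.
    function overlaps(a, b) {
      return a.x < b.x + b.w && b.x < a.x + a.w &&
             a.y < b.y + b.h && b.y < a.y + a.h;
    }

    // 'features' are regions to avoid (e.g. detected faces or on-screen graphics);
    // 'candidates' are possible subtitle boxes. Keep only the candidates that
    // obscure none of the features.
    function allowedPositions(candidates, features) {
      return candidates.filter(box => !features.some(f => overlaps(box, f)));
    }

Matching a subtitle to the person speaking would then be a matter of preferring the allowed box closest to that speaker's face.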

A stream of video feature data will also allow us to produce “responsive subtitles” - subtitles that can be resized and reformatted on the fly in response to device orientation, screen size and user preference, without obscuring important features.
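In the browser, a first step towards this might be to re-lay-out the subtitle whenever the viewport changes and then re-run the avoidance step for the new bounding box. The following is a minimal, illustrative sketch; the scaling rule and names are assumptions, not part of our prototype:

    // Illustrative sketch of "responsive subtitles" in the browser.
    const subtitleEl = document.createElement('div');
    subtitleEl.className = 'subtitle';
    document.body.appendChild(subtitleEl);

    const userPrefs = { scale: 0.05 }; // user-chosen text size as a fraction of viewport height

    function applySubtitleLayout() {
      const size = Math.max(16, Math.round(window.innerHeight * userPrefs.scale));
      subtitleEl.style.fontSize = size + 'px';
      // A fuller version would now re-run feature avoidance so that the resized
      // subtitle box still avoids faces and on-screen graphics.
    }

    window.addEventListener('resize', applySubtitleLayout);
    applySubtitleLayout();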

Screen Size

Now that television and video clips are being consumed on a wider variety of screen sizes, guidelines on subtitle sizes, fonts, backgrounds and positions will need reconsideration.

Our audience will need to be able to adjust the compromise between the legibility of the subtitles and the amount of the image they obscure. With that will come the need to guide the positioning of the subtitles away from the important parts of the image. A stream of precalculated video feature data will allow these adjustments to be made without obscuring important parts of the image, enabling a better experience for the viewer.

Prototype Work

We have built a positional subtitle editor in HTML5 and JavaScript that reads popular subtitle formats and allows them to be positioned and saved in our own intermediate format. A separate analysis layer feeds the editor with simple image analysis, scoring areas of the image based on how much they change and whether they contain an important feature such as a face or on-screen graphics. Our positioned subtitle format holds the bounding box for each subtitle, allowing us to cross-reference positioned subtitles easily with gaze tracking data exported from our user studies, which can be visualised in a separate application.
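As an illustration of how such a format can be used - the field names below are invented for the example and are not our actual intermediate format - a positioned subtitle record and a cross-reference against exported fixations might look like this:

    // Illustrative only: not the actual intermediate format.
    const subtitle = {
      text: 'Hello there.',
      startMs: 12000,
      endMs: 14500,
      box: { x: 0.1, y: 0.7, w: 0.5, h: 0.1 }   // bounding box, normalised to the picture
    };

    // Fixations exported from the eye tracker, e.g. { timeMs, x, y, durationMs }.
    // Returns the fixations that land inside the subtitle's box while it is on screen.
    function fixationsOnSubtitle(fixations, sub) {
      return fixations.filter(f =>
        f.timeMs >= sub.startMs && f.timeMs <= sub.endMs &&
        f.x >= sub.box.x && f.x <= sub.box.x + sub.box.w &&
        f.y >= sub.box.y && f.y <= sub.box.y + sub.box.h);
    }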

User Studies

Our user studies so far have concentrated on dynamic subtitle positioning. A 2x2 repeated measures design was employed - (foreign language / native language) x (traditional / positioned) - with participants watching four different 90-second clips, counterbalanced across viewing conditions. Figure 3 is a still from our eye gaze visualisation application. The blue dots show the fixations of those watching a foreign-language dub with positioned subtitles, and the green dots show fixations for the same dub with traditional subtitles. We found that participants took longer to fixate on positioned subtitles, perhaps because they were unsure where they would appear. However, once they had fixated on a positioned subtitle, they spent less time reading it than its traditional equivalent, and more time looking at the image. This could reasonably be interpreted as participants spending more time watching the drama, which in turn is likely to reflect a more enjoyable, immersive viewing experience.
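The measures behind these observations are straightforward to derive once fixations have been cross-referenced to each subtitle's bounding box and display interval, as in the prototype section. The sketch below is illustrative JavaScript with assumed field names, not our analysis code:

    // 'fixationsOnSub' are fixations already cross-referenced to the subtitle's box
    // and display interval, each with { timeMs, durationMs }; 'sub.startMs' is when
    // the subtitle appeared.
    function gazeMeasures(sub, fixationsOnSub) {
      if (fixationsOnSub.length === 0) {
        return { timeToFirstFixationMs: null, readingTimeMs: 0 };
      }
      const firstMs = Math.min(...fixationsOnSub.map(f => f.timeMs));
      const readingMs = fixationsOnSub.reduce((sum, f) => sum + f.durationMs, 0);
      return {
        timeToFirstFixationMs: firstMs - sub.startMs, // how long before the subtitle was first fixated
        readingTimeMs: readingMs                       // total fixation time spent on the subtitle
      };
    }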

We intend to carry out further work to verify the significance of these findings, and to improve our understanding of the conditions under which a positioned subtitle outperforms its traditional counterpart.

References

  1. Newell, A. F. "Teletext for the deaf." Electronics & Power 28.3 (1982): 263-266.
  2. Evans, M. J. "Speech Recognition in Assisted and Live Subtitling for Television." BBC R&D White Paper WHP065 (2003).
  3. McClarty, R. "Towards a multidisciplinary approach in creative subtitling." MonTI 4 (2012): 133-153.

