Research & Development

Posted by Chris Bass

One of the standard pieces of equipment used by motor racing teams to develop their cars nowadays is a chassis dynamic rig (also known as a shaker or seven-poster rig). It is a mechanical marvel on which a racing car sits and which emulates the experience of driving on a specific racetrack by applying forces to the wheels and chassis of the vehicle via a number of hydraulic actuators, using data previously captured from a car driving around that track. This allows teams to test how well the vehicle and its components handle those forces and enables them to tune the car for a particular circuit before they even get there.

Having a chassis dynamic rig is of great benefit to a racing team. It means that they can evaluate new components and setups without going to the expense of transporting a vehicle and team of engineers to an actual circuit, cutting costs, environmental impact and development time. It means that they can make the best use of time, running tests on cars 24/7 and without any risk to a driver.

In the world of internet streaming players, having the equivalent of a chassis dynamic rig would be similarly useful: a rig that allows us to evaluate how players will react to the kind of network conditions faced by our audiences; a tool that enables us to quickly measure the likely impact of changes to our player software or streams, without the risks associated with testing in iPlayer. That, in essence, is what our DASH Player Testbed provides. Later in this article, I’m going to describe the architecture of our Testbed. But first, what are some of the particular challenges of implementing a test rig for internet streaming?

Recreating network conditions

The physical conditions that racing teams need to recreate in their chassis rigs are pretty well defined: they race on a small number of known circuits and have detailed data for each one. In the world of internet streaming, however, the range of network conditions faced by users is much broader and the specific circumstances under which a programme will be streamed are not known in advance. The complex interactions of all the devices and links that make up the internet mean it’s unlikely that any two streaming sessions will be exactly the same. So how can we decide what network conditions to apply within our Testbed to simulate those faced by our audiences?

To determine this, we captured detailed metrics about how the download rate of audio and video data varied over time for around 380,000 BBC iPlayer sessions. The download rates for each streaming session were captured at the so-called application level, i.e., they represent the data throughput available to the player software that was decoding and presenting the stream. From this information we derived a set of around ten network profiles representing a range of conditions faced by our audiences. Each of these network profiles is encapsulated in a file describing how available bitrate varies over time. Here is the throughput graph of one of these profiles.

Network throughput against time for a Testbed network profile
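
The post doesn’t specify the on-disk format of these profile files, but conceptually each one is a series of time/throughput pairs. Here is a minimal sketch of that idea in Python, with made-up values:

    # A network profile, conceptually: how available throughput varies
    # over time. Each entry is (time offset in seconds, rate in kbit/s).
    # Both the representation and the numbers are illustrative, not the
    # Testbed's actual file format.
    profile = [
        (0, 5000),    # start at 5 Mbit/s
        (30, 1200),   # drop to 1.2 Mbit/s after 30 seconds
        (90, 3500),   # partial recovery
        (150, 800),   # congested spell
        (210, 5000),  # back to full rate
    ]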

Having network profiles is one thing. How, though, do we recreate the network conditions described in a profile and apply them to a player streaming some content? What is the equivalent in the Testbed of the actuators that apply physical forces to a car on a chassis dynamic rig?

We use a technique called network emulation, in which the characteristics of a real network – its data rate, latency and/or packet loss – are altered to mimic those of a different network. In the Testbed we use the traffic control facilities built into the Linux operating system, which allow the behaviour of its network interfaces to be controlled. We use these to modulate the throughput of the network interface used by the player software according to the particular network profile being used for a test. Note that this method doesn’t precisely recreate the conditions existing when the network profile was captured but it is sufficiently representative to allow meaningful comparisons to be made across multiple tests.
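
As a rough illustration of the approach (not the Testbed’s actual code), here is how the profile representation sketched above might be applied using the tc utility’s token bucket filter, assuming an interface named eth0:

    import subprocess
    import time

    IFACE = "eth0"  # assumed interface name; not specified in the post

    def apply_profile(profile):
        # Walk a list of (time offset in seconds, rate in kbit/s) pairs,
        # re-shaping the interface at each step using the kernel's token
        # bucket filter (tbf) qdisc. Requires root privileges.
        start = time.monotonic()
        for offset_s, rate_kbit in profile:
            # Wait until this step of the profile is due.
            time.sleep(max(0.0, offset_s - (time.monotonic() - start)))
            subprocess.run(
                ["tc", "qdisc", "replace", "dev", IFACE, "root", "tbf",
                 "rate", f"{rate_kbit}kbit",
                 "burst", "32kbit", "latency", "400ms"],
                check=True,
            )

tbf is just one of several queueing disciplines that can cap throughput; the discipline and parameters the Testbed actually uses aren’t detailed in this post.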

Handling non-determinism

One of the challenges of drawing valid conclusions about player performance is that the sheer complexity of networks, protocols, connected devices and the media player software that runs on them, together with all of the interactions between these components, means that the system as a whole is unlikely to behave in exactly the same way twice. Though general patterns of behaviour are evident, internet streaming is fundamentally non-deterministic.

Given that, how can we confidently conclude that, for instance, one version of a player is better than another? The answer is that we need to run many sessions of each test, i.e., of each specific combination of player, stream and network profile. Then we can calculate the variance of the results and determine the significance of performance differences between player versions. To make this manageable, the Testbed automates the running of many sets of tests and parallelises the execution of test sessions across multiple machines to reduce the overall time taken to run them.
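
To illustrate that last step, suppose we have per-session stall times for the same test run against two player versions. A Welch’s t-test, one reasonable choice rather than necessarily the Testbed’s, indicates whether the observed difference is likely to be real rather than noise:

    from scipy import stats

    # Hypothetical per-session results: total stall time in seconds for
    # the same test (stream + network profile) run against two player
    # versions. Real figures would come from the Testbed's metrics.
    player_a = [2.1, 3.4, 2.8, 4.0, 2.5, 3.1, 2.9, 3.6]
    player_b = [1.8, 2.2, 2.6, 1.9, 2.4, 2.0, 2.3, 2.1]

    # Welch's t-test does not assume the two groups share a variance,
    # which suits noisy, non-deterministic streaming sessions.
    t_stat, p_value = stats.ttest_ind(player_a, player_b, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")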

Let’s take a look next at the overall architecture of the Testbed.

Testbed architecture

Simplified architecture of the Testbed

The TestController, which sits at the heart of the Testbed, is responsible for queueing and running sets of tests. The UI submits a test set to the TestController as a JSON file, which describes each test to be run in terms of the following (a sketch of such a file appears after the list):

  • The stream to use in the test: whether it’s a live or on-demand stream, what combination of audio/video/subtitle components it has and its duration.
  • Which DASH player software to use in the test and how that player should be configured. The Testbed currently supports players based around the dash.js JavaScript DASH implementation and the GStreamer media toolkit.
  • The number of sessions of that test to run (to give statistically useful results – see above).
  • The network profile that should be applied during the running of each test session.
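
To make that concrete, a test set file might look something like the sketch below. The schema is hypothetical; the post doesn’t publish the Testbed’s actual field names.

    {
      "tests": [
        {
          "stream": {
            "type": "on-demand",
            "components": ["audio", "video", "subtitles"],
            "durationSeconds": 600
          },
          "player": {
            "implementation": "dash.js",
            "config": { "initialBitrateKbps": 1500 }
          },
          "sessions": 50,
          "networkProfile": "profile-07"
        }
      ]
    }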

When the TestController runs a test from within a test set, it farms out the individual sessions of that test to a pool of Executors, each responsible for running a single session at a time. Each Executor is a Linux machine and, as described above, uses the kernel’s traffic control facilities to throttle the data rate of its network interface according to the network profile selected for that test. The Executor configures and launches the player identified in the test description; that player plays the test stream in real time and reports metrics about its performance back to the MetricStore.
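
The post doesn’t show the TestController’s internals, but the farming-out step can be sketched as a pool in which each Executor machine runs one session at a time. Everything below (host names, function bodies) is illustrative:

    from concurrent.futures import ThreadPoolExecutor
    from queue import Queue

    EXECUTORS = ["executor-01", "executor-02", "executor-03"]  # hypothetical hosts

    def run_session(free_executors, test, session_id):
        # Borrow an idle Executor, run one session on it, then return it
        # to the pool. On the real rig the session would shape the
        # machine's network interface to the test's profile, launch the
        # configured player and stream the test content in real time.
        host = free_executors.get()
        try:
            return f"session {session_id} of {test!r} ran on {host}"
        finally:
            free_executors.put(host)

    def run_test(test, num_sessions):
        free_executors = Queue()
        for host in EXECUTORS:
            free_executors.put(host)
        # One worker per Executor, so each machine runs a single session
        # at a time while the sessions of a test proceed in parallel.
        with ThreadPoolExecutor(max_workers=len(EXECUTORS)) as pool:
            futures = [pool.submit(run_session, free_executors, test, i)
                       for i in range(num_sessions)]
            return [f.result() for f in futures]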

The MetricStore, which captures these player metrics, also implements a GraphQL interface through which the stored metrics can be accessed. Graphing clients can use this interface to visualise the captured data: to compare, for example, the performance of different player software under the same network conditions, or to see the effects of adding a new video component under different network conditions. Non-graphical clients can extract useful information through the same interface: it could be used, for example, to access session data on which new player software can be trained using machine learning techniques.
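
For example, a client might issue a query along these lines; the field names are invented for illustration, since the MetricStore’s schema isn’t given here:

    # Hypothetical GraphQL query: compare stall behaviour across player
    # versions for sessions run under one network profile.
    query StallsByProfile {
      sessions(networkProfile: "profile-07") {
        playerVersion
        stallCount
        totalStallDurationMs
        averageVideoBitrateKbps
      }
    }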

As well as helping us optimise the reliability of our current streams, the Testbed is helping us develop future improvements in online streaming, such as low latency live streaming. Low latency streaming, in which the time lag when streaming live events is brought down to a level similar to that of traditional broadcast delivery, presents more challenges to players than regular streaming. It’s harder to measure how much bandwidth is available to a player during low latency streaming, and the trade-off of playing at low latency is that a player can buffer only a small amount of media ahead of the playback position, leaving it far less safety margin in which to react to downturns in network bandwidth and avoid stalling. This remains an area of active research that we will cover in upcoming posts in this series.

Just as automated testing tools are of great value in the world of motor racing, so they are in the field of internet streaming. The DASH Player Testbed is just one tool that is helping us reach the goal of giving audiences of our internet-delivered content an experience as good as, or better than, the one they get from our broadcast content.
