
Posted by Marta Mrak

Image and video compression methods are the core technologies that enable digital broadcasting and streaming, as well as image and video sharing on social media.

These compression methods have emerged over the last few decades from areas such as signal processing, algorithmic information theory and the study of visual perception. The building blocks of compression standards such as JPEG, MPEG-2, H.264/AVC and their successors were carefully crafted by many experts around the globe.

Over the last five years, researchers have started rethinking compression as a computer vision problem, building new solutions with machine learning, particularly using versatile deep learning modules. In this way, image and video coding can be achieved using end-to-end neural network architectures. The main advantage of this approach is that pixels are first passed through a neural network encoder to obtain a content representation in a latent space, which is then easier to compress. To achieve this, such encoders apply more complex nonlinear transforms whose parameters are learned from data and explicitly optimised to minimise rate and distortion in an end-to-end fashion.
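To make this concrete, here is a minimal PyTorch sketch of the idea: an encoder maps pixels to latents, a decoder maps latents back to pixels, and training minimises a weighted sum of rate and distortion. The architecture and all names are illustrative, and the quantisation and rate terms are crude proxies; real learned codecs use a learned entropy model to estimate the bitrate.

```python
import torch
import torch.nn as nn

# A minimal, illustrative end-to-end learned image codec (hypothetical names;
# real codecs add a proper quantiser and a learned entropy model).
class TinyNeuralCodec(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Encoder f: nonlinear transform from pixels to a latent representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
        # Decoder g: maps latents back to pixels.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2,
                               padding=2, output_padding=1), nn.GELU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2,
                               padding=2, output_padding=1),
        )

    def forward(self, x):
        y = self.encoder(x)                   # latent representation
        y_hat = y + torch.rand_like(y) - 0.5  # additive-noise proxy for rounding
        x_hat = self.decoder(y_hat)
        return x_hat, y_hat

codec = TinyNeuralCodec()
x = torch.rand(1, 3, 256, 256)                # dummy image batch

x_hat, y_hat = codec(x)

# Rate-distortion objective: distortion (MSE) plus an estimated rate, traded
# off by lambda. The rate term here is only a placeholder; real systems
# estimate -log2 p(y_hat) with a learned entropy model.
lmbda = 0.01
distortion = nn.functional.mse_loss(x_hat, x)
rate = y_hat.abs().mean()                     # placeholder rate proxy
loss = distortion + lmbda * rate
loss.backward()
```

During training, the additive-noise term stands in for rounding, which is not differentiable; at deployment, the latents would be quantised and entropy-coded into a bitstream.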

What we are doing

So far, research on neural image and video compression has focused on improving compression efficiency and reducing memory and complexity requirements. However, since neural compression methods are completely different from traditional video codecs, this emerging approach provides an opportunity to experiment with new codec functionalities and rethink what is needed for the next generation of image and video codecs.

A composite image showing the outputs of this method of compression

One of the key conceptual differences between traditional coding and neural network coding is the much greater architectural freedom that neural codecs allow. Since both the encoder and decoder models are learned (meaning they can be re-trained), the same encoder or decoder architecture can be specialised for different tasks. This can lead to better compression where the content has a distinct type or theme (i.e. a specific domain such as cityscapes, as in the picture below). It also implies that different applications could customise the decoder in an end-user device.
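As a rough illustration of this re-training freedom, the following sketch fine-tunes the whole toy codec above on images from a single domain. The data and hyperparameters are placeholders, not those of any real system.

```python
import torch

# Specialising a generic pre-trained codec for a narrow domain by continuing
# rate-distortion training on domain images. Reuses the TinyNeuralCodec
# sketch above; in practice the weights would come from a generic checkpoint.
codec = TinyNeuralCodec()
optimizer = torch.optim.Adam(codec.parameters(), lr=1e-5)
lmbda = 0.01

# Stand-in for batches drawn from the target domain (e.g. cityscapes).
domain_images = [torch.rand(4, 3, 256, 256) for _ in range(8)]

for x in domain_images:
    x_hat, y_hat = codec(x)
    distortion = torch.nn.functional.mse_loss(x_hat, x)
    rate = y_hat.abs().mean()        # placeholder rate proxy, as before
    loss = distortion + lmbda * rate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Note that this naive approach re-trains both encoder and decoder, so a decoder updated this way would no longer match bitstreams produced by the original encoder; that is exactly the compatibility problem addressed below.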

Our approach

BBC Research & Development, alongside researchers from the Computer Vision Center in Barcelona, Spain, is studying how to design a neural codec that can be partially re-trained so that it is easily reusable across different domains. The image below shows an overview of our approach. We ensured that all re-trained specialised codecs (g2, f2) can still correctly decode images compressed with the baseline (g1, f1) setting (without re-trained parameters, as indicated by the red modules). This is required for backward compatibility, as decoders in legacy devices must still be able to decode the baseline profiles. In the picture below, this property is shown by the ability of the re-trained decoder (g2) to decode the bitstream generated by the baseline encoder (f1).

A diagram highlighting the process described above. 

Using this approach, we introduced a new concept called DANICE - Domain Adaptation in Neural Image Compression. DANICE optimises a neural image codec for a specific content type using only lightweight training to derive a small set of custom decoder parameters; the decoder architecture itself remains fixed. To ensure backward compatibility, our method does not change the basic set of parameters, which are instead shared across the various instances of the decoder. Our initial findings show that this new codec can adapt to new custom domains while preserving backward compatibility, resulting in improved compression for certain types of content.
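A minimal sketch of that parameter split might look as follows: the baseline parameters stay frozen and shared, and only a small custom set is trained for the new domain. The per-channel affine correction used here is purely illustrative and is not the actual DANICE mechanism.

```python
import torch

# Baseline codec (f1, g1): frozen, so legacy devices keep exactly these
# weights. Reuses the TinyNeuralCodec sketch above.
codec = TinyNeuralCodec()
for p in codec.parameters():
    p.requires_grad = False

# Small custom parameter set: a per-channel affine correction on the decoder
# output, forming a specialised decoder g2 (one possible lightweight choice).
custom_scale = torch.nn.Parameter(torch.ones(1, 3, 1, 1))
custom_bias = torch.nn.Parameter(torch.zeros(1, 3, 1, 1))
optimizer = torch.optim.Adam([custom_scale, custom_bias], lr=1e-4)

domain_images = [torch.rand(4, 3, 256, 256) for _ in range(8)]

for x in domain_images:
    x_hat, _ = codec(x)
    x_hat = x_hat * custom_scale + custom_bias       # specialised output of g2
    loss = torch.nn.functional.mse_loss(x_hat, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Because the shared baseline parameters never change, the frozen decoder g1
# can still decode any baseline bitstream, preserving backward compatibility.
```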

You can read more about this work in our paper entitled DANICE: Domain adaptation without forgetting in neural image compression (presented at CLIC 2021, part of CVPR 2021). A short technical overview is also available.

What next?

Our current work focuses on the compression of still images. The next step will be to extend this approach to video compression.

The research presented here was created in collaboration with Sudeep Katakol, Fei Yang and Luis Herranz at the Computer Vision Center, UAB, Barcelona, Spain.


BBC R&D - Video Coding

BBC R&D - COGNITUS

BBC R&D - Faster Video Compression Using Machine Learning

BBC R&D - AI & Auto Colourisation - Black & White to Colour with Machine Learning

BBC R&D - Capturing User Generated Content on BBC Music Day with COGNITUS

BBC R&D - Testing AV1 and VVC

BBC R&D - Turing codec: open-source HEVC video compression

BBC R&D - Comparing MPEG and AOMedia

BBC R&D - Joining the Alliance for Open Media
