Fighting misinformation: An embedded media provenance specification

Monday 4 October 2021, 16:51

Charlie Halford

Lead Architect - Content

For the last few years, the BBC has had a project running in its technology division looking at technology solutions to various problems in the domain of news disinformation. Part of that effort, called Project Origin, is working to make it easier to understand where the news you consume online really comes from so that you can decide how credible it is. You can find some history on this in Laura Ellis' excellent "Project Origin: one year on" blog.

Part of Project Origin has been working in collaboration with major media and tech companies, most recently with the Coalition for Content Provenance and Authenticity (C2PA), which it helped form. This group recently released a draft version of an embedded media provenance specification. The spec tackles the lack of trustworthy provenance information for images, video and audio consumed on the internet: for example, a video of elections in one country from 10 years ago being presented as video of recent elections in another. This is an overview of how that specification is intended to work.

Embedding

The C2PA specification works primarily by defining mechanisms for embedding additional data into media assets to indicate their authentic origin. An essential aspect of this data is "assertions" - statements about when and where media was produced. The embedded information is then digitally signed so that a consumer knows who is making the statements.

While the C2PA specification also includes mechanisms for locating this provenance data remotely (e.g. hosted somewhere on the internet), I'll focus on the use case where all data is embedded directly in the asset itself.

Data model

The C2PA specification uses a few different mechanisms for embedding and storing data. Embedding is done with JUMBF, a container format, and structured data is stored as a combination of JSON-LD and CBOR (a binary format whose data model is based on JSON).

Container - The "Manifest Store"

Similar to XMP, the C2PA specification defines several embedding points in a selection of media formats to place a "Manifest Store" in JUMBF format, which is the container for the various pieces of provenance data. Once you've identified where and how a manifest store is embedded in your favourite media format, most of the specification is format-agnostic.

What is JUMBF?

JUMBF (JPEG universal metadata box format) is a binary container format initially designed for adding metadata to JPEG files, and it’s now used in other file formats too. It is structurally similar to the ISO Base Media box file format, an extensible container format that is used for many different types of media files. JUMBF "superboxes" are boxes that only contain other boxes. JUMBF "content type" boxes contain actual payload data, the serialisation of which should match the advertised content type of the box. All boxes have labels, which allow boxes to be addressed and understood when parsing. C2PA uses JUMBF in all the media formats it supports to provide the container format for the Manifest, Claims, Assertions, Verifiable Credentials and Signatures.
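To make the box structure more concrete, here is a minimal sketch of walking ISO BMFF-style boxes in Python. It assumes the usual layout of a 4-byte big-endian length followed by a 4-byte type code. The helper names (`make_box`, `iter_boxes`) are my own, the type codes `jumb`, `jumd` and `cbor` are the superbox, description box and CBOR content box types as I understand them, and the payloads are placeholders rather than valid JUMBF content.

```python
import struct

def make_box(box_type, payload):
    """Build one box: 4-byte big-endian length, 4-byte type, then payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload

def iter_boxes(data):
    """Yield (box_type, payload) for each box found back-to-back in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A length of 1 means a 64-bit length follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A length of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError(f"malformed box at offset {offset}")
        yield raw_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size

# A toy superbox: a description box followed by one content box.
# The payloads are placeholders, not valid JUMBF or C2PA content.
sample = make_box("jumb", make_box("jumd", b"placeholder-label") + make_box("cbor", b"\xa0"))

for box_type, payload in iter_boxes(sample):
    children = [child_type for child_type, _ in iter_boxes(payload)]
    print(box_type, "contains", children)
```

Because a superbox's payload is itself just a run of boxes, the same function can be applied recursively to descend from the manifest store into each manifest.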

Each piece of embedded provenance data is called a “Manifest”. A manifest contains a part of the provenance data about the current asset, or the assets it was made from. Because an asset might have been created from multiple original sources or have been processed multiple times, we will often need to store several manifests to understand the complete history of the current asset.

Manifests are located in the "Manifest Store", which is a JUMBF superbox. The last manifest in the store is the "ActiveManifest", which is the provenance data about the current asset and it's the logical place for validation to start. The other manifests are the data for the "ingredients" of the active manifest - i.e. the assets that were a part of the creation of the active manifest. This is one of the key features of C2PA: each asset provides a graph of the history of editing and composition actions that went into the active asset, exposing as little or as much as the asset publisher wants.
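The relationship between the active manifest and its ingredients can be sketched as a small graph walk. This is only an illustration: the `Manifest` class, its fields and the manifest labels are hypothetical, and ingredients are modelled as plain labels rather than the references the spec actually defines.

```python
from dataclasses import dataclass, field

@dataclass
class Manifest:
    label: str
    # Labels of other manifests in the same store (hypothetical simplification).
    ingredients: list = field(default_factory=list)

def active_manifest(store):
    """The last manifest in the store describes the current asset."""
    return store[-1]

def history(store, manifest=None, depth=0):
    """Return (depth, label) pairs, starting at the active manifest
    and descending through its ingredients."""
    by_label = {m.label: m for m in store}
    manifest = manifest or active_manifest(store)
    rows = [(depth, manifest.label)]
    for label in manifest.ingredients:
        rows.extend(history(store, by_label[label], depth + 1))
    return rows

store = [
    Manifest("camera-original"),
    Manifest("stock-photo"),
    Manifest("edited-composite", ["camera-original", "stock-photo"]),
]

for depth, label in history(store):
    print("  " * depth + label)
```

A publisher who wants to expose less history would simply leave some ingredient manifests out of the store.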

Each manifest within the store is again its own JUMBF superbox. A manifest then consists of a "Claim", an "AssertionStore", any "W3C Verifiable Credentials" that are included, and a "Signature". Manifests are signed by an actor (the “Signer”) whose credential identifies them to the user validating or consuming them.

Diagram of a Manifest box, without any VCs

Assertions

Assertions are the statements being made by the signer of a manifest. They are the bits of provenance data that consumers of that data are being asked to trust, for example, the date of image capture, the geographical location, or the publisher of a video.

In the spec, each assertion has its own data model. Some are published as "Standard Assertions" in the spec, some are adoptions of existing metadata specifications such as EXIF, IPTC and schema.org, and it is expected that implementers will extend the spec by defining their own as well.

Media metadata isn't new

For example, the EXIF standard is nearly universal in digital photographs, used to record location and camera settings. The fundamentally new thing that C2PA does is allow you to cryptographically bind that metadata (with hashes) to a particular media asset and then sign it with the identity credential of whoever produced that data, ensuring that the result is tamper-evident and provable.

Assertions are contained in their own JUMBF Content Type Box in the assertion store superbox and are serialised in the format defined in the spec for that assertion. The C2PA-defined assertions are stored as CBOR, while most adopted assertions from other standards are JSON-LD.

Here's an example of an "Action" assertion (in CBOR Diag) which tells you what the signer thinks was done in creating the active asset:

{
  "actions": [
    {
      "action": "c2pa.filtered",
      "when": 0("2020-02-11T09:00:00Z"),
      "softwareAgent": "Joe's Photo Editor",
      "changed": "change1,change2",
      "instanceID": 37(h'ed610ae51f604002be3dbf0c589a2f1f')
    }
  ]
}

And here's an EXIF one (in JSON-LD) that contains location data:

{
  "@context" : {
    "exif": "http://ns.adobe.com/exif/1.0/"
  },
  "exif:GPSLatitude": "39,21.102N",
  "exif:GPSLongitude": "74,26.5737W",
  ...
}
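To show how the CBOR form of the "Action" assertion above differs from its JSON-like diagnostic notation, here is a minimal sketch of a CBOR encoder in Python. It is deliberately incomplete: it handles only non-negative integers, byte strings, text strings, arrays, maps and tags, which is just enough for this example. The `Tag` class and `encode` function are my own names; real code would use an established CBOR library.

```python
import struct

class Tag:
    """A CBOR tagged value, e.g. Tag(0, "...") for a date-time string."""
    def __init__(self, number, value):
        self.number = number
        self.value = value

def _head(major, n):
    """Encode a major type (0-7) together with its unsigned argument n."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([major << 5 | info]) + struct.pack(fmt, n)
    raise ValueError("argument too large for CBOR")

def encode(value):
    """Encode a small subset of Python values as CBOR bytes."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled by this sketch")
    if isinstance(value, int) and value >= 0:
        return _head(0, value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        items = b"".join(encode(k) + encode(v) for k, v in value.items())
        return _head(5, len(value)) + items
    if isinstance(value, Tag):
        return _head(6, value.number) + encode(value.value)
    raise TypeError(f"unsupported type: {type(value).__name__}")

# The "Action" assertion from the example, as Python values.
assertion = {
    "actions": [
        {
            "action": "c2pa.filtered",
            "when": Tag(0, "2020-02-11T09:00:00Z"),
            "softwareAgent": "Joe's Photo Editor",
            "changed": "change1,change2",
            "instanceID": Tag(37, bytes.fromhex("ed610ae51f604002be3dbf0c589a2f1f")),
        }
    ]
}

encoded = encode(assertion)
print(len(encoded), "bytes:", encoded.hex())
```

The tags are what JSON lacks: tag 0 marks the string as a date-time and tag 37 marks the 16 bytes as a UUID, so a parser does not have to guess.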

The one critical assertion is the binding, which ties the claim to a specific asset; the spec requires every claim to include one. This ensures that claims are not applied to any asset other than the one they were signed against, which helps the consumer trust that the C2PA data wasn't tampered with between the publisher and the consumer. There are currently two types of "hard bindings" available: a simple hash binding to a range of bytes in a file, or a more complex one intended for ISO BMFF-based assets, which can use their box format to reference specific boxes that should be hashed.
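The simple hash binding can be sketched in a few lines of Python. I'm assuming here that the hash covers the whole file except the byte range holding the embedded manifest, since the manifest cannot contain a hash of itself; the function name, the toy asset layout and the choice of SHA-256 are all illustrative.

```python
import hashlib

def hash_with_exclusion(data, start, length):
    """SHA-256 over data, skipping the byte range [start, start + length)."""
    digest = hashlib.sha256()
    digest.update(data[:start])
    digest.update(data[start + length:])
    return digest.digest()

# A toy "asset": some header bytes, an embedded manifest, then media data.
header = b"HEADER"
manifest = b"<manifest-v1>"
pixels = b"PIXELS"
asset = header + manifest + pixels

binding = hash_with_exclusion(asset, len(header), len(manifest))

# Rewriting the manifest region leaves the binding hash unchanged...
rewritten = header + b"<manifest-v2>" + pixels
same = hash_with_exclusion(rewritten, len(header), len(manifest)) == binding

# ...but changing the media data does not.
tampered = header + manifest + b"PIXELZ"
changed = hash_with_exclusion(tampered, len(header), len(manifest)) != binding

print("manifest rewrite keeps hash:", same)
print("media change alters hash:", changed)
```

A validator would recompute this hash from the asset it received and compare it with the value stored in the binding assertion.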

Claim

The claim in a manifest exists to pull together the assertions being made, any "redactions" (removals of previous provenance data for privacy reasons), and some extra metadata about the asset, the software that created the claim, and the hashing algorithm used. Assertions are linked by their reference in the assertion store and a hash. The claim itself is another JUMBF box, serialised as a CBOR structure. This is the thing that is signed, and it provides a location to find the signature itself.
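As a rough sketch of how a claim ties assertions together by reference and hash, the Python below builds a claim over a small assertion store and then re-checks it. The field names (`alg`, `claim_generator`, `url`, `hash`) and the assertion labels are illustrative rather than taken from the spec, and JSON with sorted keys stands in for the CBOR serialisation the spec actually uses.

```python
import hashlib
import json

def serialise(obj):
    """Stand-in serialisation: the spec uses CBOR, this sketch uses JSON."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")

def build_claim(assertion_store, generator, alg="sha256"):
    """Reference every assertion by its label and the hash of its content."""
    return {
        "alg": alg,
        "claim_generator": generator,
        "assertions": [
            {"url": label, "hash": hashlib.new(alg, serialise(body)).hexdigest()}
            for label, body in assertion_store.items()
        ],
    }

def mismatched_assertions(claim, assertion_store):
    """Return labels whose stored content no longer matches the claim."""
    bad = []
    for ref in claim["assertions"]:
        body = assertion_store.get(ref["url"])
        if body is None:
            bad.append(ref["url"])
            continue
        actual = hashlib.new(claim["alg"], serialise(body)).hexdigest()
        if actual != ref["hash"]:
            bad.append(ref["url"])
    return bad

assertion_store = {
    "c2pa.actions": {"actions": [{"action": "c2pa.filtered"}]},
    "stds.exif": {"exif:GPSLatitude": "39,21.102N"},
}
claim = build_claim(assertion_store, "Joe's Photo Editor")
print(mismatched_assertions(claim, assertion_store))

# Editing an assertion after the claim was built is detected.
assertion_store["stds.exif"]["exif:GPSLatitude"] = "0,0.0N"
print(mismatched_assertions(claim, assertion_store))
```

Because only the claim is signed, these hashes are what extend the signature's protection to the assertions themselves.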

Signature

The signature in a manifest is a COSE CBOR structure that signs the contents of the claim box. COSE is the CBOR version of the JOSE framework of specs, which includes JWT/JWS. The signature is produced using the credentials of the signer. The signer is the primary point of trust in the C2PA Trust Model, and consumers are expected to use the signer's identity to help them make a trust decision on the claim's assertions.

The only credentials currently supported for producing the signature are X.509 certificates. The specification provides a profile that certificates are expected to adhere to (including key usages such as “id-kp-emailProtection”, which is a placeholder). The specification does not include any requirements on how validators and consumers assemble lists of trusted issuers, as it is expected that an ecosystem of issuers will develop around this specification. Instead, it simply requires that validators maintain or reference such a list of trust anchors. Alternatively, they can put together a trusted list of individual entity certificates, provided out-of-band, separately from the trust anchor list.
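The two-list lookup described above might look something like the following sketch. It covers only the list membership step: it does not verify certificate signatures, validity periods or key usages, which a real validator would do with an X.509 library. The function names, the idea of identifying certificates by SHA-256 fingerprint and the byte strings standing in for DER-encoded certificates are all assumptions for illustration.

```python
import hashlib

def fingerprint(cert_der):
    """Identify a certificate by the SHA-256 hash of its encoded bytes."""
    return hashlib.sha256(cert_der).hexdigest()

def is_trusted(chain, trust_anchors, trusted_entities=frozenset()):
    """Decide whether a signer's chain is acceptable.

    chain: encoded certificates, signer first, issuers after it.
    trust_anchors: fingerprints of trusted issuer certificates.
    trusted_entities: fingerprints of individually trusted signer certificates.
    """
    if not chain:
        return False
    # Route 1: the signer's own certificate was trusted out-of-band.
    if fingerprint(chain[0]) in trusted_entities:
        return True
    # Route 2: some issuer in the chain is a known trust anchor.
    return any(fingerprint(cert) in trust_anchors for cert in chain[1:])

# Placeholder byte strings standing in for real DER-encoded certificates.
signer, intermediate, root = b"signer-cert", b"intermediate-cert", b"root-cert"
anchors = {fingerprint(root)}

print(is_trusted([signer, intermediate, root], anchors))
print(is_trusted([signer, b"unknown-issuer"], anchors))
print(is_trusted([signer, b"unknown-issuer"], anchors, {fingerprint(signer)}))
```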

What now?

This is an overview and omits both the detail required to produce C2PA manifests and the breadth of some of the other components of the specification (e.g. ingredients, the use of Verifiable Credentials, the concept of assertion metadata, timestamping etc). I'd love to produce a worked example of how to extract and validate a C2PA manifest from an asset; watch out for that in the future. I will highlight an open-source implementation of C2PA available in Python, and I know of other implementations in the works, too.

At the BBC, we can't wait for this specification to develop and gain adoption. We'd love to see it supported in production and distribution tools, web browsers, and on social media and messaging platforms. We really think it can make a difference to some of the harms done by mis- and disinformation.
