Folders are not metadata

Why we need a sidecar for offloads.

A slow but steady change is taking place in video production: organizing media at the file level rather than at the folder level. It’s basically the Gmail innovation: instead of relying on folders to sort email, simply write better emails and the search algorithm will reward you for it.

However, when offloading we’re still using a lot of deep folder structures. This is not uncommon:

Disk/Year/Project/Day/Time/Location/Subject/Session/CameraType/Clips/…

But folders have a flaw: they only have a name. Folders themselves don’t have substance, just content. Move the content out and the folder becomes useless. We’re essentially misusing folder names as tags, without attaching them to the actual footage.

So folders by themselves don’t suffice. But files don’t either: filenames aren’t unique at all. The only reliable way to track files throughout a production is to add metadata. But burning in metadata during offload changes the checksums and slows down the offload considerably. So we need something that doesn’t alter files, doesn’t rely on folders, and can accumulate context over time. It also shouldn’t get in the way of software like NLEs or MAMs.

In short, it…

  • can contain transfer metadata
  • can contain clip metadata
  • doesn’t alter the source material
  • is extensible during production

Basically, a sidecar file for offloads.
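
To make the “doesn’t alter the source material” requirement concrete, here is a minimal Python sketch. It assumes a made-up JSON sidecar (not the MHL format) and hypothetical helper names (md5_of, write_sidecar); the only point is that the clip’s checksum is the same before and after context is written next to it.

```python
import hashlib
import json
import os
import tempfile


def md5_of(path):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(clip_path, context):
    """Store context in a hypothetical .sidecar.json file; the clip itself is only read."""
    sidecar_path = clip_path + ".sidecar.json"
    record = {"file": os.path.basename(clip_path), "md5": md5_of(clip_path)}
    record.update(context)
    with open(sidecar_path, "w") as handle:
        json.dump(record, handle, indent=2)
    return sidecar_path


# Demo with a throwaway file standing in for a clip.
folder = tempfile.mkdtemp()
clip = os.path.join(folder, "A001C001.mov")
with open(clip, "wb") as handle:
    handle.write(b"not really video data")

before = md5_of(clip)
sidecar = write_sidecar(clip, {"tags": ["interview", "day 1"]})
after = md5_of(clip)
print(before == after)  # prints True: the clip was never modified
```

Because the context lives in a separate file, more of it can be added later without ever touching the footage.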

A Sidecar Named MHL

Alongside every transfer, Hedge creates a Media Hash List. Originally devised by Pomfort, this open, XML-based standard is not just useful for keeping a record of files and their hashes; it can contain a lot more information about your project and its context. It’s a sort of hybrid between clip-based and transfer-based metadata.

The great thing about MHLs is that they’re essentially snapshots. You can have multiple MHLs in the same folder, describing the state of your files through time. When adding metadata to MHLs you create a history of what happened when, e.g. xxHash checksums on offload, extracted metadata directly afterwards, verification on ingest, creation of additional MD5 checksums later on. You can even trace if and when a file was moved or deleted. It’s a history overview.
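
As a rough illustration of the snapshot idea, the Python sketch below walks all MHLs in one folder and records which snapshots list which file. It assumes the hashlist/hash/file layout, that the MHLs carry no XML namespace, and that file modification time reflects the order in which they were written; snapshot_history is a hypothetical helper name.

```python
import glob
import os
import xml.etree.ElementTree as ET


def snapshot_history(folder):
    """Map each file path to the MHL snapshots that list it, oldest snapshot first."""
    history = {}
    # Assumption: modification time reflects the order in which the MHLs were written.
    mhl_paths = sorted(glob.glob(os.path.join(folder, "*.mhl")), key=os.path.getmtime)
    for mhl_path in mhl_paths:
        root = ET.parse(mhl_path).getroot()
        # Assumption: each <hash> entry holds one <file> path.
        for file_element in root.findall("hash/file"):
            snapshots = history.setdefault(file_element.text, [])
            snapshots.append(os.path.basename(mhl_path))
    return history
```

A file that appears in an early snapshot but not in later ones is a candidate for “moved or deleted in between”, which is the kind of question a pile of snapshots can answer.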

We propose to use MHLs as the sidecar format for offloads, turning them into something more than just hash lists. This way every file is always ingest-ready, without getting in the way of existing workflows.

Searchin’

Even if you’re never going to use a MAM or other media management solution, you can still benefit from MHL as a sidecar. Searching through disks and inside files is way faster than manually drilling down folder structures in Finder. With Spotlight or apps like Alfred (our favorite!) you effectively have a basic MAM at your disposal, allowing you to use shallower folder structures when offloading.

Using search this way is great for digging into your local archive of Transfer Logs. Now your logs suddenly become useful for more than just CYA 😉

UPDATE 2017/02/15: We now have two free Spotlight and Quick Look plugins for .mhl files available for download here.

A real life example

Camille Darley, CTO of Productions Autrement Dit (pad.fr), came to us needing a way to add metadata to a transfer:

There is a feature we can’t find in any offloading software which would significantly improve our workflow: the ability to add some data when offloading media, like tags, a description…

After deliberation we added a menu item to the Source Disk menu:

Hedge’s Disk Menu

Add Info… is a rudimentary way of adding context to a Source before offloading its contents.

Anything you add to it finds its way into the Transfer Logs:

Source: /Volumes/SD/
Destination: /Users/Hedge/Documents/Media/SD/

Started: 13/01/2017, 12:25:24
Finished: 13/01/2017, 12:25:24
Duration: 0.4 seconds

Info: 2016;Project;Basically;Anything;Goes

Total Files: 8
Total Size: 83,9 MB (83886080 Bytes)
Status: Success
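
Since the log is plain “Key: Value” text, it is easy to read back. A minimal Python sketch, assuming the layout shown above and assuming the semicolons in the Info line are meant as separators (parse_transfer_log is a hypothetical helper, not part of Hedge):

```python
def parse_transfer_log(text):
    """Turn the 'Key: Value' lines of a transfer log into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(": ")
        if separator:  # skip blank lines and anything without a 'Key: Value' shape
            fields[key.strip()] = value.strip()
    return fields


log_text = """Source: /Volumes/SD/
Destination: /Users/Hedge/Documents/Media/SD/

Started: 13/01/2017, 12:25:24
Info: 2016;Project;Basically;Anything;Goes
Status: Success
"""

log = parse_transfer_log(log_text)
print(log["Info"].split(";"))  # ['2016', 'Project', 'Basically', 'Anything', 'Goes']
```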

Inside the MHLs you’ll find Transfer Info in the top node, for it is transfer metadata, not clip metadata.

<hedge>
   <rootPath>/Users/Hedge/Documents/Media/SD/</rootPath>
   <info>2016;Project;Basically;Anything;Goes</info>
</hedge>

Camille uses Transfer Info to make it easy for his DOPs to add context to their shots without having to worry about the MAM Camille will be using back at the office. It’s cross-platform and software-independent.

We ask our cameramen to fill in the Transfer Info like this:
>[tags]bla, bla, bla
>[description]This is the description of the clips
When they return a backup disk to us, we ingest it automatically into our MAM with a custom script I wrote to retrieve the Transfer Info. The XML format the MHL uses is structured so it’s very easy to process further. This is part of the Python script I wrote to ingest MHLs into our MAM:
import os
from lxml import etree
from MAM import ingest_file  # MAM-specific module, not shown here

root_path = "path_to_media_folder"

if os.path.isdir(root_path):
    xml_file = ""
    # Search for a Hedge MHL file
    for file in os.listdir(root_path):
        if file.endswith(".mhl"):
            xml_file = file

    if xml_file == "":
        print('No MHL file found')

    else:
        # Parse the MHL file
        tree = etree.parse(os.path.join(root_path, xml_file))
        root = tree.getroot()

        # XPath to the info metadata
        xpath_infos = "/hashlist/hedge/info"

        # Retrieve the raw info value (empty if no Transfer Info was added)
        infos_elements = root.xpath(xpath_infos)
        infos_string = (infos_elements[0].text or "") if infos_elements else ""

        metadatas = {}

        # Split raw value with the first delimiter
        infos = infos_string.split('[')
        # Clean empty values
        infos = [x for x in infos if x.strip()]

        # Split each value once more to get a key/value pair, then store it in the metadatas dictionary
        for info in infos:
            # Split on the first ']' only, so a ']' inside the value is kept
            key, value = info.split(']', 1)
            metadatas[key] = value.strip()

        # XPath to the file paths
        xpath_files = "/hashlist/hash/file"
        files_elements = root.xpath(xpath_files)

        for files_element in files_elements:
            # Ingest file in the MAM
            filepath = os.path.join(root_path, files_element.text)
            ingest_file(filepath, metadatas)

else:
    print('This is not a valid path')
With this workflow, it becomes very easy for me to ingest data without forcing my DOPs to use complicated software during their work.

So, easy for both parties and a lot of context saved for later 👍

To metadata or not to metadata

Of course you should use folders to organize your media, but never just for the sake of adding metadata. A folder structure two or three levels deep should suffice. For the rest, use tags or labels.

Accumulating context in an MHL as a sidecar gives you a flexible, future-proof tool. It also alleviates the need to extract metadata on ingest.

Since MHLs don’t conflict with each other, you can easily create multiple MHLs with different information inside whenever it’s needed. This way, you can focus on the bottleneck of your production: offloading.