Meta-Metadata: Turning Three Text Fields Into Millions of Dollars

As the number of data records that music companies must process continues to grow, efficiency has become ever more valuable, particularly because each record contains only a few data points. To a keen observer, this sparse track-related information holds deep, financially rewarding possibilities.

_______________________________

Guest post by Jesse Buddington, Director of Licensing Operations at Loudr

Leaping from a few hundred thousand to trillions of lines in a few short years, the number of data records music companies must process has grown exponentially. Efficiency and making more from less have become essential – especially when each record in the incoming flood often only includes three to four points of identifying data.

A sharp eye (or a sharply designed system) can see deeper possibilities in this paucity of track-related information. We need to home in on context and trends – a sort of “meta-metadata” – that allow us to parse the ever-increasing onslaught of records quickly and accurately. Meta-metadata clues can lead to conclusive attributions, and conclusive attributions lead to payment. Approached correctly and at scale, three to four text fields can translate into millions of dollars in unclaimed royalties.

First, let’s define metadata in the music industry. All properly tracked operations generate logs, and thus all properly tracked operations generate metadata. “Metadata” is the term we use for text information that accompanies other types of content (often files, logs, and processes, and sometimes, confusingly, even text data itself). Working at scale relies on performing operations in bulk, which requires an understanding of the trends within a data set. Because file data is often large and hard to parse, text-based metadata can provide a wealth of information to the inquisitive.

Interpreting metadata correctly is key to identifying trends, and once we identify a trend, we can determine how to handle the items that follow it. Trends can be handled in bulk if the rules that govern them hold for every item following a particular trend, which allows an automated system to process a great many records. Human beings are excellent at detecting patterns, but an automated system can break trend-handling down into its component parts, and in doing so can often augment the pattern-finding abilities of a human operator. Ultimately, we strive for our automated systems to make intuitive inferences as well as a knowledgeable human being can, and to act on those inferences at scale.
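To make bulk trend-handling concrete, here is a minimal Python sketch, not a description of any production system. It assumes each record carries only title, artist, and album text, and represents each trend as a named rule with a predicate. Every record goes to the first rule that matches it, and anything no rule covers is queued for a human operator rather than guessed at. The `Record`, `Rule`, and `apply_rules` names and the sample “instrumental” rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Record:
    """The few text fields most services receive for a sound recording."""
    title: str
    artist: str
    album: str


@dataclass(frozen=True)
class Rule:
    """A named trend plus a predicate recognizing records that follow it."""
    name: str
    matches: Callable[[Record], bool]


def apply_rules(records: list[Record], rules: list[Rule]):
    """Route each record to the first matching rule; records that no rule
    covers are queued for a human operator instead of being guessed at."""
    handled: dict[str, list[Record]] = {rule.name: [] for rule in rules}
    manual_review: list[Record] = []
    for record in records:
        for rule in rules:
            if rule.matches(record):
                handled[rule.name].append(record)
                break
        else:  # no rule matched this record
            manual_review.append(record)
    return handled, manual_review


# Hypothetical usage: one rule for a trend we trust to hold for every match.
records = [
    Record("Song A (Instrumental)", "Band X", "Album Y"),
    Record("Song B", "Band X", "Album Y"),
]
rules = [Rule("instrumental", lambda r: "(instrumental)" in r.title.lower())]
handled, manual = apply_rules(records, rules)
```

Sending unmatched records to a person, rather than forcing a best guess, is what keeps bulk rules safe: a rule only runs on the records it was written to cover.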

Most services under-use their metadata, and are often unaware of just what their metadata can tell them. In the music industry, a huge amount of metadata goes unused because it cannot be understood in-context at scale. For example, consider the following sound recording:

■ Track Title: “Kapitel 3: Gespräch mit den Königen (Teil 2)”

■ Artist: “Friedrich Nietzsche & Thomas Gehringer”

■ Album Title: “Also sprach Zarathustra (Vierter Teil)”

Most music services have these three data points, as well as Label/Provider, but often nothing else. Even this limited metadata, however, holds a wealth of information. First of all, all three items appear to be German. When searching for the performing rights society or likely composer of the underlying musical work (information not included in this metadata set), we can immediately prioritize German databases, most prominently GEMA (Germany’s official society for performing and mechanical reproduction rights).
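The language cue can be sketched as a crude heuristic. The Python below assumes a small, hypothetical list of German characters and function words, scores how many of the three fields show at least one German cue, and returns GEMA as the first database to search when most of them do. A real system would more likely rely on a trained language-identification model; the cue lists, the 0.5 threshold, and the function names are illustrative assumptions.

```python
import re

# Hypothetical German cues: umlauts/eszett and a few short function words.
GERMAN_CHARS = set("äöüÄÖÜß")
GERMAN_WORDS = {"der", "das", "den", "dem", "und", "mit", "von", "teil", "kapitel"}

# Hypothetical search order per detected language; empty means "no preference".
SOCIETY_PRIORITY = {"de": ["GEMA"]}


def german_score(*fields: str) -> float:
    """Return the fraction of fields containing at least one German cue."""
    if not fields:
        return 0.0
    hits = 0
    for text in fields:
        words = set(re.findall(r"\w+", text.lower()))
        if GERMAN_CHARS & set(text) or GERMAN_WORDS & words:
            hits += 1
    return hits / len(fields)


def databases_to_search(title: str, artist: str, album: str,
                        threshold: float = 0.5) -> list[str]:
    """Pick which rights databases to consult first for this record."""
    lang = "de" if german_score(title, artist, album) >= threshold else "unknown"
    return SOCIETY_PRIORITY.get(lang, [])


# The example record from the text: title and album show German cues,
# the artist field (personal names) does not, giving a score of 2/3.
priority = databases_to_search(
    "Kapitel 3: Gespräch mit den Königen (Teil 2)",
    "Friedrich Nietzsche & Thomas Gehringer",
    "Also sprach Zarathustra (Vierter Teil)",
)
```

Note that the artist field contributes nothing here: personal names carry little language signal, which is why the score is built from the fraction of fields rather than requiring every field to agree.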

Let’s dig a bit deeper into the translated information:

■ Track Title: Chapter 3: Conversation with the Kings (Part 2)

■ Album Title: Thus Spake Zarathustra (Fourth Part)

What about the artists’ names? The words appear to be personal names rather than the dictionary words one might expect in a band name, so we’ll leave them untranslated. Friedrich Nietzsche is a famous philosopher, though it is always possible that his writings were set as song lyrics. Thomas Gehringer isn’t a household name, but he shows up across our data set: he is credited as an artist on various recordings, always alongside a second artist who is a famous author. He could be a musician setting these authors’ texts to music, but it is far more probable that he is the narrator of a series of audiobooks. Looking back at our translated track title, its formatting (beginning with “Chapter 3”) further supports the conclusion that this track belongs to an audiobook.
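The inference about Thomas Gehringer rests on co-credit statistics across a catalog. A minimal Python sketch, assuming artists are joined with “&” (as in the example record) and that a reference set of known book authors is available (hypothetical here): for each artist who is not a known author, count how often they are co-credited with one, and flag those who almost always are. The thresholds and function names are assumptions, and the catalog entries below are placeholder names, not real data.

```python
from collections import defaultdict


def split_artists(artist_field: str) -> list[str]:
    # Assumes co-credits are joined with "&", as in the example record;
    # real feeds also use ",", "feat.", "/", and other separators.
    return [name.strip() for name in artist_field.split("&") if name.strip()]


def likely_narrators(artist_fields, known_authors, min_records=3, min_ratio=0.9):
    """Flag artists who are (almost) always co-credited with a known author."""
    total = defaultdict(int)        # recordings crediting each artist
    with_author = defaultdict(int)  # of those, recordings also crediting a known author
    for field in artist_fields:
        names = split_artists(field)
        for name in names:
            if name in known_authors:
                continue  # the authors themselves are not narrator candidates
            total[name] += 1
            if any(other in known_authors for other in names if other != name):
                with_author[name] += 1
    return {
        name
        for name, count in total.items()
        if count >= min_records and with_author[name] / count >= min_ratio
    }


# Placeholder catalog (not real data): "Reader R" appears only beside authors.
catalog = [
    "Author A & Reader R",
    "Author B & Reader R",
    "Author C & Reader R",
    "Band X & Singer Y",
]
authors = {"Author A", "Author B", "Author C"}
narrators = likely_narrators(catalog, authors)
```

The `min_records` floor matters: an artist seen once beside an author proves little, while the same pairing repeated across many recordings is exactly the kind of trend the text describes.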

Even without listening to the track audio or processing that much larger data source, we can be reasonably confident, based on the metadata alone, that this track is a non-musical work, and more specifically audiobook narration. By noting the trend and keeping an eye out for data like “Kapitel [#]” and “Thomas Gehringer” in the future, we can check other sound recordings in our system to reinforce the conclusion that he is an audiobook narrator, then process his works accordingly.
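Keeping an eye out for “Kapitel [#]” and a known narrator can likewise be expressed as two independent signals. This Python sketch assumes a hypothetical watch list of narrators and an illustrative chapter pattern covering German and English titles. When both signals agree, the track is treated as audiobook narration; when only one fires, it goes to human review. The labels and function name are assumptions, not an established scheme.

```python
import re

# Illustrative pattern: a title that opens with "Kapitel <n>" or "Chapter <n>".
CHAPTER_PATTERN = re.compile(r"^\s*(?:Kapitel|Chapter)\s+\d+\b", re.IGNORECASE)


def classify(title: str, artist_field: str, known_narrators: set[str]) -> str:
    """Combine two independent signals from the metadata alone.

    Both signals -> treat as audiobook narration; one signal -> route to
    human review; neither -> process as music. Labels are hypothetical.
    """
    chapter_title = CHAPTER_PATTERN.match(title) is not None
    credited = {name.strip() for name in artist_field.split("&")}
    narrator_credit = bool(credited & known_narrators)
    if chapter_title and narrator_credit:
        return "audiobook"
    if chapter_title or narrator_credit:
        return "review"
    return "music"


# Hypothetical watch list built from the trend noted above.
watch_list = {"Thomas Gehringer"}
label = classify(
    "Kapitel 3: Gespräch mit den Königen (Teil 2)",
    "Friedrich Nietzsche & Thomas Gehringer",
    watch_list,
)
```

Treating a single signal as grounds for review rather than a verdict keeps the rule conservative: each recording that matches both cues reinforces the narrator conclusion, while partial matches still get a human look.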

On the flip side, rightsholders are also often unaware of what data they have. This is sometimes a metadata problem, but it is also often a matter of not recognizing the formats and rights available within composite works. For example, a video game studio often considers each of its games to be a single, complete work of art. However, a video game is made of many separate artistic contributions, including visual art (backgrounds, environments, 2D or 3D assets, etc.), characters (which studios are often aware of, since characters help market the games), music, sound effects, possibly voice acting, and often a recorded story or narrative. There may also be non-copyrightable elements that can be separately trademarked or patented, such as title and character logos and emblems, or game mechanics and methodology.

By making both services and rightsholders aware of what data they already possess, we can build a more nuanced understanding of how to analyze and process trends within that data. We can also provide valuable insights into what is missing and where further data may be available. There’s no small reward at the end of this process, either – an enormous pool of unclaimed or misrouted funds, estimated by some sources at around 2.5 billion dollars annually, can be unlocked by more deeply reading and processing the available data.

Jesse Buddington is the Director of Licensing Operations at Loudr, overseeing all mechanical rights clearance and license administration processes. He was originally trained as an opera singer and honed his knowledge of mechanical licensing and music publishing by asking lots of questions, reading ravenously, and working with practical cases within the industry. Despite his parents' hopes, he has no desire to become a lawyer.