By Joe Pairman
Let’s say you’re going to town, and you ask your partner if they need anything from the shops. “Could you just pick up a bottle of shampoo?” they say, and naturally, you say yes. Now you’re staring at a wall of bottles, each unique. What if you get the wrong one? Could it make your partner’s hair too dry? Change the color? MAKE IT FALL OUT? As you stare some more, the clashing colors on the bottles start vibrating.
You look away and get on the phone to your partner. “It’s in the 3rd aisle” (OK, that was the first problem: wrong aisle) “in a big bottle… not the single bottles, the multipacks”. Alright, that properly narrows things down. “It’s green.” There’s only one green shampoo, at least in a multipack. You grab it, pay, and come home victorious. Little do you know: it’s metadata that saved the day.
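Your partner’s guidance took the form of simple statements about the location, the pack, and the bottle color, which together helped to describe and locate the shampoo. We can break each one into three parts (roughly corresponding to the grammatical subject, predicate, and object or complement):
- The shampoo (subject) is in (predicate) aisle 3 (object).
- The shampoo (subject) comes in (predicate) a multipack (object).
- The shampoo (subject) is (predicate) green (complement).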
In the field of metadata, structures such as these are also called statements. In the same way as the shampoo description, we typically use a series of related metadata statements to identify and locate a document, a page, or a chunk of content. Each statement can always be broken into these three parts: subject, predicate, and object.
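To make the three-part structure concrete, here is a minimal sketch in Python. The `Statement` type, the predicate wording, and the `describe` helper are illustrative assumptions, not part of any metadata standard.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One metadata statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid shadowing Python's built-in "object"


# The partner's three hints, written as statements about the same subject.
hints = [
    Statement("shampoo", "is located in", "aisle 3"),
    Statement("shampoo", "is packaged as", "multipack"),
    Statement("shampoo", "has color", "green"),
]


def describe(subject: str, statements: list[Statement]) -> dict[str, str]:
    """Collect every predicate/object pair recorded for one subject."""
    return {s.predicate: s.obj for s in statements if s.subject == subject}


print(describe("shampoo", hints))
```

Each hint is a separate statement, but because they share a subject they can be gathered into a single description of the thing you are looking for.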
Why can metadata seem so complicated, then? One reason is that the simple three-part structure is not always obvious. For a start, there are many ways to encode metadata. If standards-based semantic technology is being used, the structure should be fairly clear: the basic component of all RDF (Resource Description Framework) data is the “triple”, the three-part structure we have just seen. However, there are other models for metadata where you would need to look at standards documents or specifications to understand the structure. Also, different terms may be used for the three elements. A traditional and very precise way to describe them is as “entity”, “attribute”, and “attribute value”. Typically, the entity is the piece of content being described, and the attribute and value are the data about that content.
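The following sketch shows how an entity with attributes and attribute values maps onto triples, and how a triple might be written out in the line-based N-Triples style used for RDF. The `http://example.org/` namespace, the record layout, and both function names are assumptions made for illustration, and literal escaping is left out.

```python
# Hypothetical namespace; example.org is reserved for illustrations.
BASE = "http://example.org/"

# An entity with its attributes and attribute values, as a plain record.
record = {
    "entity": "shampoo",
    "attributes": {"aisle": "3", "pack": "multipack", "color": "green"},
}


def to_triples(rec: dict) -> list[tuple[str, str, str]]:
    """Flatten one entity/attribute/value record into (subject, predicate, object) triples."""
    return [(rec["entity"], attr, value) for attr, value in rec["attributes"].items()]


def to_ntriples_line(subject: str, predicate: str, obj: str) -> str:
    """Render a triple in N-Triples style: two IRIs, then a plain string literal."""
    # Escaping of quotes and backslashes inside the literal is omitted for brevity.
    return f'<{BASE}{subject}> <{BASE}{predicate}> "{obj}" .'


for triple in to_triples(record):
    print(to_ntriples_line(*triple))
```

Whichever vocabulary is used, the record and the triples carry the same information; only the encoding differs.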
Another reason that metadata can be so intimidating is that it is used for many purposes. The National Information Standards Organization’s primer “Understanding Metadata” defines three main categories (actually four, but the fourth, “markup languages”, is about the way metadata is applied to content, rather than the purposes it’s used for). Fortunately, our shampoo case provides an example of each category:
- Structural metadata. In publishing, perhaps a page or section number. In the shampoo example, knowing that it is in aisle 3 is a kind of structural metadata.
- Administrative metadata (technical subdivision). In digital content, a file format or syntax. For the shampoo, the fact that it comes in a multipack.
- Descriptive metadata. Any information describing the content itself. For the shampoo, the fact that it is green.
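As a rough sketch, the same statements can be sorted by purpose. The predicate names and the mapping from predicate to category are assumptions that simply mirror the three examples in the list above.

```python
from collections import defaultdict

# Assumed mapping from predicate to category, mirroring the shampoo examples.
CATEGORY_BY_PREDICATE = {
    "aisle": "structural",
    "pack": "administrative (technical)",
    "color": "descriptive",
}

statements = [
    ("shampoo", "aisle", "3"),
    ("shampoo", "pack", "multipack"),
    ("shampoo", "color", "green"),
]


def group_by_category(triples):
    """Group triples under the category assigned to their predicate."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        # Predicates without a known category are kept, not dropped.
        category = CATEGORY_BY_PREDICATE.get(predicate, "uncategorized")
        grouped[category].append((subject, predicate, obj))
    return dict(grouped)


for category, items in group_by_category(statements).items():
    print(category, items)
```

The statements themselves do not change; the category is just one more piece of information about how each one is used.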
If you feel that you are falling down a rabbit hole of metadata uses, syntaxes, models, and terminology, it helps to break things down into that simple triple structure: subject, predicate, and object. Remember that without any technical manuals or semiotic theories, people speak metadata all the time: to identify things, to find things, and to bring home just the right product from the shops.