The subject is metadata.
Metadata is the information that is supposed to accompany each eBook so that arranging a library by Writer, Genre, Publisher, or Date Published (among other things) is possible.
If you’re going on vacation, for example, and want to take along a mystery, how could you quickly find one in an eBook library of several hundred editions?
Such sorting is why we have computers. They do the grunt work.
But they can only do it based on data. When the data — the metadata — isn’t there, all hell breaks loose and life is rotten for everybody.
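To make that concrete, here is a minimal sketch in Python of the vacation search, using two titles examined later in this post. The record layout (`title`, `tags`) and the function names are my own assumptions for illustration, not how any particular eBook program stores its library.

```python
# Illustrative records only: the "title"/"tags" layout is an assumption.
library = [
    {"title": "Murder Piping Hot", "tags": ["haggis", "mystery", "robertburns"]},
    {"title": "The Hunting Party", "tags": []},  # shipped with no tags at all
]


def find_by_tag(books, wanted):
    """Return the titles of books carrying the wanted tag (case-insensitive)."""
    wanted = wanted.lower()
    return [b["title"] for b in books if wanted in (t.lower() for t in b["tags"])]


def untagged(books):
    """Return the titles of books that no tag search can ever find."""
    return [b["title"] for b in books if not b["tags"]]


if __name__ == "__main__":
    print(find_by_tag(library, "Mystery"))
    print(untagged(library))
```

The search itself is trivial. The point is the second function: a book with an empty tag list is invisible to every genre search, no matter how clever the software is.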
The world runs on metadata. This financial mess we’re in? It’s all metadata — information abstracted from its original source. Metadata is derivative, abstracted data.
So if you think metadata is some little thing the world can do without, you are wrong.
Metadata is one of the little details that, ages ago, Steve Jobs would have cared about. His attention to detail in the screens of the original circa-1984 Macintosh is the stuff of legends. He would criticize down to the individual pixel. Well, Jobs, metadata is akin to those pixels you once studied.
In my previous post, about iTunes ePub eBook display options, I encountered bizarre arrangements of the eBooks. Something was plain wrong there and I wanted to find out what it was.
Some of it is not Apple’s fault, but parts of it are.
This post’s purpose is to wake up every book publisher (major, minor, and single-writer), every book distributor (Smashwords, Feedbooks, et al.), and Apple itself.
This is all wrong:
When sorted by Categories, it shouldn’t look like that at all.
Again, with a count of books for each Category. Except that not all are Categories. A Christmas Carol is a Category? And why is the Gutenberg Holmes not under either Novels or Action & Adventure? And why is the 23rd Century book in its own Category and not under Science-Fiction? Murder Piping Hot, from Smashwords, should be under a Mystery category — so where did haggis come from? Also notice how many are thrown into Unknown Genre! What’s happening with all of this metadata?
So what was going on here?
I had to go look at the metadata. Fortunately, there is a free tool that enables that: Calibre.
(Note that I’ve had to redact book info not relevant to this post. I didn’t want to go through the hassle of having to re-add them to Calibre later.)
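For those who would rather look under the hood than use a GUI: an ePub is a ZIP archive, and its metadata lives in an OPF package file that `META-INF/container.xml` points to. The sketch below reads a few Dublin Core fields using only the Python standard library. I am assuming that what Calibre shows as Tags corresponds to the `dc:subject` entries and that Author Sort corresponds to the `opf:file-as` attribute on `dc:creator`; the function name `read_epub_metadata` is hypothetical, and real files may need more defensive handling than this.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the ePub container and OPF package files.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_metadata(source):
    """Return title, authors, author-sort values, and subjects from an ePub.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as z:
        # container.xml tells us where the OPF package file is.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(z.read(opf_path))

    meta = package.find("opf:metadata", NS)
    creators = meta.findall("dc:creator", NS)
    file_as = "{%s}file-as" % NS["opf"]  # namespaced attribute name
    return {
        "title": meta.findtext("dc:title", default="", namespaces=NS),
        "authors": [(c.text or "").strip() for c in creators],
        "author_sort": [c.get(file_as) for c in creators],  # None if absent
        "subjects": [(s.text or "").strip() for s in meta.findall("dc:subject", NS)],
    }
```

Calling `read_epub_metadata("some-book.epub")` (the filename is a placeholder) returns a plain dictionary, which makes it easy to spot an empty `subjects` list or a missing `author_sort` across a whole folder of files.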
I added each one to Calibre’s Library and called up the metadata. That’s accomplished either via the menu or right-clicking on a title, as I have done here:
The metadata screen looks like this:
I’m not going to burden everyone with having to Click = Big for every snap, so I will crop them all to the part that’s important, as seen here in red:
And so we’ll examine the metadata for the first book, Murder Piping Hot, which seemed to have two anomalies. The first was its author’s name coming up as firstname-lastname, while the book next to it was lastname-firstname. The second was its appearing under the Category of “haggis.”
Is any of that metadata correct and if not, why not?
As it turns out, the Author field is correct: Ann Morven — or firstname-lastname.
But where did the “haggis” come from? From the Tags field, where it also appears with “mystery” and “robertburns.”
Some of you think you just had an AHA! moment, but save that for now.
Let’s look at the book next to it now, Skyrider:
We can see the Author field is incorrectly filled in. We never see books with lastname-firstname as the author!
Under Tags, we see there is “Action & Adventure,” which should make you begin to doubt that AHA! moment.
We will skip to The Hunting Party, which wound up under Unknown Genre. Why did it?
Right away, we can see there is nothing in the Tags field. That answers the question of Unknown Genre.
But look! There’s a glaring error in the Author Sort field! It should be lastname-firstname, not firstname-lastname. This file is from Feedbooks and was probably quickly put together for me to test earlier today, so we won’t hold this error against it right now. But this does illustrate how easy it is to make a mistake with such important data and how such a little thing can have a cascading effect that can louse up everyone’s day.
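The convention is simple enough to state in code: Author is for display (firstname-lastname) and Author Sort is for ordering (lastname-firstname). Here is a naive sketch of deriving one from the other. The function name is hypothetical, and the last-word-is-the-surname rule is an assumption that breaks on suffixes, particles, and compound surnames, so treat it as a starting point for a human to check, not as a rule.

```python
def author_sort(name):
    """Turn 'First [Middle] Last' into 'Last, First [Middle]'.

    Naive: assumes the final word is the surname.
    """
    name = " ".join(name.split())  # collapse stray whitespace
    if "," in name or " " not in name:
        # Already in 'Last, First' form, or a single-word name.
        return name
    first, last = name.rsplit(" ", 1)
    return f"{last}, {first}"


if __name__ == "__main__":
    print(author_sort("Ann Morven"))
```

A publisher could run something like this over a catalog and flag every file where the stored Author Sort value contains no comma at all, which is exactly the mistake seen above.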
We’ll hop to Strange Future, which came up under the bizarre Category of “23rd Century.” Let’s look at its Tags field:
The field is cut off. Instead of scrolling back and forth, it’s better to select all of it and paste it in a text program. It then reveals:
future society, timetravel, future earth, 23rd century, satire, science fiction comedy, futuristic novel, future life, time travel
And those of you who earlier had the AHA! moment just think you saw its validation. No. Wait for it.
I’m not going to address those tags. I just wanted to show what they are.
Now onto that Christmas Carol book:
And let me extract what’s in its Tags field:
romance, caden leigh, a christmas carol, scrooge, fiction
Where is your AHA! now? (I’ll get to that later.)
Let’s hop to the Sherlock Holmes book:
And extract its Tags field metadata:
Private investigators — England — Fiction, Detective and mystery stories, English, Holmes, Sherlock (Fictitious character) — Fiction
And now everyone who thought they had that AHA! moment — their heads explode!
Because up to now, we suspected that iTunes was doing an alphabetic sort of the Tags first and then using the first tag (or set of tags) as the Category. But here it has bizarrely and illogically grabbed “Holmes, Sherlock (Fictitious character)” as the Category field! If it were doing an alphabetical sort, it would have grabbed “England” instead.
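The suspicion is easy to test against the tags quoted in this post. The sketch below sorts each book’s tags without regard to case, takes the first one, and compares it with the Category iTunes actually displayed. How the Holmes tags are grouped into separate entries is my assumption (the field is a flat comma-separated string, and one of the entries itself contains a comma), and the labels are just shorthand for the books discussed above. Whatever tag an alphabetical sort would put first for the Holmes book, it is not the Holmes entry.

```python
def first_tag_alphabetically(tags):
    """Return the tag that sorts first, ignoring upper/lower case."""
    return min(tags, key=str.casefold)


# (label, tags as quoted in this post, Category that iTunes displayed)
OBSERVATIONS = [
    ("Murder Piping Hot", ["haggis", "mystery", "robertburns"], "haggis"),
    ("Strange Future",
     ["future society", "timetravel", "future earth", "23rd century", "satire",
      "science fiction comedy", "futuristic novel", "future life", "time travel"],
     "23rd Century"),
    ("Christmas Carol book",
     ["romance", "caden leigh", "a christmas carol", "scrooge", "fiction"],
     "A Christmas Carol"),
    # Grouping of these three entries is an assumption; \u2014 is the long dash
    # that appears in the quoted tags.
    ("Gutenberg Holmes",
     ["Private investigators \u2014 England \u2014 Fiction",
      "Detective and mystery stories, English",
      "Holmes, Sherlock (Fictitious character) \u2014 Fiction"],
     "Holmes, Sherlock (Fictitious character)"),
]


def check_hypothesis(observations):
    """Map each label to True if 'first tag alphabetically' predicts the Category."""
    results = {}
    for label, tags, shown in observations:
        predicted = first_tag_alphabetically(tags)
        results[label] = predicted.casefold() == shown.casefold()
    return results


if __name__ == "__main__":
    for label, ok in check_hypothesis(OBSERVATIONS).items():
        print(f"{label}: {'matches' if ok else 'does NOT match'}")
```

Three of the four books fit the alphabetical theory and the Holmes book does not, which is why a simple sort cannot be the whole story.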
Now you can see why I say some of this is also Apple’s fault.
Something very screwy is happening there with iTunes.
And to confirm that something screwy is happening with iTunes, here’s one more book’s metadata:
See? If iTunes was simply doing an alphabetical sort and then grabbing the first word or possible set, it would have placed this under the Category of “Philosophy,” not “Philosophy, Theology.”
I don’t know what kind of algorithm Apple is using. What I suspect is they have a faulty database they match against. If a match fails, it then does an alpha sort of Tags and grabs — something.
But this isn’t good enough.
How many people are going to look for a Mystery book under haggis?!
Some of you will protest that the books I’ve shown you are from publishers who won’t be in the iBookstore. Well guess what? Murder Piping Hot is a Smashwords book, and it’s going to be in the iBookstore. Under the frikkin Category of haggis, apparently!
If you think none of this matters, this is why you aren’t working for Apple. How do you think their Genius system will work for books? It will be based primarily on sales and matched to book Categories. It won’t recommend a Romance to someone who primarily buys Biographies — unless the underlying metadata is screwed up. And as we have just seen, in the case of Murder Piping Hot, it will be!
Mismatched Genius recommendations hurt writers and cast doubt on the entire Genius system (which isn’t exactly held in high esteem by people who use it for music, but let’s be a bit idealistic here, OK?).
Amazon winds up getting people to spend additional money with its recommendation system. Don’t you think Apple wants to do that too?
And what I’ve shown you is only the tip of the iceberg with metadata. Let me show you the metadata possibilities that also exist. These are the metadata fields from the ePub editing program SIGIL:
Note that in the following screensnaps I have redacted information that is private in nature:
And now your jaw will drop:
And that’s not even every possibility, either! It’s just a taste!
(Thanks to Moriah Jovan for those screensnaps.)
A book’s metadata is as important as the book itself now. Because none of us are going to be strolling through physical bookstores browsing shelves. We’re going to use virtual shelves, on screens. And when we’re using those, we really don’t want to browse — the desire to browse goes down in proportion to the number of possible items. Just ask yourself what your desire is to browse the 150,000-plus apps in the App Store and you’ll see the naked truth of that!
We won’t browse: we’ll search. We’ll want to find what we want, buy it, and start reading it.
But without the metadata to help us along, buying is not going to be a smooth process. Apple will lose money — and more importantly, writers will lose money.
Apple has the resources to do the right thing: setting metadata standards and hiring metadata specialist editors. But does Apple have the will?
Well, Apple better find the will. It has Google breathing down its neck, and Google has stolen the entire history of books (see all the backlinks there). Google is going to want to make lots of money off that theft, and Google understands the primacy of metadata for search.
Apple: Get to it. Steve Jobs: start caring about the little things again!
For more information, read Linda J. Dawson’s post: Metadata! More Important Than Ever!
Apple should be smart and hire her as a consultant. She knows metadata, period.
Apple needs to as well.