What Will You Love?
Michael Barnwell, March 13, 2009
Netflix origami: extending the experience beyond the flick.
Being able to predict human behavior is a real talent that deserves praise and rich rewards. Netflix agrees and since 2006 has been holding an ongoing competition to improve the accuracy of its movie recommendations to members, handing over one million dollars to the team that can deliver a 10% increase in the accuracy of its “world-class movie recommendation system” Cinematch. This, of course, will require a complex algorithm and an equally complex judging standard involving something called the RMSE (root mean squared error), which measures how far predicted ratings fall from actual ratings across a data set. Put simply, competitors are vying to predict how likely Customer Sarah is to give “Sleepless in Seattle” a 5-star rating.
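For readers who want to see what the judging standard boils down to, here is a minimal sketch of an RMSE calculation in Python. The ratings and the function names are invented for illustration and are not taken from Netflix; the sketch also assumes that a "10% increase in accuracy" means a 10% reduction in RMSE relative to the baseline.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def improvement_pct(baseline_rmse, new_rmse):
    """Percentage by which new_rmse undercuts baseline_rmse (assumed contest metric)."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical ratings for three customers (made up for this example).
actual_ratings = [5, 3, 4]
baseline_guess = [4, 3, 5]        # off by one star on two of three movies
challenger_guess = [4.5, 3, 4.5]  # off by half a star on two of three movies

baseline = rmse(baseline_guess, actual_ratings)
challenger = rmse(challenger_guess, actual_ratings)
print(round(baseline, 3), round(challenger, 3))
print(round(improvement_pct(baseline, challenger), 1), "% lower RMSE")
```

A lower RMSE means the predictions sit closer to the ratings people actually gave, so the contest is about shrinking that number, not about being right or wrong on any single movie.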
Setting aside the quibble about the notion of an increase in accuracy (nearly accurate isn’t actually accurate, is it?), competitors are closing in on the prize. As of the end of January, team “BellKor in BigChaos” is within reach of the seven-figure prize, posting a 9.63% improvement. To give Netflix credit, Cinematch isn’t bad. After I submitted ratings on a couple hundred movies, including a 5-star rating on the Coen brothers’ movie “Fargo,” it told me that I’d also love their movies “Barton Fink” and “The Big Lebowski.” Amazing! It also told me I’d love several dozen other movies, some of which I certainly would (who wouldn’t love “Duck Soup” and “Vertigo”?), others of which I certainly would not, in diminishing degrees. Netflix is confident enough about its predictive abilities that it offers a site area labeled “Movies You’ll Love.” Come-hither marketing language aside, I’m confident enough about my own predictive abilities to say that it would be 100% more accurate to label it “Movies You Might Love.”
I’m all for a service that could weed out the junk from a storehouse of 100,000 titles, but I wonder if so much effort and a complex RMSE calculation are really needed. Wouldn’t it be simpler to rely on the more transparent workhorse trinity of taxonomy, tracking, and tagging? In short, if absolute accuracy is an impossible dream and being mostly on target is good enough, wouldn’t it make sense to just opt for a lo-fi method? This would require just marshalling the metadata (some of it inherent and filter-like, such as title, director, and actors), tracking the traffic (what does my history say that I’ve rented with the same director or actors?), and by all means incorporating the tagging, mine and yours (not only did I rent it, I liked it, and several thousand other people liked it, too). All in all, it would be a recommendation with a bit of the machine and a bit of the all-important human. This would quickly get me a reduced set of choices. I can take it from there. By going the lo-fi route, Netflix could redirect future prize money (not to disparage the brilliance of the competing teams) toward increasing the number of movies it allows users to stream, which, by the way, greatly reduces the risk and aggravation of making the wrong choice of movie upfront. Don’t like it? Stream another.
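To make the lo-fi idea concrete, here is a rough Python sketch of a shortlist built from taxonomy (director and actors), tracking (what I rented and liked), and tagging (how many members liked a title). Everything in it is an assumption for illustration: the `Movie` fields, the scoring weights, the `min_likes` threshold, and the like counts are all invented, and none of it reflects how Netflix actually works.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Movie:
    title: str
    director: str        # taxonomy: inherent metadata
    actors: frozenset    # taxonomy: inherent metadata
    likes: int = 0       # tagging: members who liked it (made-up numbers below)

def shortlist(catalog, liked_history, min_likes=1000, limit=10):
    """Return titles that share a director or actors with movies I liked."""
    seen = {m.title for m in liked_history}                    # tracking
    liked_directors = {m.director for m in liked_history}
    liked_actors = set().union(*(m.actors for m in liked_history))
    scored = []
    for movie in catalog:
        if movie.title in seen or movie.likes < min_likes:
            continue  # already rented, or too few other people liked it
        # Arbitrary weights: a shared director counts double a shared actor.
        score = 2 * (movie.director in liked_directors) + len(movie.actors & liked_actors)
        if score > 0:
            scored.append((score, movie.likes, movie.title))
    scored.sort(reverse=True)  # best metadata match first, then most liked
    return [title for _, _, title in scored[:limit]]

fargo = Movie("Fargo", "Coen brothers", frozenset({"Frances McDormand", "Steve Buscemi"}), 6000)
catalog = [
    fargo,
    Movie("Barton Fink", "Coen brothers", frozenset({"John Turturro", "John Goodman"}), 2000),
    Movie("The Big Lebowski", "Coen brothers", frozenset({"Jeff Bridges", "John Goodman", "Steve Buscemi"}), 5000),
    Movie("Sleepless in Seattle", "Nora Ephron", frozenset({"Tom Hanks", "Meg Ryan"}), 8000),
]

print(shortlist(catalog, [fargo]))  # ['The Big Lebowski', 'Barton Fink']
```

The point of the sketch is transparency rather than precision: every title on the shortlist can be explained by a shared director, a shared actor, or a pile of other people's thumbs-up.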
Netflix isn’t the only one trying to hone its ability to predict human behavior. To mention just a couple, last.fm has its own recommendation engine and so does iTunes. Like Netflix, they’re both pretty good. At the moment, most of the public attention to recommendation engines is going to entertainment, but I’m wondering what Netflix’s ambitions have to say to the corporate model of document retrieval. If I’m faced with finding the gem in a pile of research and development documents, how can I find the equivalent of that perfect dark comedy with heaps of bloody carnage and wit, made by a Coen-esque team of brothers and starring John Goodman? By cutting the number of candidate documents by a factor of ten or even a hundred through taxonomy, tracking, and tagging, I’m very likely to get that small set of worthy choices from which I can pluck the prizewinner.

I’m sure the Coen brothers and John Goodman would completely agree with you. I fail to see how tracking what people rent is better than tracking how much they like what they rent. And I suspect that the algorithms are somewhat more complex than RMSE, which really isn’t complicated at all. At any rate, it strikes me that tracking ratings is a much better way to introduce one to new things than relying on commonalities in the metadata.