Are you my Elvis?

Rachel Lovinger   September 9, 2009

Some president guy meets a singer dude. (image via Chronophobic)

The breakdown: How does a major announcement by the New York Times to make a massive digital index available to the public change the landscape for reliable content topics and metadata? Rachel Lovinger explores why Wikipedia shouldn’t be our one-stop shop when it comes to significant events.

A few months ago the New York Times announced their intention to make their entire index available in a structured digital format. The Index was first published in bound volumes in 1913 and has grown to include over 500,000 terms that have been used to tag articles going all the way back to 1851. That’s 500,000 significant people, places, things, organizations, and concepts. To be clear, the Index includes the tagged terms, not the articles themselves.

Ok, so that’s a big list of words, but why does it matter? As we move towards a more data-driven digital world, there’s a strong need for online services to have a reliable, accurate, common frame of reference that covers all the major topics, people & things of interest. Let’s say you’re a big fan of the movie Up and you want to subscribe to a service that pulls in any news, media, and conversations about the animated movie. To be sure that content is related to the film, and not to the many other uses of the word “up,” automated services will need to use some kind of unique identifier. This can be an alphanumeric code (like an AMG ID, licensed from All Media Guide) or a URL, but it has to be something that the service and the content providers can both share.
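To make that concrete, here’s a minimal sketch in Python of how a subscription service might match content by a shared identifier instead of by keyword. Everything in it is made up for illustration: the identifier strings, the `ContentItem` fields, and the sample headlines are assumptions, not real AMG IDs or anything taken from the Times Index.

```python
from dataclasses import dataclass

# Hypothetical identifier for the film "Up"; not a real AMG ID or URL.
UP_FILM_ID = "film:up-2009"


@dataclass
class ContentItem:
    headline: str
    topic_ids: frozenset  # identifiers the content provider attached as tags


def matches_by_keyword(item, word):
    """Naive matching: does the word appear as a token in the headline?"""
    return word.lower() in item.headline.lower().split()


def matches_by_id(item, topic_id):
    """Identifier matching: did the provider tag the item with this topic?"""
    return topic_id in item.topic_ids


items = [
    ContentItem("Pixar's Up wins over critics", frozenset({UP_FILM_ID})),
    ContentItem("Gas prices go up again", frozenset({"topic:gas-prices"})),
    ContentItem("Balloon house movie tops box office", frozenset({UP_FILM_ID})),
]

keyword_hits = [i.headline for i in items if matches_by_keyword(i, "up")]
id_hits = [i.headline for i in items if matches_by_id(i, UP_FILM_ID)]
```

In this toy data, the keyword search pulls in the gas-price story and misses the headline that never says “Up,” while the identifier search returns only the two items tagged with the film. That only works because the service and the providers agreed on the same identifier in advance.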

Many experimental projects have tried using Wikipedia as this kind of database of knowledge. In some ways, this makes sense. If you strip out the content of the pages, you’re left with a taxonomy of nearly 3 million page names. This list of terms is well-structured, because of Wikipedia’s use of links and categories, and it covers a huge body of human knowledge.

But one could argue that Wikipedia has an unhealthy emphasis on pop culture and internet memes. How valuable are those 3 million page names when they include a huge number of topics like The Hampster Dance (an animation of rodents dancing), Chrismukkah (a blending of Christmas and Hanukkah, popularized by a TV show called The O.C.), Brfxxccxxmnpcccclllmmnprxvclmnckssqlbb11116 (a name given to a Swedish child born in 1991), More cowbell (a popular phrase from a Saturday Night Live sketch starring Christopher Walken) and nearly 500 pages devoted to the creatures of Pokémon (a media franchise about battling monsters)? Suppose you mention Elvis: does Wikipedia know if you mean Elvis Presley, Élvis Alves Pereira, the TV miniseries, the album, the film, the TV special, the text editor, the comic strip, the character in the movie Cars, the pinball machine, the helicopter, or the other album?
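The Elvis problem can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical label index with invented identifiers; it is not how Wikipedia or the Times Index actually stores its terms.

```python
# Hypothetical label index: one human-readable name, many distinct entries.
# The identifiers are invented for illustration.
LABEL_INDEX = {
    "elvis": [
        "person:elvis-presley",
        "person:elvis-alves-pereira",
        "software:elvis-text-editor",
        "film:elvis",
    ],
    "chrismukkah": ["holiday:chrismukkah"],
}


def candidates(label):
    """Return every identifier that shares this label (case-insensitive)."""
    return LABEL_INDEX.get(label.strip().lower(), [])


def resolve(label):
    """Return the single matching identifier, or None if the label is ambiguous or unknown."""
    found = candidates(label)
    return found[0] if len(found) == 1 else None
```

Here `resolve("Chrismukkah")` comes back with one identifier, but `resolve("Elvis")` returns `None` because four entries share the name. A bare label isn’t enough; something else (context, a category, or an explicit identifier) has to pick the right one.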

The New York Times Index would offer the Web of Data another option for a structured, digital, open representation of human knowledge. One that comes from a trusted brand that’s known for its depth and breadth of coverage. Coverage that’s been researched and fact-checked by professionals.


1 Response

  1. Erin Scime says:

    Thanks Rachel for posting on this subject! I think for the Times, it may be possible to think about a new life for the brand and Times as product itself – instead of publishing fresh and immediate content, the online edition becoming an evergreen source for archival information makes it a museum of sorts, changing its core product mission from accurate reporting to accurate and responsible maintenance of historic information. In a way, this move may be a public/commercial version of great information services like EBSCO, HW Wilson or ProQuest. Money issues aside, with semantic/interoperable metadata schemas, maybe the future of sharing data between providers is brighter than we think? I hope so!


What is this site, exactly?

Scatter/Gather is a blog about the intersection of content strategy, pop culture and human behavior. Contributors are all practicing Content Strategists at the offices of Razorfish, an international digital design agency.

This blog reflects the views of the individual contributors and not necessarily the views of Razorfish.

What is content strategy?

Oooh, the elevator pitch. Here we go: There is content on the web. You love it. Or you do not love it. Either way, it is out there, and it is growing. Content strategy encompasses the discovery, ideation, implementation and maintenance of all types of digital content—links, tags, metadata, video, whatever. Ultimately, we work closely with information architects and creative types to craft delicious, usable web experiences for our clients.

Why "scatter/gather"?

It’s an iterative data clustering operation that’s designed to enable rich browsing capabilities. “Data clustering” seems rather awesome and relevant to our quest, plus we thought the phrase just sounded really cool.
