The Interesting Stuff

There are a ton of data sources out there that can easily be pulled in and learned from: public ones like Twitter, LinkedIn, and Facebook, and even enterprise sources like internal CRM, bug tracking, customer support, and communication systems.  It's all very possible today, and it's being used for some very interesting things.

That is the low-hanging fruit at this point.  Pull in a data source or two and use the data in them to enrich what you know about your customer/visitor/market/etc.  It is easy to learn what Twitter knows about a person; a $15/hr coder from a former Soviet bloc country can easily get that for you.

But if you've ever tried this, you quickly realized that the number of members Twitter advertises is nothing like the number that actually participates.  You are really only able to learn about the tiny fraction of users who actively participate.  The real trick is finding out about the people that Twitter doesn't know about.  This plays out in any data source that you're hoping will be as comprehensive as possible.

It seems to break down like this: you can usually find out 50% of what you need to know from easily accessible data sources.  The other 50% is REALLY hard to get to.


The real magic is in knowing something that is really hard to find out right now.  That means pulling in new, more difficult data sources and combining them with the data that everyone already knows about to fill in the missing 50%.  This is the land of screen scrapers, Mechanical Turk, maddeningly complex ETL processes, etc.  That's the really interesting stuff.
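To make the "fill in the missing 50%" idea a bit more concrete, here is a minimal sketch of the combining step.  Everything in it is made up for illustration: the field names, the sample records, and the `coverage` and `enrich` helpers are assumptions, not anything from a real API.  It assumes both sources can be keyed by email address, which in practice is often the hard part.

```python
from typing import Optional

# A record maps a field name to a value, or None when the value is unknown.
Record = dict[str, Optional[str]]


def coverage(records: dict[str, Record], fields: list[str]) -> float:
    """Fraction of (record, field) cells that hold a non-empty value."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(1 for rec in records.values() for f in fields if rec.get(f))
    return filled / total


def enrich(base: dict[str, Record], extra: dict[str, Record],
           fields: list[str]) -> dict[str, Record]:
    """Fill gaps in `base` from `extra`, never overwriting known values."""
    merged: dict[str, Record] = {}
    for key in base.keys() | extra.keys():
        b = base.get(key, {})
        e = extra.get(key, {})
        merged[key] = {f: b.get(f) or e.get(f) for f in fields}
    return merged


# Hypothetical sample data: `easy` stands in for an easily accessible source,
# `hard` for something scraped or hand-collected.
FIELDS = ["name", "employer", "city"]
easy = {
    "alice@example.com": {"name": "Alice", "employer": None, "city": "Austin"},
    "bob@example.com": {"name": None, "employer": None, "city": None},
}
hard = {
    "alice@example.com": {"employer": "Acme"},
    "bob@example.com": {"name": "Bob", "city": "Denver"},
}

merged = enrich(easy, hard, FIELDS)
print(f"coverage before: {coverage(easy, FIELDS):.0%}")
print(f"coverage after:  {coverage(merged, FIELDS):.0%}")
```

The easy source wins whenever both sources have a value, on the assumption that it is the more trusted one.  A real pipeline would also need record matching and conflict resolution, which this sketch skips entirely.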
