Analytics and Outliers

Any given set of data has two parts:
  • The norm
  • The outliers

Most reporting and analytics tools are excellent at telling you the norm, but they either don't help you find outliers or give you only extremely crude tools for using metrics to identify crisis situations.
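To make the difference between the norm and the outliers concrete, here is a minimal sketch in Python. It is only one common approach (a modified z-score based on the median), not what any particular tool does, and the `find_outliers` name, the 3.5 threshold, and the sample spend figures are all assumptions made up for illustration.

```python
from statistics import mean, median


def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a measure of spread that a few
    # extreme values barely move, unlike the standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical monthly spend per customer (illustrative numbers only).
spend = [20, 22, 19, 21, 23, 20, 18, 250]

norm = mean(spend)               # what a typical report shows
outliers = find_outliers(spend)  # the customers worth a closer look
```

The average alone hides the one customer who spends more than ten times as much as everyone else; the outlier check surfaces that customer directly.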

The outliers are the most interesting part.  99% of the opportunity is in outliers:

  • Above-average customers are outliers
  • Underpriced assets are outliers
  • Influencers are outliers
  • Hits (music/games/movies/whatever) are outliers

Recognizing outliers is the most important part.  If you have the ability to skim the cream off the top, where you get 80% of the bang for 20% of the buck, why wouldn't you?

The interesting part is that the more data you have to work with, and the more you know about everyone, the easier it gets to recognize outliers.  The patterns that can be mined from the data improve, and so does your ability to spot those outliers in the crowd.

This is why I am excited about teaming up Big Data (Cassandra being my store of choice) with data mining.  Build a mineable data warehouse in anticipation of unknown influences and links between data, build it in such a way that it can intelligently link them, and that's money in the bank.  In any industry.

I've been so quiet lately because I've been doing some hard-core immersion in this stuff, and have been in straight-up learning mode.  But it's really cool stuff, and the potential is very exciting.
