One of the reasons I love data is that there’s so much potential for mining real value from it, especially when you combine it with other, new data sources. In fact, it acts a lot like a traditional commodity such as copper or wool: someone produces it, and then someone else buys the raw material and makes something new from it. It differs from traditional commodities, however, in that it doesn’t get used up at all when it’s used to create something new, which makes it particularly interesting from an economic point of view.
In addition, anyone can make it, and the industry of using data to create new and valuable things is still young and ripe for profit-making. In fact, I think it’s one of the areas America needs to focus on if its economy is to recover, because for the most part it’s still virgin territory and it’s going to create a lot of economic value. What I really don’t want to see is foreign companies being the first to capitalize on the data, as that would suck most of the value out of our economy, which is just what we don’t need right now.
Data as a commodity is also interesting to data producers, which is potentially just about every business with an IT department. It is a new revenue stream that most businesses don’t even realize they should be considering. Any business that keeps a data warehouse needs to start thinking about whether and how it wants to monetize that data. Topics like data anonymization and regulatory restraints need to be thought out well in advance (meaning, right now) so that the wheels can get moving. I can tell you that, in my experience, next to nobody is taking advantage of their data. They see it as the backend behind their reports at best and a storage expense at worst, instead of as a potential revenue center.
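To make the anonymization point a little more concrete, here is a minimal sketch of what a first pass over a warehouse export might look like. Everything in it is an assumption for illustration: the column names (`customer_id`, `name`, `email`, `zip`), the secret key, and the choice of a keyed hash are mine, not a recommendation from any regulator or marketplace. Note also that this is pseudonymization, which is weaker than true anonymization, so real data would need a proper legal and privacy review before being sold.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would live in a secrets store, not in code.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical column names that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash so the same input always maps to the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_sale(row: dict) -> dict:
    """Drop direct identifiers, tokenize the customer ID, and coarsen the ZIP code."""
    cleaned = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["customer_id"] = pseudonymize(str(row["customer_id"]))
    cleaned["zip"] = str(row["zip"])[:3]  # keep only the 3-digit ZIP prefix
    return cleaned


if __name__ == "__main__":
    sample = {
        "customer_id": 1017,
        "name": "Jane Doe",
        "email": "jane@example.com",
        "zip": "98052",
        "order_total": 42.50,
    }
    print(prepare_for_sale(sample))
```

The keyed hash keeps rows for the same customer linkable to each other, which is usually what a buyer needs, without handing over the original identifier.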
In order to monetize data, however, or to mine it for value, you need some type of exchange where buyers and sellers can trade it. One of the trends I’ve been keeping an eye on is the development of these marketplaces, and there are now several offering data as a paid commodity, most of which came online within the past year:
Microsoft Azure DataMarket prices its data by transactions per month. The main advantage it has right now is a normalized OData schema, which provides a baked-in integration layer and is a huge value-add for developers. However, it doesn’t seem to have a very mature API for content publishers to put data into the marketplace, which is unfortunate.
InfoChimps seems to have the most interesting data sets at this point (depending on your use, obviously), such as trust and authority rankings mined from the Twitter firehose, and some really oddball stuff like UFO sighting reports. However, it seems to require a different API for each data set, which makes it hard to integrate multiple data sets.
Factual has a nice API which allows developers to correct data (although I’m not sure if the corrections are shared across developers). However, I wasn’t able to find any API at all for data providers to put data into the marketplace, and again there is no common schema for the data, meaning that all integration is pushed out to the consumer.
None of these APIs appear to support streaming for real-time applications, which is unfortunate, but I’m hoping that changes as the space matures. The data publisher side really needs some love. Aside from the Microsoft offering, the marketplaces don’t seem to provide any type of help with integration either, which is definitely going to make things harder for consumers of the data, since it puts the onus of correctly integrating the data sets on them.
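To show what that onus looks like in practice, here is a rough sketch of the glue code a consumer ends up writing when two providers describe the same thing with different field names. The provider names, field names, and sample records are all made up for illustration and do not reflect any marketplace’s actual API or schema.

```python
# Hypothetical field mappings: provider field name -> common field name.
FIELD_MAPS = {
    "provider_a": {"ZipCode": "zip", "MedianIncome": "median_income"},
    "provider_b": {"postal_code": "zip", "ufo_sightings": "sightings"},
}


def normalize(provider: str, record: dict) -> dict:
    """Rename one provider's fields to the common schema, dropping unmapped fields."""
    mapping = FIELD_MAPS[provider]
    return {common: record[source] for source, common in mapping.items()}


def join_on(key: str, left: list, right: list) -> list:
    """Inner-join two lists of normalized records on a shared key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


if __name__ == "__main__":
    incomes = [normalize("provider_a", {"ZipCode": "98052", "MedianIncome": 90000})]
    sightings = [normalize("provider_b", {"postal_code": "98052", "ufo_sightings": 3})]
    print(join_on("zip", incomes, sightings))
```

Every new data set means another entry in the mapping table and another round of checking that the keys really mean the same thing, which is exactly the work a common schema would take off the consumer’s plate.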
I haven’t been able to find any really good examples of game-changing data in these marketplaces yet. It mostly seems to be cleansed, normalized, or mined versions of large public data sets, which is unfortunate (some of the Twitter data sets on InfoChimps being the exception, as far as I can tell). It’s going to be interesting to watch this space to see who ends up with the most data sources, the most data publishers, the easiest data integration, and thus the biggest competitive advantage. It would behoove each of these guys to focus on that, I believe. The hardest problem there is going to be lowering the barrier to participation for all types of businesses and creating a streamlined process for adding and vetting data sources.
I believe this is going to be a huge, huge opportunity and that data marketplaces, especially if they offer out-of-the-box integration, are going to make money hand over fist. It’s also going to be an area ripe for mergers and acquisitions. It’s still so early here that it’s like the Wild West; we’ll see if we get the gold rush again.