Reinventing the Internet, part three: Unlocking the potential of the URI

This is part three in my series of blog posts exploring what I believe to be the future of our online identity and experience.  In part one I laid out why I believe the future is in an open peer to peer social network; in part two I described how and why that network needs to be based first and foremost on domain names owned by the individuals that make up the network.

At this point, you’re probably thinking, big deal.  So people could own their own domain names for free, but what difference does that really make?  You can let people route their online communication with other people through their domain names by pointing it at their MySpace page, receiving email thru it, and maybe basing their instant messaging name on it.  Cool, but only geeks will "get it" at this point, and does it really provide enough value to make the average person care?  No, probably not.  Yet.

But, when you think about the fact that owning your own domain name gives you the ability to create a whole slew of unique URI’s that are all your own, things start getting really interesting.  I didn’t realize this until recently, so let me explain.  Hopefully you’ll have the same “Aha” moment that I had.

I was born and bred in the relational database, web-based application world.  I cut my programming teeth on Microsoft technology like ASP, ASP.NET, and SQL Server.  So I grew up thinking of a URL as a way to get to a page or application on the Web, or maybe somewhere on an internal network.  To me, a URI = DNS work, and as a programmer it was something the network guys took care of.  Later in my programming career I thought of a URI as a way into a REST API.  But for the most part, it was one of those things I took for granted, an ingredient in my standard technology stew.  It got me where I needed to go, but I now realize that I wasn’t seeing the forest for the trees.

It wasn’t until just recently that I started re-looking at the URI.  I’ve owned my own domain name for a while now, and use it for my personal email address ([email protected]) and, more recently, for my blog.  I thought it was cool that I controlled the content that lived at *my* domain name, and that I permanently owned it, but it didn’t really hold any value to me beyond that.  However, recently I began combing thru the XMPP specs that make up the core of Jabber, and something dawned on me:  a URI can be used to get to ANYTHING.  And ANYTHING is the key word here, because I’m not just talking about a blog, or email, or an instant messaging account.  I’m talking about ANYTHING.

For those who, like me, never thought twice about a URI before, here’s a quick primer.  A URI is made up of several parts:

[Figure: diagram of the parts of a URI]

There are a few other more obscure pieces as well, and I’m not so much interested in the user info portion of the URI; that’s really mostly applicable to email and instant messaging as far as I can tell.  (Which are important, don’t get me wrong, just not pertinent to what I’m talking about in this post.)  But as you can see, aside from the domain name itself, you can have resources underneath it.  I had always thought in terms of directories, pages, and REST and Web Service endpoints, but not in terms of resources and living, breathing objects at the other end.  And what I really missed is the fact that you have complete freedom to determine what that resource is.  Yes, a page or a subdirectory is a resource, and a REST service is a resource, but there’s no reason to stop there.  In fact, if you don’t stop there, a whole new world of possibilities opens up when you look at it from the standpoint of a global decentralized database.
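To make the pieces of a URI concrete, here is a minimal sketch that splits one apart with Python’s standard library.  The address itself is made up purely for illustration (the user name, port, path, and query are all assumptions, not real resources on my site).

```python
from urllib.parse import urlsplit

# Hypothetical URI, used only to show the parts; it is not a real resource.
uri = "http://jason@www.jasonkolb.com:80/weblog/posts?year=2006#comments"

parts = urlsplit(uri)

print(parts.scheme)    # the protocol used to reach the resource
print(parts.username)  # the user info portion
print(parts.hostname)  # the domain name
print(parts.port)      # the port, if one is given
print(parts.path)      # the resource underneath the domain
print(parts.query)     # extra parameters for the resource
print(parts.fragment)  # a location inside the resource
```

Everything after the domain name is yours to define, which is the part that matters for the rest of this post.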

I grew up in the world of relational databases.  In fact, I’m the CTO of a company called Latigent whose main product is a relational database reporting tool called BlueVue.  The exposure I’ve gotten from being immersed in relational databases has taught me a few things about data, primarily that the end goal of any data warehousing project is to come up with a normalized schema, a single version of the truth.  (Regardless of how difficult that is, believe me I am fully aware of how hard that is to achieve.  That goal has cost me many late nights.)

The light bulb really went on for me when I was thinking about URI’s and the fact that in a relational database of any type, you need a way to address "things" (or "resources") within the database.  In a standard database model this is typically done using one of two things:

  1. An integer.  This is usually where programmers start their careers: they have a table and create a primary key that’s just a sequential numeric column, something like this:  [Figure: table with a numeric primary key]
  2. A GUID.  Once programmers reach a certain point, they realize that sequential integers are useful, but they’re also very limiting.  There’s no way to uniquely identify a row across the whole database, because the ID’s in different tables overlap, often.  It also makes replication damn near impossible.  So we move on to using GUID’s, which are globally unique and look something like this:  2fce8470-df88-4f0e-a642-f51b15e49c7e.   So our tables start looking like this:  [Figure: table with a GUID primary key]
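The difference between the two kinds of key can be sketched in a few lines of Python.  The two “tables” below are hypothetical stand-ins (plain dictionaries with made-up rows), not a real schema.

```python
import uuid

# Two hypothetical tables, each with its own sequential integer key.
customers = {1: "Alice", 2: "Bob"}
orders = {1: "Order for Alice", 2: "Order for Bob"}

# The same integers appear in both tables, so an ID alone
# does not tell you which row is meant.
overlapping_ids = set(customers) & set(orders)
print(overlapping_ids)

# Keyed by GUID instead, rows from both tables can be merged
# into one ID space without colliding.
customers_by_guid = {uuid.uuid4(): name for name in customers.values()}
orders_by_guid = {uuid.uuid4(): text for text in orders.values()}
merged = {**customers_by_guid, **orders_by_guid}
print(len(merged))
```

That merge is essentially what replication needs, which is why sequential integers make it so painful.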

Now, GUID’s are great, I’ve been using them forever and they work well. However, why are we using a nonsensical series of letters and numbers to identify something?  That doesn’t make a whole lot of sense to me; there should be a way to uniquely identify something and make it possible to relay that address to another human without copying and pasting it into an email.  Can’t we also make it more useful than just identifying a resource?

About this point is where I had my “Aha” moment.  I realized that there is another option for globally unique ID’s, that’s human readable, and that already has functionality baked in: namely, it allows you to locate what you’re looking at, on the Internet.  What I’m referring to, of course, is the URI.  With a URI, you have the ability to uniquely identify something over all of the Internet and, this is key, actually GET TO IT and DO SOMETHING with it.

Think about that for a second.  When it’s put into place, what that actually does is turn the entire Internet into one giant relational database.  Your tables start looking like this:  [Figure: table with a URI primary key]
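Here is a rough sketch of what a table keyed by URI could look like, using an in-memory SQLite database.  The table name, column names, and URIs are all assumptions made up for the example; this is not the schema of any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The primary key is a text column holding the URI of each row.
conn.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT)")

# Hypothetical URIs; each one both identifies the row and says where it lives.
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [
        ("http://www.jasonkolb.com/weblog/posts/1", "Reinventing the Internet, part one"),
        ("http://www.jasonkolb.com/weblog/posts/2", "Reinventing the Internet, part two"),
    ],
)

row = conn.execute(
    "SELECT title FROM posts WHERE id = ?",
    ("http://www.jasonkolb.com/weblog/posts/2",),
).fetchone()
print(row[0])
```

Nothing about the database engine changes; the only difference is that the key now means something outside the database too.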

And with the proper server, those URI’s actually start coming alive and offering functionality of their own.  Each object, or row within the database, gains actual capabilities, outside of whatever application sits on top of the database.

The more I thought about this the more it made a whole lot of sense to me.  When you start using URI’s as the ID for everything in a database, you get a whole lot more functionality than the standard database, for free, and that functionality is directly applicable to the Internet experience.  For example, do you know how hard it is to build a distributed database that works?  You have to set up clusters and partitions, and figure out the best way to distribute and partition the data.  In this scheme, all you have to do is distribute the objects over a bunch of servers at different URI’s.  When you really think about it, the application isn’t going to query a local database for something that lives at another domain; it can go directly to the source and by doing so natively gain distributed capabilities.
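A toy version of “go directly to the source” might look like the sketch below.  The dictionaries stand in for servers at different domains, and the domains, paths, and objects are all invented for the example; a real system would make network requests instead of dictionary lookups.

```python
from urllib.parse import urlsplit

# Stand-ins for servers at different domains (hypothetical data).
SERVERS = {
    "www.jasonkolb.com": {
        "/weblog/posts/1": {"title": "Reinventing the Internet, part one"},
    },
    "example.org": {
        "/people/alice": {"name": "Alice"},
    },
}


def fetch(uri):
    """Look up an object by going to the 'server' named in the URI itself."""
    parts = urlsplit(uri)
    store = SERVERS[parts.hostname]  # the domain decides which server to ask
    return store[parts.path]         # the path decides which object on it


print(fetch("http://example.org/people/alice"))
print(fetch("http://www.jasonkolb.com/weblog/posts/1"))
```

Notice there is no partitioning logic anywhere: the ID already says where the object lives.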

Another very cool side-effect of using URI’s as ID’s is that data authentication as a problem completely goes away.  If you’re consuming data that lives at a particular URI, you have complete confidence that what you’re looking at is coming directly from the source (unless your DNS server is corrupted somehow).  When I look at the data at www.jasonkolb.com/weblog, for example, I know for sure that the data I receive is coming from me, because that’s where I got it from.  As cool and ingenious as technology like OpenID is, it’s really a band-aid of sorts to fix the fact that people’s data doesn’t currently live at their own domain.  When everyone owns their own domain (the how of which I posted about in part two), the problem just goes away.
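As a very rough sketch of that idea, the check below only asks whether a URI’s host belongs to the domain you expect.  The function name is my own invention for the example, and it assumes DNS can be trusted, which is exactly the caveat mentioned above.

```python
from urllib.parse import urlsplit


def comes_from_domain(uri, owner_domain):
    """Return True if the URI's host is the owner's domain or a subdomain of it."""
    host = (urlsplit(uri).hostname or "").lower()
    owner_domain = owner_domain.lower()
    return host == owner_domain or host.endswith("." + owner_domain)


# Data fetched from my own domain passes the check.
print(comes_from_domain("http://www.jasonkolb.com/weblog", "jasonkolb.com"))

# A look-alike host on somebody else's domain does not.
print(comes_from_domain("http://jasonkolb.com.example.net/weblog", "jasonkolb.com"))
```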

So, I was pretty surprised at how much potential lies behind the simple URI that I’ve taken for granted for so many years.  A distributed database on top of the Internet is a very cool sounding idea, but until there’s some real meat behind it, it’s still just pie in the sky.  Well, I’m currently working on that, and the playing I’ve done so far seems to bear this out pretty well. However, I’ve put that on the back-burner for now until I can finish building out the first step, which is the www.atmy.name site that I talked about in the previous post in this series.  Now that I’ve laid out the first part of what I want to do, I’m going to put this series on hold until that piece is done and working.  Once that’s done I’ll write part four, which will describe the software that needs to live at everyone’s personal domain, what it can do, and how it’s going to get there.  (Hint:  it’s 100% open source and owned by everyone, so there will be no opportunity for anything like this or this to happen on this network.)

Here’s a link to Reinventing the Internet, part four:  Connecting the dots

  • http://www.jasonkolb.com/weblog/2006/08/reinventing_the_1.html JasonKolb.com

    Reinventing the Internet, part two: A domain name in every pot

    In my last post, part one of my little experiment, I talked about how the next evolutionary leap in social networks has to be an open peer to peer network. Obviously this implies that the people somehow own the endpoints

  • http://www.i-together.net/weaverluke/weblog.html weaverluke

    Ok, so let's say we've collectively identified everyone and everything with URIs.

    Now how the heck are we going to *discover* this stuff, given that we will only know our own naming scheme?!

  • http://profile.typekey.com/jasonkolb/ jasonkolb

    This is really what I'm planning to post about next time, but the solution really lies in the combination of the Jabber specs at http://www.jabber.org and a customized XMPP server.

  • http://www.i-together.net/weaverluke/weblog.html weaverluke

    Ok, thanks—looking forward to your follow-up post!

  • http://blog.pdtoal.com/2006/08/30/using-your-online-identity/ Identity, Security

    Using your Online Identity

  • http://profile.typekey.com/Ed_Dodds/ Ed_Dodds

    Now consider the possibilities of using this in concert with opengroup's universal data element framework; or how URIs might extend the UDEF…

  • http://www.base4.net/blog.aspx Alex James

    Interesting stuff. I have been doing a whole series of posts on Data2.0 which is similar but not quite the same. I like your primary keys idea.

    http://www.base4.net/blog.aspx?ID=76 is particularly relevant I think.

  • http://profile.typekey.com/jasonkolb/ jasonkolb

    Great series of posts Alex, and a very interesting blog. It's also great to see that others are starting to arrive at the same conclusions from other directions. I haven't quite been able to catch up on all your Data 2.0 posts, might be a weekend project ;)

  • http://www.jasonkolb.com/weblog/2006/07/this_is_not_an_.html JasonKolb.com

    Featured Posts

    Reinventing the Internet, part one – How the evolution of social networks is going to fundamentally change the Internet and the way we use it to communicate. Reinventing the Internet, part two: A domain name in every pot – Why

  • http://dannyayers.com Danny

    If you go the next step, that URIs can also be used to identify relations between things then you've pretty much got the Semantic Web idea.

    (Various detail tweaks, notably the basic relations (RDF properties) are normalised right down to 2-part, subject-object; the open world assumption is taken, anything that isn't true is unknown – an analogy to the 404).

  • Peter Ring

    Using URIs in the form of URLs as identifiers introduces a number of challenges (that you didn't have with GUIDs).

    Here is a discussion of some of the issues:
    http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm
    http://www.w3.org/2002/11/dbooth-names/dbooth-rfc2396-analysis_clean.htm

    The W3C TAG formally resolved httpRange-14 in a pragmatic way that doesn't really help much.

    If you want to use URIs as general identifiers, there are a number of schemes better suited for the purpose, notably:

    TagURI
    http://www.taguri.org/

    XRI
    http://www.xdi.org/

    There is still the messy issue of namespaces and using QNames in element content and attribute values as identifiers. There is a TAG finding on the subject:

    http://www.w3.org/2001/tag/doc/qnameids.html

    Lots of annoying problems arise because:
    - the mechanism for mapping {URI,localname} pairs to QNames is application-dependent
    - shorthand pointers (the most useful fragment identifiers) for application/xml and text/xml must be NCNames, i.e. cannot start with a digit

    You can see a bit of the current discussion here:

    http://www.w3.org/2001/tag/2006/06/14-minutes.html#item02

    Kind regards
    Peter Ring

  • http://www.jasonkolb.com/weblog/2006/09/reinventing_the.html JasonKolb.com

    Reinventing the Internet, part five: Decentralized network, centralized identity

    This is the fifth post in my series about what I believe to be the future of the Internet. After a nice laid-back labor day weekend off the comments and emails have piled up, thanks to everyone who took the

  • http://www.jasonkolb.com/weblog/2006/08/reinventing_the_3.html JasonKolb.com

    Reinventing the Internet, part four: Connecting the dots

    This is the fourth post in my series about what I believe is the future of our online experience and identity. In part one I talked about why I believe the future is in an open peer to peer social

  • http://profile.typekey.com/Logomachist/ Rob

    I'm with Danny. You're on your way to (re)inventing the semantic web.

    But you run into some problems if you expect everyone to own their own domains and expect the domains to be humanly readable at the same time. Eventually you run out of good domain names, and people have to start reusing hosts and other domains. This isn't a big issue because people have seemed pretty happy having their profiles/blogs hosted on a few commonly known domains. Myspace, livejournal, etc.

    But realize that as soon as you start depending on 3rd parties (myspace.com, eu.org, atmy.name, whatever) you don't really have complete control over the domain. The company can go bankrupt, start charging fees, etc.

    URNs and (as Danny suggested) TagURIs don't have this problem.

    Also realize that "human readable" only means something if the human involved knows the language the domain is written in.

    These aren't big problems for the semantic web, because on the SW an IRI doesn't have to mean anything, or even represent an obtainable resource.

    But dependence on 3rd parties is enough to prevent you from using URLs as any sort of serious authentication, the way you suggest. OpenID and Typekey (if I understand them correctly) actually work pretty well for informal usage on blogs, but no one in their right mind would ever want to use them for anything *serious*. There are large, complicated math-based and standards-body endorsed specifications for that sort of thing. And frankly, while the informal URL-based stuff I mentioned above works for now because it's simple and quick, over time people will shift to the complicated Liberty Alliance, Palladium-like stuff provided by big companies b/c it's more secure, once the companies simplify the process. They've fought about standards and consequently dragged their feet for a while now, but IMHO now that there is competition from relative lightweights like Six Apart, the big companies (Yahoo, MS, Sun, probably Google, maybe even eBay w/ Paypal) will quickly ship usable products.

    Danny, could you explain some of your links? There was a lot of content on the linked sites… more than I could absorb. Especially the thing about XRI and httpRange-14.

    The W3 links about understanding what IRIs represent, I think, are moot b/c that can be clarified with metadata (on the SW).

  • http://www.problemcocuk.com savas

    We don't yet accept OpenID identities within our products as a relying party, but we're actively working on it. That roll-out is likely to be gradual.