This is kind of an obscure topic, but I’ve been trying to figure out what the best way to count things is, and how to best identify boundaries and relationships between pools of data which may or may not be the same thing.
That is, given a bunch of data about something, the boundaries between one thing and another are pretty fuzzy. They might be different things, or they might actually be the same thing that I know different sets of things about. This is essentially the crux of data integration, but approaching it from a slightly different angle.
This might sound kind of abstract and pointless, but I’m starting to see that it’s not. And the reason for that is relationships. Creating a relationship between two different entities (saying that they are the “same as” each other, for instance) is not the same as having two different places in space where data can collect and accumulate and saying that those two places are actually the same (what I’m calling a “data wormhole”). In one model the entities are connected; in the other, two areas of n-dimensional data space are connected via a wormhole, and data is transparently in both places at once. It’s like a graph database without entities. It’s useful because you can just store bits of data and they are returned as one or more entities depending on the precision you’re looking for or how similar you want them to be, and the context is built up around them automatically.
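To make the wormhole idea a bit more concrete, here is a rough sketch of how stored points might come back as one or more entities depending on precision. Everything in it is an assumption for illustration: the name `group_entities`, the use of plain Euclidean distance as the similarity measure, and the idea of declaring a wormhole as a pair of point indices. It is one possible reading of the model, not a worked-out design.

```python
import math
from itertools import combinations


def group_entities(points, precision, wormholes=()):
    """Group coordinate tuples into entities (hypothetical helper).

    points:    list of equal-length coordinate tuples
    precision: maximum Euclidean distance at which two points count
               as the same entity (an assumed similarity measure)
    wormholes: pairs of point indices declared to be the same place,
               no matter how far apart they are
    """
    # Union-find: each point starts out as its own entity.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Points that are close enough merge into one entity.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= precision:
            union(i, j)

    # A wormhole merges two places regardless of distance.
    for i, j in wormholes:
        union(i, j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [(0, 0), (0, 1), (5, 5), (9, 9)]
print(group_entities(points, precision=1.5))
print(group_entities(points, precision=1.5, wormholes=[(2, 3)]))
```

With these four points and a precision of 1.5, the first call returns `[[0, 1], [2], [3]]`, and adding the wormhole between points 2 and 3 returns `[[0, 1], [2, 3]]`. Nothing is stored as an entity; the entities only appear when you ask at a given precision.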
The problem is that this totally destroys the relational database paradigm, because all of a sudden there are no keys at all, only n-dimensional coordinates. This in turn destroys the traditional concept of a query, so you’re required to completely reinvent that as well: instead of saying “I want all entities related to X” you’re saying “I want everything within N similarity of Y”, which almost hurts my brain. You also have to index everything, which is not feasible in a traditional relational environment.
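As a sketch of what “everything within N similarity of Y” could look like, here is a minimal version that treats similarity as Euclidean distance and N as a radius. Both of those are assumptions, as are the name `within_similarity` and the `(coordinates, payload)` record shape. It is a plain linear scan, so it says nothing about the indexing problem; a real version would presumably need some kind of spatial index to avoid touching every record.

```python
import math


def within_similarity(records, target, radius):
    """Return payloads whose coordinates lie within `radius` of `target`.

    records: iterable of (coordinates, payload) pairs (assumed shape)
    target:  coordinate tuple to search around (the "Y" in the query)
    radius:  maximum Euclidean distance to include (the "N")
    Results are ordered nearest first.
    """
    hits = []
    for coords, payload in records:
        distance = math.dist(coords, target)
        if distance <= radius:
            hits.append((distance, payload))
    hits.sort(key=lambda hit: hit[0])
    return [payload for _, payload in hits]


records = [
    ((0.0, 0.0), "a"),
    ((1.0, 0.0), "b"),
    ((3.0, 4.0), "c"),
]
print(within_similarity(records, (0.0, 0.0), 1.0))
```

Here the query around `(0.0, 0.0)` with a radius of 1.0 returns `['a', 'b']`. Widening the radius to 5.0 pulls in `'c'` as well. There is no key anywhere in the query, only a position and a tolerance.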
I’m pretty convinced the power this approach brings to the table outweighs the time it takes to rebuild the concept of a query though, and I can’t think of any other way to accomplish it.








