I have long been a fan of Peter Deutsch’s fallacies of network/distributed computing (btw, I’m not alone: Google this AM produced over 22k references). They have served as a set of guiding checkpoints for every distributed system that I have built. What I have found to be missing, however, is a similar set of fallacies/truisms for managing information as we approach “internet scale” information infrastructure… the information explosion.

Truisms (principles) for managing the information explosion:

  1. no one person/system is capable of managing all data
  2. optimizations will be continually applied, but by different vendors, thereby requiring an enterprise to distribute its information architecture
  3. information processing is inherently a pipelined process (though fork/join supports parallelism for reduction of latency)
  4. these pipelines can have “in parallel” replicas so long as sufficient locking is engineered, and compensation models supported
  5. locking for a given pipeline should be owned discretely by a single application context (workflow) - though this workflow may be complex, it is stateless upon completion of end state
  6. loose coupling / jit integration require coherent, federable, data dictionaries and meta-data/structure maps
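
To make truism 3 a little more concrete, here is a minimal sketch of a pipeline with one fork/join step, assuming a toy record format. Everything in it (the `parse`, `enrich_region`, and `enrich_length` stages, and the `id,name` input lines) is hypothetical and invented for illustration. The stages run in sequence, while the two independent enrichment branches run in parallel and are joined back into one record.

```python
from concurrent.futures import ThreadPoolExecutor


def parse(raw: str) -> dict:
    """Pipeline stage 1: turn a raw 'id,name' line into a record."""
    ident, name = raw.split(",")
    return {"id": ident.strip(), "name": name.strip()}


def enrich_region(rec: dict) -> dict:
    """Forked branch A (hypothetical rule): derive a region from the id prefix."""
    return {"region": "EU" if rec["id"].startswith("e") else "US"}


def enrich_length(rec: dict) -> dict:
    """Forked branch B (hypothetical rule): derive a simple metric from the name."""
    return {"name_length": len(rec["name"])}


def fork_join(rec: dict, branches, pool: ThreadPoolExecutor) -> dict:
    """Fork: submit every branch for the same record. Join: merge their results."""
    futures = [pool.submit(branch, rec) for branch in branches]
    merged = dict(rec)
    for future in futures:
        merged.update(future.result())  # blocks until that branch has finished
    return merged


def run_pipeline(raw_records):
    """Stage 1 (parse) feeds stage 2 (fork/join enrichment), record by record."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        parsed = (parse(raw) for raw in raw_records)
        return [fork_join(rec, [enrich_region, enrich_length], pool) for rec in parsed]


result = run_pipeline(["e1, Ada", "u2, Grace"])
```

The fork only shortens latency inside a stage; the overall flow is still a pipeline, which is the point of the truism.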

Translated to Fallacies… which, agreeing with SGG, I think are way more powerful, and in some cases hilarious.

  1. there is one enterprise data architect who is responsible for the master models
  2. there is a system that is the authoritative master for a given entity domain
  3. there is one vendor involved across the SOA and EIM domain
  4. the data models are largely fixed, and the business will not ask for further changes/enhancements to the model
  5. data exchange will be based upon XA/2-phase transactional mechanisms to achieve ACID properties (pessimistic transactionality)
  6. there will be a singular data dictionary, with complete meta-data for a given entity domain
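
One way to read fallacy 5 alongside truisms 4 and 5 is that a single workflow owns the sequence of updates and undoes completed steps when a later one fails, rather than relying on a global XA transaction. The sketch below is only an illustration under that assumption: `run_workflow`, the `crm` and `billing` dictionaries, and the simulated failure are all hypothetical, not a description of any particular product.

```python
from typing import Callable, List, Tuple

# A step is (name, action, compensation). All names here are illustrative.
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_workflow(state: dict, steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for name, action, compensate in steps:
        try:
            action(state)
            completed.append((name, compensate))
        except Exception:
            # No global rollback exists, so each finished step is undone explicitly.
            for _, undo in reversed(completed):
                undo(state)
            return False
    return True


def update_crm(state: dict) -> None:
    state["crm"]["email"] = state["new_email"]


def revert_crm(state: dict) -> None:
    state["crm"]["email"] = state["old_email"]


def update_billing(state: dict) -> None:
    # Simulated outage in the second system.
    raise RuntimeError("billing system unavailable")


def revert_billing(state: dict) -> None:
    state["billing"]["email"] = state["old_email"]


state = {
    "old_email": "a@example.com",
    "new_email": "b@example.com",
    "crm": {"email": "a@example.com"},
    "billing": {"email": "a@example.com"},
}

ok = run_workflow(
    state,
    [
        ("crm", update_crm, revert_crm),
        ("billing", update_billing, revert_billing),
    ],
)
```

Here the CRM update succeeds, the billing update fails, and the workflow restores the CRM value. Between those two moments the systems briefly disagree, which is the trade-off accepted in exchange for not holding locks across both.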

Additions/Subtractions/debate most wanted!

Comments

  1. Dan:
    Interesting post, some comments on the principles:

    >1. no one person/system is capable of managing all data
    Why does the data need to be managed? (ie what problem is being solved by managing the data). Data management (and governance) is multilayered. Is the management for purposes of cost containment of hardware/software? Is the management for purposes of information security? Is the management for purposes of regulatory compliance? etc.

    >2. optimizations will be continually improved, but by different vendors, thereby requiring an enterprise to distribute their information architecture
    Although multi-vendor innovation is inevitable, it strikes me that in order for a consumer to receive the benefits, there needs to be some sort of “standardization” so that the innovations from different vendors will NOT be siloed. Without standardization, I don’t see how innovations from different vendors can be at all synergistic.

    >3. information processing is inherently a pipelined process (though fork/join supports parallelism for reduction of latency)
    Although I don’t disagree, why is this sufficiently important so as to be listed as a “principle”? ie what are the implications of this point?

    >4. these pipelines can have “in parallel” replicas so long as sufficient locking is engineered, and compensation models supported
    It seems that 4 is really 3.a). Otherwise, my comment to 3. applies here as well (ie what are the implications)?

    >5. locking for a given pipeline should be owned discretely by a single application context (workflow) - though this workflow may be complex, it is stateless upon completion of end state
    So this looks like 3.b) (ie not a separate top level principle). Is your point that two-phase commit and other systems that try to provide atomic level guarantees are not appropriate?

    >6. loose coupling / jit integration require coherent, federable, data dictionaries and meta-data/structure maps
    I think you need to establish what “loose coupling” has to do with information infrastructure (I think it is important, my point is you haven’t motivated it in this post). Further, if you mean by coherent, a more general concept of reconcilable (ie there is some means by which information elements in different information feeds can be “joined” or “correlated” etc.), then I agree. It seems to me that key to any information infrastructure is the notion of information entity identity correlation. Without this, there are no meaningful ways to build higher order information feeds by combining (mashups) other information feeds.

    sgg $0.02 (CDN)

    Comment by sgg — September 9, 2008 @ 3:33 pm

  2. Really like the idea of “transforming” Deutsch’s fallacies into the information management space. I would change them to fallacies and add:
    The data is correct.
    Data owners will relinquish control.
    There is an uber-model.
    All interested parties will want correct data.
    ACID will save us.
    The linkages between information are well known.
    The linkages between information are static.

    more comments later….

    Comment by Tom Maguire — September 9, 2008 @ 3:34 pm
