I have recently created a secondary and more specialised blog called ‘Semantic Integration Therapy’, as my focus is now beginning to shift to this particular discipline. In my terminology and context, semantic integration means achieving more effective application integration and SOA solutions by extending the traditional integration contract (XSD and informal documentation) with a more semantically aware mechanism. As such I’m now deep-diving into XML Schema plus Schematron, RDF, OWL and commercial tooling such as Progress DXSI. There is a heavy overlap with the Semantic Web community, but my focus is more on the transactional integration space within the Enterprise than on the holistic principles of the third-generation or semantic web.
I’ve been posting about the rise of the informal semantic contract relating to web services, and about the deficiencies of XML Schema in adequately communicating the capability of anything other than a trivial service. Formalising a semantic contract, by enriching a baseline structural contract (WSDL/XSD) with semantic or content-based constraints, effectively creates a smaller window of well-formedness through which a consumer must steer the content of their payload when issuing a request. Other factors, such as the incremental implementation of a complex business service ‘behind’ the generalised service interface, compound the need for a semantic contract.
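To make the layering concrete, here is a minimal Python sketch of a structural contract (element presence and type, roughly what an XSD gives us) enriched with semantic, content-based rules (roughly what Schematron-style assertions add). The order fields, permitted values and function names are hypothetical illustrations, not taken from any real service or from DXSI.

```python
# Hypothetical order contract. STRUCTURE stands in for the XSD (presence + type);
# SEMANTIC_RULES stand in for Schematron-style content constraints.
STRUCTURE = {"product_type": str, "quantity": int, "bandwidth_mbps": int}

SEMANTIC_RULES = [
    ("quantity must be between 1 and 10",
     lambda d: 1 <= d["quantity"] <= 10),
    ("broadband orders need a bandwidth of 2, 8 or 20 Mbps",
     lambda d: d["product_type"] != "broadband" or d["bandwidth_mbps"] in (2, 8, 20)),
]


def structural_errors(doc: dict) -> list:
    """Violations of the structural contract: missing elements or wrong types."""
    errors = []
    for name, expected_type in STRUCTURE.items():
        if name not in doc:
            errors.append(f"missing element: {name}")
        elif not isinstance(doc[name], expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")
    return errors


def semantic_errors(doc: dict) -> list:
    """Descriptions of the semantic rules that a structurally valid doc breaks."""
    return [desc for desc, rule in SEMANTIC_RULES if not rule(doc)]


def validate(doc: dict) -> list:
    """Check structure first; only a structurally valid doc is checked semantically."""
    return structural_errors(doc) or semantic_errors(doc)
```

A broadband order asking for 16 Mbps passes the structural check untouched, yet falls outside the narrower semantic window, which is exactly the class of problem an XSD alone cannot express.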
To clarify the relationship between structural and semantic, I happened upon a great picture which I’ve annotated…
This is my stake in the ground for now. SOA in the marketplace places total emphasis on two things: Web Services as a basis for communication, and re-use as a basis for convincing the boss to put some cash into your middleware bunker…no…play-pen…err…seat of learning.
In addition, the militant splinter groups of the new wave of RESTafarians (of whom I am an empathising skeptic on this specific 🙂 point about service specification) call for the death of WSDL, relying instead on (WSDL-lite) WADL in the case of the less extreme, and on plain-old inference from sample instance documents in the case of the hard-core….
I find myself sailing down the no-man’s-land between these two polarised viewpoints. I see the need for specification at the more complex end of the interface spectrum, but similarly don’t see how specifications help when interface contracts are relatively intuitive and decoding the specification is harder than inferring from samples. So there we have a basic mental picture of my map of the universe.
Now I’m getting to the point.
I’m now convinced that both SOA’s push for re-use, established through WSDL everywhere, and the more recent RESTafarian calls for unspecification have flaws when we attempt to open up a generalised interface into a service endpoint capable of dealing with a range of entity variants (product types, for example).
My view is that the static and limited semantic capability of XML Schema in the SOA landscape, and the inability of humans to infer correctness without a large number of complex instance-document snapshots in the RESTian landscape, point to a vast, yawning, gaping chasm of understanding in what constitutes an effective contract for locking down the permissible value permutations, aka the semantic contract.
I’ve seen MS-Word. I’ve seen MS-Excel. I’ve seen bleeding-eyes-on-5-hour-conference-calls-relating-to-who-means-what-when-we-say-customer, and best of all I’ve seen hardwired logic constructed in stove-pipes behind re-usable interfaces, aka lipstick on the pig.
I reckon the semantic contract (the contract locking down the permissible instances) is far more important than the outer structural contract, whose value decays as the level of re-use and the inherent complexity of the interface increase. In addition, there are likely to be multiple iterations/increments of a semantic contract within the context of a single structural contract as service functionality is incremented over successive iterations, for example adding product support incrementally to an ordering service. This leads to the notion of the cable cross-section:
In the SOA context…WSDL drives tooling to abstract me from the structural contract. But the formation of the semantic contract as the expression of what the provider is willing to service via that re-usable and loose structural contract is the key to effective integration.
If we don’t pay this enough respect we’ll be using our system testing to mop up simple instance-data problems that could easily have been avoided had we formalised the semantic contract earlier in the development lifecycle…
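The idea of successive semantic increments inside one unchanging structural contract can be sketched as follows, assuming a hypothetical ordering service whose product support grows over three increments. The increment numbers, product names and function names are illustrative assumptions.

```python
# Hypothetical semantic increments for one ordering service. The structural
# contract (the order's shape) never changes; each increment widens the set
# of product types the provider is willing to service.
SEMANTIC_INCREMENTS = {
    1: frozenset({"broadband"}),
    2: frozenset({"broadband", "voice"}),
    3: frozenset({"broadband", "voice", "tv"}),
}


def accepted(order: dict, increment: int) -> bool:
    """True if the order's product type is serviceable at the given increment."""
    return order.get("product_type") in SEMANTIC_INCREMENTS[increment]


def first_supporting_increment(order: dict):
    """Earliest increment whose semantic contract accepts the order, or None."""
    for increment in sorted(SEMANTIC_INCREMENTS):
        if accepted(order, increment):
            return increment
    return None
```

A consumer building a TV order on day 1 can learn immediately that it only becomes serviceable at increment 3, even though the WSDL/XSD it codes against is identical throughout.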
Powered by Qumana
Creating an approach to continuous integration (CI) for large-scale enterprise SOA initiatives has unearthed a potentially significant efficiency gain in the semantic layer. Semantics relate to instance data, and specifically in the context of re-usable, extensible service interfaces, the semantic challenge eclipses that of achieving syntactic alignment between consumer and provider.
In my experience, the vast majority of integration failures picked up in testing environments (after taking the hit to mobilise a complex deployment of a range of components) are related to data/semantics, not syntax.
As such I’ve been focusing on how to bring forward, to day 1 of the design process, the verification that a consumer ‘understands’ the provider both structurally and semantically. The CI framework I’m putting together makes use of a traditional set of artifact presence/quality checks, but significantly introduces the concept of the Semantic Mock (SMOCK): an executable component based on the service contract plus a set of evolving semantic expressions and constraints.
This SMOCK artifact allows a service provider to evolve the detail of the SMOCK incrementally, while the CI framework automatically acquires consumer artifacts such as static instance documents or dynamic harnesses. Both of these manifest far earlier in the delivery process than the final service implementation (and I mean on day 1 or 2 of a 90-day cycle), as opposed to problems being identified through fall-out in formal test environments or, worse, in production.
Over time, as both consumer and provider evolve through and beyond the SMOCK phase, confidence in design integrity improves dramatically, simply because we’ve had continuous automated verification (and hence integration) of consumer and provider ‘contractual bindings’ for weeks or months. This ultimately leads to more effective use of formal testing resources and time, adding value as opposed to fire-fighting and kicking back avoidable broken interfaces.
The tool I’m using to prototype the SMOCK is Progress DXSI. This semantic integration capability occupies a significant niche by focusing on the semantic or data contract associated with all but the most trivial service interfaces. DXSI allows a provider domain expert to enrich base artifacts (WSDL/XSD) and export runnable SMOCK components, which can then be automatically acquired, hosted and exercised (by my CI environment) to verify artifacts published by prospective consumers of the service. Best of all, it kicks back compliance reports based on the semantic constraints exercised in each ‘test case’, so that my ‘CI Build Report’ explains why ‘your’ understanding of ‘my’ semantic contract is flawed…
Beyond SMOCK verification – DXSI also allows me to make a seamless transition into a production runtime too but that’s another story…
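A SMOCK along these lines can be sketched in Python, purely to illustrate the idea and not how DXSI generates or hosts its components. The `SemanticMock` class, the rule format and `build_report` are hypothetical: the mock accepts only requests that satisfy the provider’s current semantic rules, and the CI step runs each consumer-published instance document against it to produce a per-test-case compliance report.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A semantic rule: a human-readable description plus a predicate over a request.
Rule = Tuple[str, Callable[[dict], bool]]


@dataclass
class SemanticMock:
    """Hypothetical SMOCK: accepts a request only if every semantic rule holds."""
    rules: List[Rule]
    canned_response: dict = field(default_factory=lambda: {"status": "ACCEPTED"})

    def violations(self, request: dict) -> List[str]:
        return [desc for desc, rule in self.rules if not rule(request)]

    def handle(self, request: dict) -> dict:
        failed = self.violations(request)
        if failed:
            return {"status": "REJECTED", "reasons": failed}
        return dict(self.canned_response)


def build_report(smock: SemanticMock, consumer_cases: Dict[str, dict]) -> Dict[str, List[str]]:
    """Run each consumer test case against the SMOCK: case name -> broken rules."""
    return {name: smock.violations(doc) for name, doc in consumer_cases.items()}


if __name__ == "__main__":
    smock = SemanticMock(rules=[
        ("quantity must be between 1 and 10",
         lambda d: 1 <= d.get("quantity", 0) <= 10),
        ("product_type must be one the provider currently services",
         lambda d: d.get("product_type") in {"broadband", "voice"}),
    ])
    cases = {
        "consumer-a/voice-order": {"product_type": "voice", "quantity": 1},
        "consumer-b/tv-order": {"product_type": "tv", "quantity": 20},
    }
    for case, reasons in build_report(smock, cases).items():
        print(case, "PASS" if not reasons else "FAIL: " + "; ".join(reasons))
```

As the provider adds rules to the SMOCK, the same consumer cases are simply re-run on every build, so a consumer’s misunderstanding shows up as a named, failed rule rather than as fall-out in a test environment.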
So we aspire to loose coupling, re-usable interfaces and model abstraction as a means of implementing our SOA. Why? Well, we’re told that the alternative is bad! That alternative is unconstrained tactical wiring between applications, with the resulting unsustainable tangle being the essence of bad practice. I agree to a point that unconstrained integration is a bad thing, but there’s also some marketing greyness I need to dispel.
Point-to-point (P2P) or tactical integration is the term used to describe an application integration solution between two components, where the aspiration, the design and the solution are concerned only with that specific requirement at that point in time. Shock horror, who would do such a thing?! Well, there are plenty of reasons why this kind of approach may be suitable in some scenarios; in fact this IS the most popular approach to integration, right?
However, the subtle difference between the archetypal P2P interface and a re-usable service lies in how the design is approached; bear in mind that P2P interactions still exist via re-usable services too. Has the interface been based on open standards in the infrastructure layer? Has it been abstracted at a functional and information level to support additional dimensions (e.g. products, customer types) over time? Whether or not we use web-service technology, we can still create re-usable services in the application integration landscape.
Now at the other end of the food chain we have our new friend: the coarse-grained, heavily abstracted, re-usable Business Service driven out of the mainstream SOA approach to rewiring the Enterprise. Here we have, from the outside looking in, a single exposure for a complex array of related functions (e.g. multi-dimensional product ordering), based on WSDL/SOAP/XSD/XML/WS-* standards. This kind of approach is the current fashion, and is purported to simplify integration. Wrong!
What we actually find is that the new, extensible interface simply creates a thin but strategic veil over the previous P2P interfaces, and effectively gives rise to two areas of complex integration. Firstly, behind the new exposure, the service provider has to manage the mediation of an inbound request across his underlying domain models. Secondly, the consumers of the newly published service have to deal with their own client-side mediation, transforming their localised dialects into a shape which can traverse the wire and be accepted by the remote service provider, or at least by the new strategic facade.
My point here is that SOA, and its inherent style of wrapping functionality, introduces integration challenges in its own right! So it’s not all rosy in the SOA garden, and this is where I’m seeing an opportunity for a hybrid approach…and (apologies for the heresy, I’ll burn in hell if I’m wrong) a resurgence of P2P runtime integration based around a well-managed re-usable service design process.
Eh!? Have I unwittingly turned to the dark-side?
What I mean is that P2P is OK if the cost of change is minimal, and if we minimise the client-specific aspect we can reduce this cost to a point where it’s comparable to that of the alternative: exposing the common model to the wire. In traditional approaches the cost of change is high because the entire design of the solution was hardwired to one specific purpose; introduce a requirement to flex that solution and we have to rip and replace. However, if the P2P ‘design’ is managed correctly and involves the creation of mappings between a common model and the provider domain models, then in addition to exposing that generic interface to the wire, we have a facility that lets consumers of the service declaratively derive their own native transformations, which can cut out a transformation step at runtime.
If we use a toolset such as Progress DXSI to capture the service provider mappings into the common model, and then the consumer mappings into the common model, we can relatively simply derive transformations between the consumer dialect and the provider dialect. Any change to the provider or to the common model would simply require re-generation of the transformation code, which would then execute on the client. This sounds sort of logical…unless my logic has become skewed somehow 🙂
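The derivation step can be sketched with plain field-name mappings, assuming one-to-one renames only; real mappings also carry value transformations and business rules, which is where tooling earns its keep. The dialect field names and function names are hypothetical. The point is that the consumer-to-common and provider-to-common mappings are each captured once, and the direct consumer-to-provider transformation is computed from them rather than hand-built.

```python
# Hypothetical mappings, each captured once against the common model.
CONSUMER_TO_COMMON = {"custRef": "customerId", "prod": "productType", "qty": "quantity"}
PROVIDER_TO_COMMON = {"CUST_NO": "customerId", "PRODUCT_CODE": "productType", "ORDER_QTY": "quantity"}


def invert(mapping: dict) -> dict:
    """Invert a one-to-one field mapping (local -> common becomes common -> local)."""
    inverted = {common: local for local, common in mapping.items()}
    if len(inverted) != len(mapping):
        raise ValueError("mapping is not one-to-one, cannot invert")
    return inverted


def derive_direct_mapping(consumer_to_common: dict, provider_to_common: dict) -> dict:
    """Compose consumer -> common with common -> provider into consumer -> provider."""
    common_to_provider = invert(provider_to_common)
    direct = {}
    for consumer_field, common_field in consumer_to_common.items():
        if common_field not in common_to_provider:
            raise KeyError(f"provider has no mapping for common field {common_field!r}")
        direct[consumer_field] = common_to_provider[common_field]
    return direct


def transform(message: dict, direct_mapping: dict) -> dict:
    """Apply the derived mapping in a single runtime step, with no common-model hop."""
    return {direct_mapping[name]: value for name, value in message.items()}
```

If the provider renames a field, only `PROVIDER_TO_COMMON` changes and `derive_direct_mapping` is re-run to regenerate the client-side transformation, which mirrors the re-generation step described above.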
So this hybrid approach simply blends the best of a fully decoupled SOA approach with the runtime efficiencies of a tightly coupled P2P approach, based on the fact that the design-framework is declarative and can reduce the cost of change so as to mitigate the risk of P2P solutions.
I’m going to explore this in more detail, but I’m confident there’s a way of getting the best from both worlds…unless the SOA police catch me first…
The adoption of a common information model is an important consideration within any large-scale application integration scenario, such as that which underpins Enterprise SOA. The common model is the standardised representation of the key information artefacts moving between domain boundaries, and including such an artefact within the SOA infrastructure is a natural decision. In conceptual terms this approach makes complete sense: every endpoint maps its own localised dialect into a central, shared common model, which makes for a more efficient integration design process than negotiating an exchange model with every remote provider for every single interface. On the face of it, whether or not a common model is deliberately injected into an integration scenario, one will evolve as a by-product of the integration work. As such it’s far better to retain a level of pragmatic control over the evolution of such an important corporate asset than to let natural selection and incremental evolution shape it. The main question with a common model is whether it is used at design time or at runtime, and the integration methodology applied by the integrators has a bearing on which of these modes can be leveraged.
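A rough count shows why the common model pays off at design time, assuming the worst case in which every endpoint integrates with every other endpoint: pairwise negotiation needs n(n-1)/2 exchange models, whereas a common model needs one mapping per endpoint. The function names are illustrative.

```python
def point_to_point_models(endpoints: int) -> int:
    """Exchange models to negotiate if every pair of endpoints agrees its own."""
    return endpoints * (endpoints - 1) // 2


def common_model_mappings(endpoints: int) -> int:
    """Mappings to build if each endpoint maps its dialect once into the common model."""
    return endpoints


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(f"{n} endpoints: {point_to_point_models(n)} pairwise models "
              f"vs {common_model_mappings(n)} common-model mappings")
```

For 10 endpoints that is 45 pairwise models versus 10 mappings, and for 50 endpoints 1,225 versus 50; real estates are rarely fully connected, so the true gap is smaller but still grows quadratically.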
From experience, however, the problem with a common model can be its lack of accessibility and the implications of its abstraction for the concrete interfaces engineered through it by local dialects. On the one hand, taking a SOA scenario, one could mandate that all service providers, regardless of internal model and interface technology dialects, expose a single service interface expressed in terms of the common model, upon a single SOA infrastructure blueprint. That way providers have the comfort of sitting all their localised, legacy and evolving assets behind a common interface, which becomes the only way to consume that service. Such an approach also implies that providers and consumers can be decoupled during design and delivery, to the extent that the service contract derived from the common model would form the basis of a testable component further down the line…
The alternative to this kind of explicit representation of the common model in service interfaces is to use the common model as a design ‘platform’, through which integrators assemble and agree service contracts that are then engineered as consumer-specific runtime interfaces, albeit expressed as variations of a fundamental common model. This is not a P2P integration approach, but rather an efficient use of a common model to facilitate integration through a re-usable service capable of supporting a range of consumer dialects. Achieving this kind of model has some prerequisites in the design space which, in my experience, have been difficult to meet, so the extent of my hands-on experience is the use of the model as a concrete interface standard. (More on the design-platform option in a subsequent post.)
So when we push SOA services with interfaces represented explicitly in terms of a common, abstract model, there are pain points:
- All endpoints must achieve seamless mapping into the common syntactic and semantic model. Semantics are always the poor relation: structural mappings are easier to nail than the content-centric domain rules, which must also be formalised.
- All service providers must provide an interface presentation with back-end integration into the applications implementing the service logic. Whilst ‘fronting’ an evolving IT stack with new, strategic interfaces is advantageous, the additional ‘layer’ is often seen as an issue, although in my experience the overhead of an additional layer of XML processing is trivial compared to the usual latency of executing business logic in the application tier.
- All consumers must conform to a particular, common/non-native dialect when consuming a remote service. This is the primary area where I feel there is justifiable negativity, especially where a consumer of a strategic service may actually be a transient component targeted for removal in the near future, making investment in consumer-side integration kit to interact with newly established remote services difficult to justify.
- Nobody likes a common model…as everybody has to do some work. When moving from a traditional EAI mindset, where we broker all integration solutions centrally and the ‘how’ is hidden within black-box integrations, there is always this debate to be had. However, since SOA simply distributes ownership of the common model to the endpoints rather than centralising it as in an EAI scenario, I feel this point of contention is relatively trivial.
So I believe I can argue and justify a case for a common model on all of these counts apart from the third: the willingness of consumers to ‘take the pain’ of conforming to a foreign model, which may be non-trivial, is a tough nut to crack. Historically I have mostly been in situations where the consumers of remote services are also providers of strategic services to remote consumers, so investment in new infrastructure to support the ‘provider’ role can be leveraged to support local applications that require consumer-side adaptation to interact with remote, common-model-based services.
However I have done so, with an understanding in the back of my mind that there has to be a more effective way of linking strategic services based on a common-model with a diverse collection of consumers and client-side funding and organisational constraints.
In conclusion – using a common model as a concrete interface standard is do-able, but is a pretty heavy-handed and brute-force approach to something which ‘should’ be making life easier. As such I truly believe that a more collaborative framework in the design-space will facilitate a more adaptive integration approach, still 100% supportive of a common model, whilst facilitating low-cost integration. I will be expanding more on this new approach in my next post.
I have been aware for some time that, in general, when it comes to integrating two or more software agents, our effort is disproportionately split across the two critical business integration dimensions: syntax and semantics. The vast majority of modelling and design effort is targeted at the syntax end, with DTD, WSDL and XML Schema, along with validating XML parsers across a range of development platforms, ensuring we can sniff out a missing element at 100 yards. The semantic dimension is to all intents and purposes cast aside as a mopping-up exercise which trickles down the food chain until it sticks to some poor developer locked in a code vacuum who’s never even considered what the information actually means as it pings its way through his n-th-generation optimised integration stack in 0.002 msecs with 99.999999% reliability!
We’ve all been there…after the integration designs have been signed off….where the XML schemas do a great job of locking down the envelopes through which we then pass combinations of xs:any subtrees and batches of name-value pairs, at which point the real fun starts. Out comes the spreadsheet, with 4 columns:
- Source System Field Name
- Common Model Field Name
- Destination System Field Name
- Free-form notes
We then fast-forward a couple of months, by which time the integrators and the systems teams hate each other and we’re running into the max-row limit in Microsoft Excel, but worry not, we managed to get all the mappings sorted…or did we? We may have filled out the syntactic relationships between the elements (if we’re very lucky), but we generally fail miserably to capture and manage the meta-data which would enable cleaner integration over subsequent iterations.
It is usually the free-form column that contains the really high-value meta-data: unstructured, but a rich vein of valuable constraints and business rules which, for a number of reasons, rarely manifest higher up the modelling and design food chain. It is this meta-data, recorded as a by-product of the mapping activity, which we must capture, formalise, version-control and publish for subsequent users/developers of the particular interface we’re constructing. We have UML models and GUI mappers for the easy stuff…but at the business end there are very few semantic integration frameworks. In fact I’m aware of only one, Progress DXSI, a maturing capability offering a lifecycle process as an alternative to the spreadsheet…and it enables us to extend our design governance process beyond the creation of the static artefacts commonly associated with integration designs. I once referred to this as ‘the year 2 problem’ in a SOA transformation, only really coming to the fore when we begin to work on v1.x iterations of our baseline services, and only IF we care about re-use and cost reduction in the design process. Based on the lack of attention in this area, I think that assertion may prove a little optimistic…
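As a sketch of what formalising that free-form column might look like, the Python below loads spreadsheet-style rows and then promotes a free-form note into a machine-checkable, versioned constraint. The `FieldMapping` record, the column names and the example note are hypothetical, and a real framework would hold far richer rules than an allowed-value set.

```python
import csv
import io
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FieldMapping:
    """One spreadsheet row promoted to a versionable record (hypothetical shape)."""
    source: str
    common: str
    destination: str
    notes: str = ""                           # the free-form column, kept for humans
    allowed_values: frozenset = frozenset()   # formalised constraint; empty = unconstrained
    version: str = "1.0"

    def check(self, value) -> bool:
        """True if the value satisfies the formalised constraint, if there is one."""
        return not self.allowed_values or value in self.allowed_values


# Hypothetical spreadsheet export: three mapping columns plus the free-form notes.
SHEET = """source,common,destination,notes
ACC_TYPE,accountType,acctCategory,only R (retail) or B (business) are ever sent
"""


def load_mappings(text: str) -> list:
    """Read spreadsheet-style CSV rows into FieldMapping records."""
    return [FieldMapping(row["source"], row["common"], row["destination"], row["notes"])
            for row in csv.DictReader(io.StringIO(text))]


def formalise(mapping: FieldMapping, allowed, version: str) -> FieldMapping:
    """Turn a free-form note into an explicit constraint under a new version."""
    return replace(mapping, allowed_values=frozenset(allowed), version=version)
```

Because each record is immutable and carries a version, the formalised 1.1 constraint can be published alongside the original 1.0 row, which is the kind of version-controlled meta-data the spreadsheet never gives us.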