29 July 2009

Missing ontological serenity in the world of software systems architecture

Updates: See bottom, but also this question on StackOverflow.



Ok, so let me say from the get-go that I'm a little bit upset. Well, maybe angry and bewildered more than upset, but nevertheless not happy. And it all has to do with the dingbat way we architect our various computer systems. So, yeah, quite generic and not really something we can do much about.

Let's rehash. I'm a SOA junkie, an EDA pimp, and I hate by default the bullshit in any Enterprise camp that promotes its own way of doing things as the one that must be right. And by SOA, I don't mean no ESB bullshit, I mean a hard-core focus on services for architectural means. I build ontologically driven systems, and care deeply about semantics where most others don't give a monkey's bottom.

Lately I've had to rehash my knowledge of plugin architectures (both implementation-specific and theoretical), how to modularise complex pieces of software, and how to implement an event-driven platform on which to run my systems. So I've been snooping around, and there's a ton of models and architectures to be found. But being found is not the same as finding what you're after, especially as I have a few criteria for my search; I want to find something that's generic, simple (but not simplistic), elegant (as in, does not suck) and extendable, an architecture that's event-driven, modular and open. Nothing. I've found nothing. Of course they all claim to be amazingly fantastic and super and great, but looking under the hood, if allowed, reveals yet another statically created shared library stack with some hooks for your software to use, using some misnomer like SOA or EDA or any of the hundreds of other Enterprise bullshit terms out there.

So, I set my goals lower in the hope of finding anything of value, and even went and asked real programmers what I thought was a simple question, making it specific enough to hopefully muster some replies. Nothing. It seems everybody's got their own way to handle their own little piece of the universe, that people cling to their silos of comfort or something, afraid of what might happen if we all agreed on something. Even when you dig into large architectures, like my own Linux kernel which I'm using to write this post, there are tons of layers and shared libraries that are cobbled together in a way that does the job, ok, but doesn't make it, in my eyes, an easy job to do, elegant to extend or easy to change.

I guess I should clarify. I'm knee-deep in ontology work for software systems architecture, a field that's almost chemically free of any active community, has a few scattered experiments that went nowhere (and I'm tempted to put ADL in that category, too), a few papers here and there that talk about it in very generic terms (either as abstracts to academic stroke sessions, or a white paper claiming to be the second coming of Jeebus!), but as to hard-core practitioners like me who want to inject a Topic Map with events of given types that match certain ontological expressions and Topic Map fragments of certain types of architectural patterns, tough! You're on your own, kid.

So, what am I after?

Well, many things, but I'll try to be a bit clear here. I've cut down on my wants, too, in order to try to find others out there doing similar things. So. I'd like to see a simple event-driven software stack that scales ontologically, and isn't bound to any technology, company or otherwise religious platform. This means that the stack with its names and values works just as well for a small plugin as it does for a larger system like an extra-OS or a cloud, works for potato-peelers as well as online booking agents, database connection pools and kernel space memory managers, but can also grow and shrink with need, in such a way that all other parts of it can find out what those changes are when they need to. This digs into creating an upper ontology for information science, of course, but more importantly it means I'd like to plug software into various parts of a stack, so that everything - and I mean everything! - is an event listener. I know some micro-kernels work in similar ways, though highly statically bound, but regardless these ideas are way past the cradle stage by now and need greater exploration in the real world.

So when I download an open-source package of sorts and try to find out what its stack of operation looks like, why is this information so hard to find? Or compare the Java event model and the .Net model. Or OSes. It seems it's very hard to agree on these things, but I suspect the state of things isn't because they've tried and failed, but because they haven't tried. It's a big world and this is a big field, yet this has not been tried in any meaningful way.

Sure, the technologies promoted through OASIS, ECMA and W3C have various solutions in themselves and try to bind stuff together in a coherent way so as not to confuse us too much, but even within their own stacks of proposals and standards there are huge gaps, great leaps of faith, and generally no clear direction. Even the W3C, which pushes the semantic web movement, hasn't got anything to say on the matter. It's starting to drive me bonkers.

Ok, I'm done. My steam has gone out, but I'm not feeling any better. Off to do my own thing, like the rest of them. :)




Update: Ok, it seems I'm not getting my message across. Let me create a simple (and wrong) example ;
  • SOA : Start
  • SOA : Configure
  • SOA : Map
  • ENV : Start
  • ENV : Configure
  • ENV : Map
  • APP : Start
  • APP : Configure
  • APP : Init
  • APP : Connect
  • APP : Perform
  • APP : Teardown
  • ENV : Teardown
  • SOA : Teardown
Here we've got a list of application session events where SOA is, er, SOA, ENV is the "environment" (whatever that should mean), APP is an application, and so forth. This list should be HUGE! Think of all the interesting events one could generate from the moment you turn the computer on until someone gets Rickrolled on the other side of the planet! I want to map environments, systems and eco-systems, with labels. In some regards it's an enumerated list of points that any computer system traverses on its path from being loaded into memory until it leaves it. And possibly then some.
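To make the list a little more concrete, here is a minimal Python sketch of it as labelled events flowing through a dispatcher where anything can listen. The `EventBus` class, the `"*"` wildcard and `run_session` are all made up for illustration; only the layer and event names come from the (simple and wrong) list above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# (layer, event) labels in the order the example session walks through them.
SESSION: List[Tuple[str, str]] = [
    ("SOA", "Start"), ("SOA", "Configure"), ("SOA", "Map"),
    ("ENV", "Start"), ("ENV", "Configure"), ("ENV", "Map"),
    ("APP", "Start"), ("APP", "Configure"), ("APP", "Init"),
    ("APP", "Connect"), ("APP", "Perform"), ("APP", "Teardown"),
    ("ENV", "Teardown"), ("SOA", "Teardown"),
]

Listener = Callable[[str, str], None]


class EventBus:
    """Hypothetical dispatcher: listeners subscribe to a (layer, event) label."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[Tuple[str, str], List[Listener]] = defaultdict(list)

    def on(self, layer: str, event: str, listener: Listener) -> None:
        # "*" is an assumed wildcard for either half of the label.
        self._listeners[(layer, event)].append(listener)

    def emit(self, layer: str, event: str) -> None:
        # Exact matches first, then the wildcard subscriptions.
        for key in ((layer, event), (layer, "*"), ("*", event), ("*", "*")):
            for listener in self._listeners.get(key, []):
                listener(layer, event)


def run_session(bus: EventBus) -> None:
    """Emit every event of the example session, in order."""
    for layer, event in SESSION:
        bus.emit(layer, event)
```

A listener registered with `bus.on("*", "Teardown", ...)` would then hear the APP, ENV and SOA teardowns in that order. The sketch says nothing about the ontology behind the labels, which is the part I'm actually missing.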

I want to map the software system world! I want to know what people call their various points on the software stack, what they call their events, how they see them work together, how they foresee workflow interactions, how they define system integrity, thoughts on implementation, named entities, the works.

I can find heaps of this stuff, but none of it is globally agreed upon, it's all tucked away in projects or companies, it's their own version of how things should be and what happens. Even big players such as Sun / Java and Microsoft / .Net have very different event models and ontologies, and they are not compatible in any meaningful way. I would expect some parts of CORBA to have done work in this area, but what I've seen is very transaction oriented, where clients already know the ontology and use CORBA to travel through rather than be defined by.

The closest I've found so far in the realm of mapping machine-parts ("machine" here is "software systems") is the Open architecture computing environment, which tries to define the most important parts of software systems (although the final version was released in 2004 ... these things can be considered to be final? Where's the clouds?), but it lacks the ontological and semantic definition, has no event or message structures or standards, nor does it have any notational value or end-points which, admittedly, I could spend the next couple of weeks doing, but let's see what else is out there.

Making any sense?


Update: To be even more specific, trawling through IPC is really what I have been doing for the last few days, but getting to the core ontology of all of this is soooo painful. Surely someone out there has done something like this? I've even gone through POSIX trying to glean what nuggets I could find, but the system level of that beast is just so low it's not funny. Promising is the DBus architecture and event stack, but this again is very low-level, covers only a fraction of the software systems, and is littered with duplication of complexities.

Anything else I should hack at? Yes, I've gone through most of the WS-* stack as well, digging into past knowledge I had hoped to never see again, but here, as with most other technologies out there, they seem to be obsessed with being so flexible that they forget to be defining. So, we get a lot of scaffolding and frameworks that you can extend and define your stuff in, but no clear definitions of what the world looks like. Even an obvious contender like WS-Events and the lesser-known WS-Event from Hewlett-Packard have nothing more than a functional approach to defining and registering events, and that's it, leaving the defining to some semi-ontological layer.

But I'm still convinced lots of people have done this sort of work, especially in this Semantic Web heyday. Browsing through the thousands of OWL ontologies in Swoogle for 'software architecture' (which doesn't really cut it, but is the closest term that yields results) leaves me just overloaded. Sure, the OpenGroup SOA ontology, for example, does provide me with, eh, lots of interesting stuff, but again it's a special domain (SOA, obviously) using a certain moniker (service orientation, which sucks when you want to define events across operational stacks).

Argh! Can you tell I'm going bonkers?


7 September 2007

REST and SOA as a process for application design

I'm going to stray a bit from the library theme, and talk about design of RESTful SOA. It's a topic close to my heart, as most SOA talk these days is full of vendors claiming money can buy you not only love, but immortality. With SOAP? Hah!

No, I think reinventing what the Web does really well already is a) a waste of time, b) doomed to make a bad copy (as the web is constantly moving, while the SOAP / WS-* stack is immersed in slow-moving standards), and c) overcomplicating things (I like elegant simplicity such as the innards of the Web).

REST

Roy Fielding's REST dissertation has swooped upon the middle and higher layers of the IT world lately, making a lot of them admit that, perhaps, this whole deal about using HTTP and loose XML (often XHTML) to create scalable, fast, simple and dynamic applications (well, as an architectural style, to be specific) might have something going for it. REST has been around for a long while, being the very fabric of what the internet is based on, slowly extended and refined over the last 15 years (even though a lot of these concepts are again based on earlier technology).

Service Oriented Architecture (SOA) is a little bit trickier to define, especially these days when big corporations have discovered it and use it as a buzzword, but basically it is a technical architecture for creating loosely coupled services (meaning; the items in question know very little of each other), where a service is a piece of software that some other piece of software might use (as opposed to direct human usage). Now, a lot of people already talk about this stuff, so I'm not going to add to that. I'd rather talk about what I think when I do this stuff, to talk about actual implementation.

Working in both these worlds, putting them together to design and create applications, is quite different from the normal software development processes that are so popular these days. The most striking difference is that during application design you think in terms of resource orientation (as opposed to object orientation, or functional design) and how to represent services (as opposed to a program, or a module).

You can either plan a big-bang approach to this (standard waterfall models) per service, or you timebox a more agile approach of creating one or several services that do the simplest thing needed to service your proposed application. The world spins around the axis of identifying applications to solve problems; let's turn things around (and this is a big part of SOA) and see if we can come up with services that solve problems instead.

Typically you have a slew of applications that all have common functionality, such as user management, database storage, configuration, session handling, search and a few other bits and pieces depending on the business you're in. There are many ways to deal with reuse of these "things", and I deliberately call them "things" at this stage, because as soon as you call them "modules", or "libraries", or "reusable code" you're setting the scene for quite implementation-specific stuff, such as what language you're going to use, or what platform it runs on. I don't want to deal with "libraries" for example, because if some library is written in Java then I need to make my other solutions in Java, too. If I have a "module" that does X in Windows using C#, the chance that "module" is linked to that technology is quite high.

Things

No, I want to talk about "things". For example, let's talk about users. A lot of applications deal with users in some way or another, whether it's displaying information about them or for them, authenticating them, creating properties on them, or otherwise working with their user data. How can we create a service that applications might have good use for?

Since we meddle here in all things REST, the first thing we do is to think of the service in terms of resources (as being resource oriented is extremely important; expose URIs for every resource, as small / atomic as need be). I usually create two sub domains to hold services, one for internal behind-the-firewall services (soa.domain) and one external (ws.domain; 'ws' for web services), and I also try to have a trim set of basic elements that express generic functionality (search, user, session, database, properties, etc) wrapped in an even smaller and more generic set of domains (x, y, z, a, o, a, etc.). Through this, the first part of my design process is to play around with URIs and hierarchical taxonomical ideas to see what feels right ;

http://soa.example.com/identity/user[/{userid}]
http://soa.example.com/user[/{userid}]
http://soa.example.com/user/id[/{userid}]
http://users.soa.example.com/{userid}

Balance this with ideas on premature optimization (what, you thought that was axiomatically bad? It's allowed to think about these things, you know :) in terms of request times for a domain (the more domains involved in a series of calls, the longer the overall response time, generally speaking) and what feels right.

In my case, the first one seemed the most right. I've developed a small set of root categories in which I "place" my services, such as /search, /publishing, /identity, and so on. These categories are not canon; they are placeholders for loose ideas and thoughts, bound to change in the future as your SOA evolves.

Evolution

Evolution in your SOA is very important, so you should design with it in mind. For example, what about version control of services? Some talk about versioning being part of the XML schemas that services deliver, others talk about content negotiation (crazies :). I take a rather pragmatic and somewhat naughty approach (in the sense that you shouldn't put semantics in your URLs which humans will look at and try to pry apart and use / misuse) and put versioning into the URL at the base of the service defined. For example ;

http://soa.example.com/identity/user/v1[/{userid}]
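As a rough sketch of that URL shape, something like the following Python helper would do. The function name `service_uri` and its parameters are my own invention for illustration; only the host and the /category/service/version[/id] layout come from the examples above.

```python
from typing import Optional
from urllib.parse import quote

BASE = "http://soa.example.com"  # host taken from the example URLs above


def service_uri(category: str, service: str, version: str,
                resource_id: Optional[str] = None) -> str:
    """Build BASE/{category}/{service}/{version}[/{resource_id}]."""
    parts = [category, service, version]
    if resource_id is not None:
        parts.append(resource_id)
    # Percent-encode each segment so a stray "/" or space can't change the path.
    return BASE + "/" + "/".join(quote(part, safe="") for part in parts)
```

Calling `service_uri("identity", "user", "v1", "2456325786234985")` gives the versioned user URL in the form shown above, and leaving out the last argument gives the base of the service.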

I also set a rule for service development ; maintain backwards compatibility as far as you can. There's no need for an ever-increasing version number if you design the XML schemas that pass through the services in smart ways, and this reduces the overhead of deployment, introspection and dependency. Another rule, for service digestion, is to only react to what you understand, and ignore all that you don't; this again enables backwards compatibility as you, say, add a new (but non-critical) element to your metadata which older service users don't understand and simply ignore.
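The "only react to what you understand" rule is easy to show in code. This is a minimal Python sketch with a made-up `<user>` document and a made-up set of known fields; nothing here is a real schema from my SOA.

```python
import xml.etree.ElementTree as ET

# Hypothetical fields that this (older) client understands.
KNOWN_FIELDS = {"id", "name", "email"}


def read_user(xml_text: str) -> dict:
    """Return only the child elements this client knows about; skip the rest."""
    root = ET.fromstring(xml_text)
    return {
        child.tag: (child.text or "").strip()
        for child in root
        if child.tag in KNOWN_FIELDS  # unknown elements are silently ignored
    }
```

If the service later adds, say, a `<shoe-size>` element, this client returns exactly the same dictionary as before, so the service did not need a new version number for it.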

For proper development of a RESTful SOA, though, I'd suggest two things as a minimum ;
  1. use test-driven development for the service definitions (and use whatever methodology you like for the actual code for the service, although test-driven there too won't hurt you), so write your tests for your service (I use XPath with XSLT scripts for this) first and then develop the actual service until it passes all tests, and
  2. collect your services' tests into a large test suite ; whenever you add, subtract or change a service, make sure all tests pass. (If you can sneak this into a build farm of sorts, all the better. Automation for this type of development will probably save a lot of gray hairs) Through this you know what breaks and what's backwards compatible with your changes across the whole SOA. Don't deploy anything from development into test or production unless all tests pass. This is not a trivial task, and should be in the hands of someone who is full-time responsible for the SOA's well-being.
Now, in the evolution of SOAs as well as in nature, don't be afraid of screwing things up. We don't want perfection. We will never get perfection. And we certainly won't get anything near it in the first go. All these services must be allowed to change over time, dramatically at first, even to the point of deleting them completely and starting from scratch making something different. (In fact, I'd advocate making all these first-generation services with version number /v0-ALPHA/ in all caps, as in http://soa.example.com/identity/user/v0-ALPHA[/{userid}] ; this will mark them as experimental and trigger other developers to tread gently. If they work great, just update them to a /v1/ version)

Time management of this development is also important. Because services must be allowed to break, be allowed to screw up, we must also allocate time for these screw-ups to happen. Trust me, it's a good thing ; a smaller failure now ensures we don't screw up big time later. (And this very point is probably the cause of so much bad management and so many failed [enterprise] projects, as it's very easy to overlook or not take seriously enough. I could write a whole book on this topic alone!)

And people who have some sort of ownership of a service (as developers, or analysts, or whatever) must be given time for short iterative development, for little updates, modifications and tweaks. Services won't be successful if you treat them as small bangs (meaning; gather requirements, write spec, make it, sign it off), and can probably only work through continuous tinkering. Such tinkering doesn't have to be time-consuming nor difficult to manage, but it does require you to plan for it. When Bob goes on to his next project, remember that he also needs a half-day per week to tweak and fiddle with his service.

Introspection

One feature that I can't emphasize enough is service introspection, an area that most writers I've seen gloss over. And sure, you don't need it in order to create a SOA or a web service. But I'll assert that you need it if you're a) smart and b) want to create a healthy SOA that can stand the test of time.

Introspection in my world does three important things ;
  1. Handles the client state through hyperlinks (part of the REST paradigm)
  2. Documents the interface, use and dependencies
  3. Provides a test suite
Asking a service for introspection in my world goes something like this ;

http://soa.example.com/identity/user/v1?introspection

or, if you want to split the three up ;

http://soa.example.com/identity/user/v1?introspection=state
http://soa.example.com/identity/user/v1?introspection=docs
http://soa.example.com/identity/user/v1?introspection=tests

1. Handling state of a client through hyperlinks is a somewhat forgotten part of REST, which is easy to miss when your design is at an early stage (and it usually stays that way because you don't think you need it by the time you're made aware of it). It basically comes down to either URL-driven or FORM-driven hyperlinks that take you from whatever state the current URL gave you to the next one. For example, a resource soa.domain/search?q=fish might give back a list of URLs to pages of results, or a form to do a sub-search, all documented through hyperlinks. I personally think the use of XHTML is good for this, but a bit more formal and equally elegant is the use of the Atom Publishing Protocol (not to be confused with the Atom Syndication Format).

2. Documentation is important, and could be as easy as just returning an XHTML page with some text about what it is, how to use it, and so forth. However, I see a major part of documentation as to what dependencies the service has got, so I've got a section that looks a bit like this ;

<ul id="SOA-dependencies">
<li><a href="http://soa.domain/some_service/v2">Some service</a></li>
<li><a href="http://ws.google.com/wdsl/service/1.0">Some Google service</a></li>
</ul>

Notice that this is perfect XHTML. All that's required to understand this list is understanding the identifier for the list, the "SOA-dependencies", which I can locate easily through DOM or XPath. Through this mechanism in services you can now map the whole dang thing, plot in your dependencies, check it against your test suite (talked about earlier) for ultimate coolness and power.
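Pulling the dependencies out of such a response takes only a few lines. Here is a hedged Python sketch using the standard library; the function name `dependencies` is made up, and it assumes the markup carries no XML namespace, as in the fragment above (a full XHTML page with the XHTML namespace declared would need namespace-qualified tag names).

```python
import xml.etree.ElementTree as ET
from typing import List


def dependencies(xhtml_text: str) -> List[str]:
    """Return the href of every link inside the SOA-dependencies list."""
    root = ET.fromstring(xhtml_text)
    # Wrap the parsed root so the search also matches when <ul> is the root itself.
    holder = ET.Element("holder")
    holder.append(root)
    links = holder.findall(".//ul[@id='SOA-dependencies']/li/a")
    return [a.get("href") for a in links if a.get("href")]
```

Run this against every service's introspection page and you have the edges of the dependency graph for the whole SOA.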

In this section I might add that I often incorporate a ping parameter which testing and monitoring systems can use to check the health of a running SOA, something like ;

http://soa.example.com/identity/user/v1?ping

or, if you've got the RESTful chutzpah required, use the HTTP method OPTIONS instead of a GET on a URL. I actually do both. The HTTP response code hence talks about the generic health of the service as far as it knows, and you can use this info not only for monitoring and testing, but also for automatic systems and smart clients.
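A monitoring client for this could look roughly like the Python sketch below. The function names are invented, and treating any 2xx response as "healthy" is my assumption here; your services may want to say more with their response codes.

```python
import urllib.error
import urllib.request


def ping_request(service_url: str, use_options: bool = False) -> urllib.request.Request:
    """Build either GET {url}?ping or OPTIONS {url}."""
    if use_options:
        return urllib.request.Request(service_url, method="OPTIONS")
    return urllib.request.Request(service_url + "?ping", method="GET")


def is_healthy(status: int) -> bool:
    # Assumption: any 2xx code means "healthy as far as the service knows".
    return 200 <= status < 300


def ping(service_url: str, use_options: bool = False, timeout: float = 5.0) -> bool:
    """Send the ping and translate the HTTP response code into a health flag."""
    request = ping_request(service_url, use_options)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_healthy(response.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry a status code.
        return is_healthy(err.code)
    except (urllib.error.URLError, OSError):
        return False  # an unreachable service counts as unhealthy
```

A monitor would simply call `ping("http://soa.example.com/identity/user/v1")` on a schedule, with `use_options=True` for the OPTIONS flavour.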

3. It may seem a bit strange to ask a service to give you a test-suite, but it actually is a very encapsulating and clever thing to do, making sure that tests are all handled at the same place where development takes place. I can do ;

http://soa.example.com/identity/user/v1?introspection=tests

and I'll get back something like this ;

<testlist>
<test name="My first test"
href="http://soa.example.com/identity/user/v1/2456325786234985"
xpath="/response/item[@name='user']/id"
is-true="2456325786234985" />
... [more tests here]
</testlist>

Basic test-case skills are probably a plus at this point to understand what this is about, but basically we assert that the XML/XHTML that the URL returns will give the result "2456325786234985" when the XPath expression "/response/item[@name='user']/id" is run.
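I run mine with XPath and XSLT scripts, but to show the idea, here is a small Python sketch of a runner for such a test list. The names `evaluate`, `run_tests` and the `fetch` callable (which takes a URL and returns the response body as a string) are assumptions of this sketch, and the standard library only supports a limited XPath subset, which happens to cover the expression in the example.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple


def evaluate(xml_text: str, xpath: str) -> str:
    """Evaluate a simple absolute path and return the matched element's text."""
    # ElementTree only takes relative paths, so hang the document under a
    # throwaway parent and turn "/response/..." into "./response/...".
    holder = ET.Element("holder")
    holder.append(ET.fromstring(xml_text))
    match = holder.find("." + xpath)
    if match is None or match.text is None:
        return ""
    return match.text.strip()


def run_tests(testlist_xml: str, fetch: Callable[[str], str]) -> List[Tuple[str, bool]]:
    """Run every <test> in a <testlist>; return (test name, passed) pairs."""
    results = []
    for test in ET.fromstring(testlist_xml).findall("test"):
        body = fetch(test.get("href"))            # the response of the URL under test
        actual = evaluate(body, test.get("xpath"))
        results.append((test.get("name"), actual == test.get("is-true")))
    return results
```

Passing `fetch` in as a parameter keeps the runner independent of how the HTTP call is made, so the same code can run against a live service or against canned responses.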

Your testing framework for the SOA simply collects these test files at intervals to build a larger test-suite that stands as the controller for the whole system.

Finally

Just a few finishing thoughts about rigidity, complexity and management of a RESTful SOA ;

If you don't have dedicated SOA people, then don't do it. If your people (developers, analysts, managers) aren't very flexible, then don't do it. If you don't understand REST, either really learn it (this book is the best there is on this subject!), or don't do it. If you think you need complex systems, don't do it. If you can't wrap your head around resource-orientation, then don't do it.

The thing is, you can perfectly well live without it, create SOA or some other well-meaning version of that concept with SOAP/WS-*/BPEL/ESB or whatever big vendors are more than happy to help you with. You can create POX services just fine. You won't be RESTful, but you will probably survive without it. You don't need it in as much as you can live on only water and bread for years and years, but of course I wouldn't recommend it. :)

Anyways, a few thoughts there on RESTful SOA design and implementation. I haven't dug into the semantics of modeling a full SOA yet, nor talked much about pipeline XML schemas (although the APP protocol is a good hint), system introspection through things like WADL, or even the hidden benefits of ROA (resource-oriented architectures). So. More to come, then. Until then, happy hacking.


28 March 2007

Quiet, oh so quiet ...

Those who wonder why I've been so quiet of late should go read this Lorcan Dempsey bit. Now you also know why I might be a bit quiet for a little while as well, as there are big deadlines and interesting stuff coming up. Watch this space.
