2 June 2009

Successful crap

It never ceases to amaze me, peeking into various successful open-source projects and seeing the innards, how they even got this crap code past their own pride. Yes, I need to vent.

This weekend I was head-down in various content-management systems and their ilk, digging into anything from WordPress to Habari to Joomla to Simple CMS (which you would expect to be simple) to eZ Publish. All of them had rather abysmal code scattered throughout (with eZ Publish being the best of the lot), oddities, and all the warts you'd expect of systems cobbled together, where their success is more of an afterthought.

But hang on, I can't criticise systems for their organic growth. But I can criticise them for not doing much about the trouble that comes from it. Sure, I understand that rewriting core parts of a system requires a huge ego and nerves of steel, and I understand how "if it ain't broke, don't touch it" rules at the end of the day, but surely the end goal is good software, right?

But.

It's crap. It's rubbish. And more importantly, as I can put up with crap if it gives me opportunities and love, it hinders innovation, flexibility and, well, love.

Let's pick on a random contender that I worked heaps with last week, WordPress. If we cut away comments, this is its index.php ;
define('WP_USE_THEMES', true);
require('./wp-blog-header.php');
If we look into wp-blog-header.php, here's what we get ;
if ( !isset($wp_did_header) ) {
    $wp_did_header = true;
    require_once( dirname(__FILE__) . '/wp-load.php' );
    wp();
    require_once( ABSPATH . WPINC . '/template-loader.php' );
}
Ok, so let's peek into wp-load.php, and we find about 20 lines of code and more includes. Don't you just love playing hide and seek with files to find out where it's going and why? These things may have come about from shuffling the organic growth of the system into other files, leaving behind these sad little snippets that are hard to get an overview of and take a slight performance toll, too, as well as eating away at your sanity and good programming ethics. And they all contain that newbie error of putting this at the end of every business logic file ;
?>
It's not needed, and since any stray whitespace after it gets sent as output, if you like your whitespace under reasonable control, you're stuffed. Not a big crime, mind you, but just one of those niggling little things. Then you've got functions and objects, some with the wp_ prefix, some without, business logic in files called wp-settings.php, repetition of code everywhere, hundreds of define() calls scattered about in different files, and so on and so on. (And yeah, I should contribute as it is open-source and all that, and I'm actually writing an embedded Topic Maps engine as a plugin, so we'll see.)

But I'm not here to pick on WordPress per se. It's more about how this organic growth hinders innovation and opportunities. So let's talk about frameworks. All little pieces of code together form a framework, so we're not necessarily talking about a framework of disjoint classes or functions that aid developers in making stuff, like the Zend Framework, or Cake, or Symfony, or CodeIgniter, or 1000 others out there. Well, kinda; all those little things the app is made from also constitute a framework. It isn't as disjoint, refactored, synergetic, stable or well thought-out as what you get in a more established framework, but nevertheless that's what it is.

PHP itself is a framework, of course, and most PHP frameworks are wrappers and added code to make PHP act more like a coherent system, fixing inaccuracies and bugs and niggles, stabilizing behaviour and increasing the need for spending hours and hours learning some new paradigm you can't use elsewhere.

Hmm. Where was I? Oh, right; every app is a framework. But when the framework isn't a particularly good one, where the pieces are either too fragmented or too disjointed to make any sense, making stuff in that framework is going to be a pain. Like WordPress is a pain. And before you know it we get to the next rewrite, and this time we'll get it right, although we need to keep our legacy intact, and hence we write hacks on top of fresh code to drag it back to the hole it came from. The data model needs to be rewritten, but it won't because "well, it works, doesn't it?" and the framework needs to be rewritten, but it won't because "well, it's not broken!"

When I can't replace MySQL with something else, that's a hindrance. If I can't change the way tagging works, I can't move forward. If I can't change the URI handling, I'm stuffed. If I can't use portions of it to write something else, nothing new will come. Sure, there might be a plugin architecture somewhere, perhaps a simple event model that one can tap into, but if I can't replace the model in which it operates I can't make it more beautiful. I am forced to accept the model and framework in which WordPress sits.

And it sits quite squarely on top of everything I want to do. I want to create better and typed links, I want to reuse a model for sequences and storage, I want to replace tags with guided controlled vocabularies (maybe even typed and binary linked to external WordNet sites), I want to use it as a CMS and skip the URI handling altogether, and so on. But I can't, because WordPress wasn't designed with change and innovation in mind.

But a day will come when even the most successful project will face its own innards. And some people will branch it, some will stay on, some will create something new, and some will stop using it altogether. And it's all a really good thing; this is organic growth, it's a framework that spawns other frameworks. And more crap will be successful. Things will be broken and ugly and hackish, just like some things hopefully won't.

And in a few iterations, something beautiful - with a probably different name - will emerge.

Update: It would be great to get your suggestions for open-source projects which are designed for change and have quality and / or elegant code to boot. Let's make a list!


20 March 2009

Resurrection : xSiteable Framework

I've just started in my new job (yes, more on that later, I know, I know) and was flipping through a lot of my old projects over the years, big and small, and I was looking for some old Information Architecture / prototyping tool / website generator application I made with some help from IA superstar Donna Spencer (nee Maurer) back when I lived in Canberra, Australia.

I found three generations of the xSiteable project. Generation 1 is the one a lot of people have found online and used, the XSLT framework for generating Topic Maps based websites. I meant to continue with generation 2, the xSiteable Publishing Framework (which runs the Topic Maps-based National Treasures website for the National Library of Australia) but never got around to polishing it enough for publication, and before I came to my senses I was way into developing generation 3, which I now call the xSiteable Framework (which sports a full REST stack and Topic Maps). And yes, I'm still too lazy to polish it enough for publication (which includes writing tons of documentation), at least as of now, but I showed this latest incarnation to a friend lately, and he said I had to write something about it. Well, specifically how my object model is set up, because it's quite different from the normal way to deal with OO paradigms.

First of all, PHP is dynamic, and has some cool "magic" functions in the OO model which one can use for funky stuff. Instead of just extending the normal stuff with some extras I've gone and embraced it completely, and changed my programming paradigms and conventions at the same time. Let's just jump in with some example code;
// Check (and fetch) all users with a given email
$usercheck = $this->database->users->_find ( 'email', 'lucky@bingo.com' ) ;
Tables are contextually defined in databases, so $this->database->users points directly to the 'users' table in the database. (Well, they're not really table names, but for this example it works that way.) The framework checks all levels of granularity, and will always return FALSE or the part-object you want, so for example ;
// Get the domain of a user's email address
$domain = $this->database->users->ajohanne->email->__after ( '@' ) ;
Again, it's like a tree-structure of data, a stream of granularity to get in and out of the data. This does require you to know the schema (and change the code if you change the schema), but apart from that, in a stable environment, this really is helpful (it's also cached, so it's really fast, too).

You might also have noticed ... users->ajohanne->email .... Where did that "ajohanne" bit come from? Well, as things are designed, the framework will try to find stuff that isn't already found, so it will automatically look up "ajohanne" in designated fields. All objects that extend the framework have two very important fields, one being the integer primary identifier, the second one the qualified unique name (so not a normal name as such, but most often a computer-generated one that isn't normally a number). Often systems will use things like a username, say, as a qualified name, and hence "ajohanne" was my username in one such system. Why do this?

Well, PHP is dynamic, so explicitly using 'ajohanne' as part of the query, as in my static example above, isn't the best way to go in more flexible systems; just pop your found user in dynamically instead;
$domain = $this->database->users->$username->email->__after ( '@' ) ;
Easy. And this applies to all parts of the tree, so this works as well ;
$domain = $this->database->$some_table->$some_id->$some_field->__after ( '@' ) ;
Now, from the two examples above we might see a different pattern, too. All data parts have unrestrained names, all query operations use one leading underscore, and all string operations use two. (__after is a shortcut for substr ( $str, strpos ( $str, $pattern ) + strlen ( $pattern ) ), and I've got a heap of little helpers like this built in.) Through this I always know what the type of the object interface is, and with PHP magic functions these types are easy to pull down and react to. As some of my objects are extendable, I need to pass _* and __* functionality up and down the object tree.
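
For the curious, here is a minimal sketch of how that naming convention could be wired up with PHP's __get and __call. It is not the xSiteable code; the Node class, its private helper methods and the in-memory array standing in for the database are all hypothetical, and caching is left out entirely.

```php
<?php
// Hypothetical sketch, not the xSiteable implementation: an in-memory
// array stands in for the database so the example runs on its own.
class Node
{
    private $value;

    public function __construct($value)
    {
        $this->value = $value;
    }

    // Plain names walk one level down the tree: table, row, field.
    public function __get($name)
    {
        if (is_array($this->value) && array_key_exists($name, $this->value)) {
            return new Node($this->value[$name]);
        }
        return false; // a failed lookup yields FALSE, as described above
    }

    // Undeclared methods land here; the prefix says what kind of operation it is.
    public function __call($name, $args)
    {
        if (strpos($name, '__') === 0) {
            return $this->stringOp(substr($name, 2), $args);
        }
        if (strpos($name, '_') === 0) {
            return $this->queryOp(substr($name, 1), $args);
        }
        throw new BadMethodCallException("Unknown operation: $name");
    }

    // Two underscores: string helpers that act on a leaf value.
    private function stringOp($op, array $args)
    {
        if ($op === 'after' && is_string($this->value)) {
            $pos = strpos($this->value, $args[0]);
            return $pos === false
                ? false
                : substr($this->value, $pos + strlen($args[0]));
        }
        return false;
    }

    // One underscore: queries that act on a branch (here, the rows of a table).
    private function queryOp($op, array $args)
    {
        if ($op === 'find' && is_array($this->value)) {
            list($field, $wanted) = $args;
            return array_filter($this->value, function ($row) use ($field, $wanted) {
                return is_array($row) && isset($row[$field]) && $row[$field] === $wanted;
            });
        }
        return false;
    }
}

$database = new Node([
    'users' => [
        'ajohanne' => ['email' => 'ajohanne@example.org'],
        'lucky'    => ['email' => 'lucky@bingo.com'],
    ],
]);

$username = 'ajohanne';
$domain   = $database->users->$username->email->__after('@');
$matches  = $database->users->_find('email', 'lucky@bingo.com');
```

With this toy data, $domain ends up holding the part of the address after the '@', and $matches holds the rows whose email field equals the one searched for. The point is only that the prefix alone tells __call which kind of operation it is looking at.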

Traditionally, we use getters and setters ;
$u = $obj->getUsername() ;
$obj->setUsername ( $u ) ;
I turn them all into properties, so ;
$u = $obj->username ;
$obj->username = $u ;
But they are still full internal functions to the object, and this is another magic function in PHP ;
class obj extends xs_SimpleObject {
    function getUsername () {
        ...
    }
    function setUsername ( $value ) {
        ...
    }
}
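
The base class isn't shown here, so as a rough sketch of the idea, assuming nothing more than PHP's __get and __set, a property access can be routed to a matching getter or setter like this. SimpleObject and User are hypothetical stand-ins, not the real xs_SimpleObject.

```php
<?php
// Hypothetical stand-in for xs_SimpleObject; the real base class is not shown.
abstract class SimpleObject
{
    // Reading an undeclared property: route $obj->username to getUsername().
    public function __get($name)
    {
        $getter = 'get' . ucfirst($name);
        if (method_exists($this, $getter)) {
            return $this->$getter();
        }
        return false;
    }

    // Writing an undeclared property: route $obj->username = ... to setUsername().
    public function __set($name, $value)
    {
        $setter = 'set' . ucfirst($name);
        if (method_exists($this, $setter)) {
            $this->$setter($value);
            return;
        }
        throw new LogicException("No setter for property: $name");
    }
}

class User extends SimpleObject
{
    private $name = '';

    public function getUsername()
    {
        return $this->name;
    }

    public function setUsername($value)
    {
        // The setter is still a full function, so it can clean up its input.
        $this->name = strtolower(trim($value));
    }
}

$user = new User();
$user->username = '  AJohanne ';
$u = $user->username;
```

The calling code only ever sees a property, while the object keeps the chance to validate or transform the value on the way in and out.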
The framework isn't just about object persistence. In fact, it is not about that. I hate ORMs in the sense that they still drag your OO applications back into the relational database stone age with some sugar on top. In fact what I've done is to implement a TMRM model in a relational database layer, so it's a generic meta model (Topic Maps) driving that backend and not tables, table names, lookup tables, and all that mess. In fact, crazy as it sounds, there are only four tables in the whole darn thing. I'm relying on backend RDBM systems to be good at what they should be good at; clever indices, and easier joins in a recursive environment (which, when all data is in the one table, it indeed is), where systems use filters to derive joins instead of doing complex cross-operations (which take lots of time and resources to pull off, and are the main bottleneck in pretty much any application ever created which has a database backend).

A long time ago I thought that the link between persistent URIs for identity management in Topic Maps and the URI (and using links as application state) in REST were made for each other, and I wanted to try it out. In fact, that fact alone was the very inspiration for me to do the 3rd generation of xSiteable, hacking out code that basically has one URI for every part of the Topic Map, for every part of the TM API, and for other parts of your application. Here's some sample URIs ;
http://mysite.com/prospect/12
http://mysite.com/api/tm/topics/of_type:booking
http://mysite.com/admin/db/prospects
At each of these there are GET, PUT, POST and DELETE options, so when I create a new prospect, it's a POST to http://mysite.com/prospect or a direct PUT to http://mysite.com/prospect/[new_id], for example.
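
To make the POST versus PUT difference concrete, here is a small sketch of the four verbs acting on a resource store. The handle() function, the array store and the status codes chosen are my assumptions for illustration; a real setup would read the method and path from the incoming request rather than take them as arguments.

```php
<?php
// Hypothetical sketch: resources live in an array keyed by their path.
function handle(array &$store, $method, $path, $body = null)
{
    switch ($method) {
        case 'GET':
            return array_key_exists($path, $store)
                ? [200, $store[$path]]
                : [404, null];

        case 'PUT':
            // The client names the resource itself; create it or replace it.
            $status = array_key_exists($path, $store) ? 200 : 201;
            $store[$path] = $body;
            return [$status, $body];

        case 'POST':
            // The collection picks the identifier for the new member.
            $id = 1;
            while (array_key_exists("$path/$id", $store)) {
                $id++;
            }
            $store["$path/$id"] = $body;
            return [201, "$path/$id"];

        case 'DELETE':
            if (!array_key_exists($path, $store)) {
                return [404, null];
            }
            unset($store[$path]);
            return [204, null];

        default:
            return [405, null];
    }
}

$store    = [];
$created  = handle($store, 'POST', '/prospect', ['name' => 'Ada']);
$replaced = handle($store, 'PUT', '/prospect/12', ['name' => 'Grace']);
$fetched  = handle($store, 'GET', '/prospect/12');
$deleted  = handle($store, 'DELETE', '/prospect/1');
$missing  = handle($store, 'GET', '/prospect/1');
```

POST to the collection lets the server choose the new URI, while PUT goes straight to a URI the client already knows, which is the distinction made above.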

All in all, this means I have many ways into the system and its data, none of them more correct than the other as they all resolve to topics in the topic map. This lowers the need for error checking greatly, and the application is more like a huge proxy for a Topic Map with a REST interface. It's a cute and very effective way of doing it. I'm trying various scaling tests, and with the latest Topic Maps distribution protocols that I can use for distributing the map across any cluster, it's looking really sexy (although I still have some work to do in this area, but the basics rock!).

Anyway, there's a quick intro. I guess I should follow this up with some more coded details of examples. Yeah, maybe next week, as I need to get some other stuff done now, but I like the object model I've got in place, and it's so easy to work with without losing the ability to do complex stuff. Take care.


2 July 2008

Just enough to make some sense

I've realized that my previous post on language and semantics could possibly be a bit hard to understand without having the proper context wrapped around it, so today I'll continue my journey of explaining life, universe and everything. Today I want to talk about "just enough complexity for understanding, but not more."

Mouses

Let's talk about mouse. Or a mouse. Mice. Let's talk about this ;

One can argue whether this is really enough context for us to talk about this thing. What does "mouse" mean here? The Disney mouse? A computer mouse? The mouse shadow in the second moon? In order for me to communicate clearly with my fellow human beings I need to provide just enough information so that we can figure this out, so I say "mouse, you know the furry, multivorous, small critter that ..." ;


This is too much information, at least for most cases. I'm not trying to give you all the information I know about mice, but just enough for me to say "I saw a mouse yesterday in the pantry." Talking about context is incredibly hard, because, frankly, what does context mean? And how much background information do I need to provide to you in order for you to understand what I'm talking about?

In terms of language, "context" means verbal context, the words and expressions that surround a word, and social context, the connection between the words and those who hear or read them based on human constraints (age, gender, knowledge, etc.). There's also some controversy about this, and we often also imply certain mental models (social context of understanding).

In general, though, we talk about context as "that stuff that surrounds the issue", from solid objects, ideas, my mental state, what I see, what I know, what my audience sees and knows, hears, smells, cultural and political history, musical tastes, and on and on and on. Everything in the moment and everything in the past in order to understand the current communication that takes us to the future.

Yup, it's pretty big and heady stuff, and it's a darn interesting question; how much context do you need in order to communicate well? My previous post was indeed about how much context we need to put into our language and definition in order to communicate well.

A bit of background

Back in 1956 a paper by the cognitive psychologist George A. Miller changed a lot of how we think about our own capacity for juggling stuff in our heads. It's a most famous paper, where further research since has added to and confirmed the basic premise that there's only so much we're able to remember at the same time. And the figure that came up was 7, plus / minus 2.

Of course that number is specific to that research, and may mean very little in the scheme of more specific settings. It's a general rule, though, that hints at the limits we have in cognition, in the way we observe and respond to communication. And it certainly helps us understand the way we deal with context. Context can be overly complex, or overly simple. Maybe the right amount of context is 7, plus / minus 2?

Just right



I'm not going to speculate much on what it means that "between 5 and 9 equally-weighted error-less choices" defines arbitrary constraints on our mental storage capacity (short-term especially), but I'll for sure speculate that it guides the way we can understand context, and perhaps especially where it's loosely defined.

We humans have a tendency to think that those things that look like the truth must be the truth. We do this perhaps especially in the way we deal with computer systems, because, frankly, it's easy to define structures and limitations there. It's what we do.

An example of this is how we observe anything as containers that may contain things, that in themselves might be containers which might be things or more containers, and so on. Our world is filled with this notion, from taxonomies, to object-oriented programming, to XML, to how we talk about structures and things, to how science was defined, and on and on and on. Tree-structures, basically.

But as anyone with a decent taxonomic background knows, taxonomies don't always work as a strict tree-structure. So does anyone who's meddled in OO for too long, or fiddled with XML until the angle-brackets break. These things look so much like the truth that we pursue them as truth.

Things are more chaotic than we like. They're more, in fact, like graph structures, where relationships between things go back and forth, up and down, over and under already established relationships. It can be quite tricky, because the simple "this container contains these containers" mentality is gone, and a more complex model appears;


This is the world of the Semantic Web and Topic Maps, of course, and many of the reasons why these emerging technologies are, er, emerging is of course because all containers aren't containers at all, and that the semantics of "this thing belongs to that thing" isn't precise enough when we want to communicate well. Explaining the world in terms of tree-structures puts too many constraints on us, so many that we spend most of our time trying to fit our communication into it rather than simply defining it.
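
A tiny sketch of the difference, using made-up data: in a strict tree the mouse gets exactly one parent and one kind of relationship ("belongs to"), whereas a list of typed relationships lets it sit in several places at once. The $edges array and the related() helper are hypothetical, and this is neither Topic Maps nor RDF, just the bare graph idea.

```php
<?php
// Hypothetical data: each relationship is a typed edge, [subject, type, object].
$edges = [
    ['mouse', 'is-a',     'rodent'],
    ['mouse', 'is-a',     'pest'],
    ['mouse', 'is-a',     'pointing device'],
    ['cat',   'hunts',    'mouse'],
    ['mouse', 'lives-in', 'pantry'],
];

// Collect everything a node points to through one type of relationship.
function related(array $edges, $node, $type)
{
    $found = [];
    foreach ($edges as $edge) {
        list($subject, $edgeType, $object) = $edge;
        if ($subject === $node && $edgeType === $type) {
            $found[] = $object;
        }
    }
    return $found;
}

$kinds  = related($edges, 'mouse', 'is-a');
$places = related($edges, 'mouse', 'lives-in');
```

Three "is-a" answers for one mouse is exactly what a single-parent container model has no room for.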

We could go back to frames theory as well, with recursive key/value properties that you find naturally in b-trees, where values are either a literal, or another property. RDF is based on this model, for example, where the recursiveness is used for creating graph structures. (Which is one reason I hate RDF, using anonymous nodes for literals)

Programming languages and meta models

Programming languages don't extend the basic pre-defined model of the language much. Some languages allow some degree of flexibility (such as Ruby, Lisp and Python), some offer tweaking (such as PHP, Lua and Perl), while others offer macroing and overloading of syntax (mostly C family), and yet more are just stuck in their modeling ways (Java). [note: don't take these notions too strictly; there's a host of features to these languages that mix and match various terms, both within and outside of the OO paradigm]

What they all have in common is that the defined meta model is linked to shifting bits and bytes around a computer program, and that all human communication and / or understanding is left in the hands of programmers. Let's talk about meta models.

Most programming languages have a set of keywords and syntax that make up a model of programming. This is the meta model; it's the foundation of a language, a set of things on which you build your programs. All programming languages have more or less of them, and the more they have, the stricter they usually are as well. Some are object-oriented languages, others functional, some imperative, and yet others mix things up. If I write ;

Integer i = new Integer ( 34 ) ;

in Java, there's only so many ways to interpret that. It's basically an instance of the Integer class, that holds the integer number of 34. But what about

$i = new Int ( 34 ) ;

in PHP? There is no built-in class called Int in PHP, so this code either fails or produces an instance of some class called Int, but we do not know what that means, at least not at this point. And this is what the meta model defines; built-in types, classes, APIs and the overall framework, how things are glued together.

As such, Java and .Net have huge meta models defined, so huge that you can spend your whole career in just one part of it. PHP has a medium meta model, Perl even smaller, all the way down to assembler with a rather puny meta model. Syntax and keywords are not just how we program; they define the constraints of our language. There are things that are easy and things that are hard in every language, and there is no one answer to what the best programming language is. They all do things differently.

The object-oriented ways of Java differ from those of Ruby, which differ from the ways of C++, which differ from the ways of PHP. The functional ways of Erlang differ from those of XSLT, which differ from those of Lisp.

The right answer?

There is no right answer. One can always argue about the little differences between all these meta models, and we do, all the time. We bicker about operator overloading, about whether multiple inheritance is better than single inheritance, on the real difference between interfaces and abstract classes, about getter and setter methods (or lack thereof), about whether types should be first class objects or not, about what closures are, whether to use curly-brackets or define programming structure through whitespace, and on and on and on.

My previous post was another way of saying that we perhaps should argue less about the meta model of our language, and worry more about the reason the computer was created than about how a certain problem was solved. We don't have the mental capacity to juggle too much stuff around in our brains, and if the meta model is huge, our ability to focus on perhaps the important bits diminishes.

There are so many levels of communication in our development stack. Maybe we should introduce a more semantically sane model into it to move a few steps closer to the real problem, the communication between man and machine? I'm not convinced that OO or functional programming solves the human communication problem. Let's speculate and draw sketches on napkins.


24 June 2008

Thoughts on PHP

Yes, yes, I admit the heresy; I like PHP. No, no, PHP has tons of warts, so no, it's not better than alternative X, Y or Z for task Q, W or E. I hate comparing languages this way, feature by feature, syntax by semantics, and so on. I like to judge languages on two things;

Environment

I really like the CGI style for web resources. No, PHP ain't CGI (unless you're in a pain-self-inflicting mood) but most often a glorious Apache module which reuses all the goodness it can offer, but the model of a totally independent scripting engine which needs to mold its relationships, and then you throw it away when you're done, makes for a clean, fast and very scalable framework. PHP basically compiles together mostly C modules, and uses its simple syntax to glue stuff together. Yes, there's a backlog of really shitty and badly written code out there, especially when people have no clue. And when the threshold is as low as it is for PHP, that's inevitable, but I hardly hold that against the language itself. The environment in which that shitty code runs is really good. To me, the environment is the best part of PHP. Perl falls somewhat into this same category.

For those in the know, this model is very similar to how a RESTful system works, where the interpreter is a manifestation of a resource. In resource-oriented development this means gold, and is very important to me as my environment supports my RESTful way of designing systems.

Style

And since style is something any good developer can control themselves (unless their language is super-strict), this really is the main thing that I like about PHP; It gives me the freedom to make things work for me. I can choose the methodology, whether a function, class or inline is best suited, and since PHP always is evaluated in run-time, I can make my environment depend on real-time parameters rather than a pre-compiled utopia.

Zend Framework

I've been using the Zend Framework for the last year or so, and it's a great framework as such, although the focus has been mostly to put OO wrappers around common PHP idioms and conventions. As such it works great to perhaps consolidate the features, and perhaps give PHP 6 a future direction. This is the best part of the Zend Framework.

The bad thing about the Zend Framework is that it imposes its own style, and somewhat alters the environment. Ouch, on both the things that make PHP my choice tool. I've struggled quite a bit in trying to reuse certain parts of the framework, extend others, and generally use some bits without using the whole shebang. There's not a lot of dependencies between the components, but just enough to make it tricky to do serious stuff (for example writing an alternative "threads-like" HTTP adapter to HTTP Request).

Now, people talk about the Zend Framework's impact on the future of the PHP language. Yes, one can always hope that this is the case, but I think people are a little too preoccupied with the OO capabilities and forgetting perhaps what makes PHP really popular with those who choose to continue to use it past their first few applications. Do not fall into the trap of thinking OO is somehow better than other ways.

As much as people think MVC is the best thing, I really don't care about that. MVC works great for some things, not for all (for example, I have a REST framework that uses a completely different model, a more resource-oriented model), so to impose MVC as the modus operandi is not good, and indeed something that makes reuse of other ZF modules a bit trickier. Instead of an MVC focus, there should be a strong highlighting of a uniform interface to the environment itself. It's the environment that's cool with PHP, so let's make interfacing to that better. There's some work on wrappers and readers for various aspects of the HTTP protocol, but only the most basic stuff is in there and it needs serious work. With PHP I could do serious HTTP applications; with Zend Framework I'm limited, and need to hack and extend. Let's get HTTP savvy, not MVC drones.

Availability and diversity

PHP is everywhere. Period. I can use pretty much any ISP or in-house hosting to host most of what I need. And there are so many different projects about that use PHP (Wikipedia, WordPress, phpBB, phpMyAdmin, Drupal, Joomla, Flickr ... the list goes on), meaning both hosting and high-quality tools are abundant. The LAMP stack is pretty much supported on every hardware and software platform. Heck, I can even run it on the JVM.

I have written many tools over the years, some good, some bad, but I'm always happy to find out that I can copy my old files into any new environment and they will pretty much always run straight away, no tweaking. I can't tell you how important this is!

The shitty stuff

No system is perfect, to some specific definition of what "perfect" means. With PHP, I've learned to live with its odd bits, such as funny booleans and comparators, diverse and non-uniform APIs, sloppy exception handling, no shared memory (which may or may not be a good thing, but certainly a big part of its design), and a few syntactic snuffs (Why not introduce "$$" as a shortcut for "$this->", for example? Tidbits).
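
For those who haven't met the funny booleans and comparators yet, here's a small taste. I've picked only comparisons that, as far as I know, give the same answer in PHP 7 and PHP 8; the $checks array and its labels are just scaffolding for the example.

```php
<?php
// Each label is the expression as you would type it; each value is what PHP makes of it.
$checks = [
    "'0' == false"   => '0' == false,   // the string '0' counts as false
    "'' == false"    => '' == false,    // so does the empty string
    "'0' == ''"      => '0' == '',      // yet the two are not equal to each other
    "'10' == '1e1'"  => '10' == '1e1',  // numeric strings are compared as numbers
    "'10' === '1e1'" => '10' === '1e1', // strict comparison looks at the actual strings
    "(bool) '0.0'"   => (bool) '0.0',   // only '0' and '' are false, not '0.0'
];

foreach ($checks as $label => $result) {
    echo str_pad($label, 18), var_export($result, true), PHP_EOL;
}
```

Loose equality isn't transitive, which is why === is the safer habit whenever the types matter.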

Some say that PHP code is crap, usually said when encountering - indeed! - shitty code. But shitty code is everywhere. Even in the most tied-down, static and controlled environments you still get shit. Some say less shit. I say shit with a different color, not different odor. I've been programming for almost 30 years, and let me tell you, I've encountered shitty code in every single environment and language I've ever tried, and I'm one of those who tries every language out there. There is no language that can hold a candle to the wonderfully imaginative mind of man, able to lay bricks where diamonds should have shone.

Some say the PHP language itself is immature, or unstructured, or some other parameter that their favourite language holds. No, sorry, but there's not that much which separates all our different languages (except BrainFuck ... that one is seriously different. :), and indeed a lot of the language wars are really more about their API designs than the languages themselves. For example, the syntax of Java is tolerable, but a lot of its APIs are not. The syntax of Python is ok, but some of its APIs are great. Ruby has good syntax, but to me some confusing APIs.

It's really mostly about APIs, and how we create models to solve our problems. It's all about models, as APIs in themselves are models. The API is not the language.

Round off

I should here of course mention the link between models and Topic Maps, as the latter is a way to define, work with and exchange / share the former. Couple this model openness with resource-orientation, and I think it makes for a very interesting environment. But as this whole shebang is part of the new framework I'm working on, expect more blogging on this in the very near future. Have a wonderful week.
