Friday, 11 December 2009
I was struck by his certainty that such a theory is possible. He seemed to be making a scientific prediction based on an aesthetic sensibility - unified theories are more beautiful therefore a unified theory is correct.
Unification is certainly an important part of the progress of a science. Occam's razor is a well-established principle for judging the utility of a theory. Entia non sunt multiplicanda praeter necessitatem - entities are not to be multiplied more than is necessary. In other words, the simplest theory that fits the data is the best.
When Sadi Carnot showed the equivalence of heat and mechanical work it was a victory for physics because physicists could now explain natural phenomena using fewer entities. At a superficial level, his discovery was useful because students now had to tax their brains with fewer concepts in order to understand both heat and motion.
But Carnot's unification had also produced a theory that was more correct than earlier theories that thought of heat as a substance called "caloric". The mechanical theory of heat turned out to explain more phenomena than Carnot had originally considered. The laws of thermodynamics could not have been formulated without Carnot's insight.
Why should the simpler theory prove more correct?
Marcus Hutter believes that understanding is fundamentally an act of mental unification. The Hutter prize offers a reward for anyone able to produce a better compression of Wikipedia. A high compression ratio requires a deep understanding of the corpus - in this case a snapshot of human knowledge. If unification is in some sense equivalent to understanding then a unified theory is more likely to be correct because it is a better approximation of the phenomena in question.
(Interestingly, Hutter is also an advocate for a physical Theory of Everything)
However, it is a leap of blind optimism to assume that a G.U.T. is possible just because if it existed it would be useful, beautiful and likely to yield further insight. Desirability does not imply feasibility.
In The Mythical Man Month Fred Brooks draws a distinction between essential and accidental complexity in software systems that is very pertinent to the possibility of a G.U.T.
Accidental complexity is caused by deficiencies in the solution. This kind of complexity can be eliminated by improving the approach to the problem. I would argue that the separation of heat and mechanics was an example of accidental complexity caused by a lack of understanding of the nature of heat. The mechanical theory of heat was a successful simplification because it removed complexity that was never part of the phenomena itself.
Essential complexity, on the other hand, is inherent in the problem. It is impossible to build a solution that is less complex than the problem it is designed to solve.
The universe, like any other corpus, has an uncomputable Kolmogorov complexity that limits how simple a correct theory of physics can be. Though we cannot ever know the essential complexity of the universe, it does have one. There is an unknown and absolute limit to the unifying efforts of physics, so we cannot ever be sure that further unification will be possible.
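A rough way to see the difference between a regular corpus and an irregular one is to run both through a general-purpose compressor. The sketch below uses Python's zlib; the sample text and the `compressed_size` helper are my own illustrative assumptions and have nothing to do with the Hutter prize rules.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


# A highly regular "corpus": one short sentence repeated many times.
regular = b"heat is motion. " * 1000

# An irregular corpus of the same length, from a seeded pseudo-random generator.
rng = random.Random(0)
irregular = bytes(rng.getrandbits(8) for _ in range(len(regular)))

print(len(regular), compressed_size(regular), compressed_size(irregular))
```

A compressor only ever gives an upper bound on Kolmogorov complexity. The irregular bytes here actually have a short description (the seed plus the generator), but zlib cannot find it, so a poor compression ratio never proves that no simpler description exists.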
Perhaps physics will encounter new phenomena that require new multiplication of entities to explain. Perhaps we are close to the limit and though we might incrementally simplify our theories we will never be able to reduce physics to fewer than four fundamental interactions.
We cannot hope to make our theories more unified than the phenomena they describe and still hope to make them correct. As Albert Einstein said (my italics):
Thursday, 22 October 2009
If a larger portion of this torrent of code could be reused then an enormous amount of effort could be saved. Perhaps this effort could be diverted into improving the software quality and we could finally make a dent in the software crisis.
But though everyone has been talking about code reuse for decades, there has been very little progress.
The code that has enjoyed a significant degree of reuse has been specifically designed for that purpose. Frameworks, libraries and plugin architectures are widespread. Even the mighty operating system exists to share functionality between applications. But serendipitous reuse of code that was originally designed to solve a singular problem is rare.
I think that the reason that code reuse is hard is the same reason that the semantic web has failed to materialise. This makes sense, because code is just a particular kind of semantic content.
As Clay Shirky has argued, the semantic web is a problematic ambition because it requires a universal worldview. The semantic web project envisages that information interoperability will be achieved by employing universal data formats. But data formats are contingent on worldview, which can never be universal. Shirky takes genetics as an example:
It would be relatively easy, for example, to encode a description of genes in XML, but it would be impossible to get a universal standard for such a description, because biologists are still arguing about what a gene actually is. There are several competing standards for describing genetic information, and the semantic divergence is an artifact of a real conversation among biologists. You can't get a standard til you have an agreement, and you can't force an agreement to exist where none actually does.
Even something as apparently clear-cut as genetic science resists universal semantic presentation because the data is contaminated by its original context.
The opinions, prejudices, needs and worldview of a programmer are imprinted on their code to a far greater degree. That class you wrote the other day to process form values assumes that every field has exactly one value. The HTML the form was displayed in uses classes unique to your site's CSS. And the coding standards the class conforms to differ from standard PHP conventions because your organisation wants to achieve consistency with its .NET projects.
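To make the first of those assumptions concrete, here is a minimal sketch in Python rather than PHP. Both function names and the sample query string are hypothetical; the point is only that the single-value assumption lives silently inside the code.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def read_form_single(query: str) -> Dict[str, str]:
    """Hypothetical form reader that assumes one value per field.

    When a field is repeated, only the last value survives.
    """
    return {name: values[-1] for name, values in parse_qs(query).items()}


def read_form_multi(query: str) -> Dict[str, List[str]]:
    """The same input read without the single-value assumption."""
    return parse_qs(query)


submitted = "name=Ann&colour=red&colour=blue"
print(read_form_single(submitted))
print(read_form_multi(submitted))
```

Anyone whose forms contain multi-select fields cannot reuse the first function without first discovering, and then undoing, a decision that was never written down.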
You might be able to shoehorn this code into the next project you complete for the same organisation, but there is little chance of your form-processing class ever being used by someone else entirely. The cleaner and more decoupled your code is, the more use it might be to someone else, but you cannot entirely erase the imprint of its original context because context is what gives your code meaning.
The way you can best foster reuse is to engineer a situation where the worldview embedded in your code is adopted by the reuser. Take Firefox as an example. The core functionality of the browser is leveraged by thousands of plugin developers. But the API these extensions work with was laid down by the developers of Firefox and has meaning only in the context of the Firefox browser.
A cross-browser extension API would be very convenient, but the task of creating a plugin model that would apply as well to Chrome as to Firefox would be gargantuan. Witness how difficult it is to even get HTML and CSS to render the same in more than one browser. A cross-browser API would take the compatibility issues from the DOM and spread them to every aspect of the browsing experience.
Commonly-used frameworks also owe their success to prescribing a worldview. The only painless way to work with a framework is to follow the Rails way, the Django way or the Drupal way. To reuse someone else's code you must make concessions to their way of doing things.
There are a couple of current developments in software engineering that will help with the code reuse problem. Test driven development helps to make the assumptions embedded in code explicit by describing them using unit tests. The referential transparency fostered by the functional programming paradigm controls context by quarantining side-effects.
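As a small sketch of both ideas, assuming a hypothetical `single_value` helper: the function is pure, so its result depends only on its argument, and the tests spell out the assumption that a field carries exactly one value.

```python
def single_value(values):
    """Return the only value of a form field.

    Pure function: the result depends only on the argument, and nothing
    outside the function is read or modified.
    """
    if len(values) != 1:
        raise ValueError("expected exactly one value, got %d" % len(values))
    return values[0]


def test_one_value_is_returned():
    assert single_value(["Ann"]) == "Ann"


def test_repeated_field_is_rejected():
    try:
        single_value(["red", "blue"])
    except ValueError:
        return
    raise AssertionError("a repeated field should be rejected")


# Run the tests directly; a test runner such as pytest would also find them.
test_one_value_is_returned()
test_repeated_field_is_rejected()
```

A would-be reuser can read the two tests and learn the embedded worldview in seconds, which is about as much as tooling can do to loosen context.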
But code reuse will always be intrinsically hard because context is sticky.
Sunday, 11 October 2009
as design quality increases the designer disappears. He went on to suggest that the formalism we when recognising a designer's work is as much an imperfection of the design as a feature.
I was not so sure. There are definitely instances where the designer's mark seems to contribute to the design. Programming languages are a good example. Ruby would not be what it is without the strength of Matz's personal vision.
On the other hand, I do get annoyed when a designer's vanity tempts them to graffiti their signature onto a design that would have been better left alone. I'm thinking here of 'clever' designs like teapots with two spouts.
The difference between these two scenarios is, in my opinion, whether or not the design space is convergent. I mean the term in the same sense as convergent evolution. In a convergent design space, the differences between designs will gradually disappear over time as individual designers are gradually more successful at approximating the best solution to the problem at hand.
In such a domain, it follows that any deviation from the one true design is noise. The designer's personal touch therefore detracts from their attempt to produce good design. A double-spouted teapot might help the designer express their individuality, but the result is just slightly less convenient tea.
However, it's rare to find a design space where a Platonic 'best' design exists. When have the various stakeholders in the construction of a new building ever agreed what is best? And to revisit my earlier example, which language is 'best' is one of the most common topics of programming flame wars.
Designers usually have to balance competing interests. How much should the finished product cost? What kind of user/customer should it be optimised for? What about older users/customers, or ones with disabilities? And not least, when is the deadline for the completed design? How designers balance these interests will inevitably affect the design. There is rarely any objective way to balance these subjective interests, so there is rarely an objective best design.
In such open design spaces, the designer's vision serves an important purpose - coherence. There are so many elements in a complicated design that it can be hard to take them in all at once. A strong authorial vision helps users/customers by giving them a guide to predict and/or remember the designer's choices.
Many Ruby admirers speak of the Principle of Least Surprise. Ruby is comparatively easy to learn and understand because its design choices aim to produce the least astonishment in the programmer. But since every programmer comes from a different background, they will each have different expectations and standards of astonishment.
So more precisely, Ruby was designed according to the Principle of Matz's Least Surprise. Once the programmer gets a handle on Matz's programming aesthetic, they can make educated guesses about parts of the language that they have not yet encountered.
So in conclusion, the formalism we when recognising a designer's work is a feature because it makes understanding complicated design simpler.
Friday, 4 September 2009
Trouble is, developers frequently need to install programs. The nice way to handle this is to use something like sudo (for *nix systems). A specific command can be executed with raised permissions, but for the rest of the time the user operates with normal privileges.
However, some operating systems (like Windows XP) do not fully support the sudo approach. There is a command known as "runas", but this does not work in all circumstances. In particular, it is not available for .msi installer files.
If you are running Windows XP on a non-administrator account, need to install an .msi and have the password of an administrator account, you do not have to take the trouble to log out and log back in. The following workaround lets you use your web browser as an .msi launcher and bypass the restriction:
- Use runas to launch your browser with admin privileges
- Open the .msi in your browser, either from the web or your local filesystem
- What you do next depends on what browser you use. In Firefox, you double click on the .msi in the download window which will launch it - as admin!
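The first step can be scripted. This is only a sketch: `runas /user:<account> <program>` is the standard form of the command, but the account name and the Firefox path below are assumptions that you will need to replace with your own.

```python
import subprocess
from typing import List


def runas_command(account: str, program: str) -> List[str]:
    """Build a runas invocation that starts program under another account."""
    return ["runas", "/user:" + account, program]


# Both values are assumptions; substitute the real account name and browser path.
command = runas_command(
    "Administrator",
    r"C:\Program Files\Mozilla Firefox\firefox.exe",
)

# Show the command line that would be executed.
print(subprocess.list2cmdline(command))

# On Windows, uncomment to launch the browser; runas prompts for the password.
# subprocess.call(command)
```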
Needless to say, use this trick sparingly. Running your browser as administrator all the time is almost as bad as developing under an admin account.
Sunday, 23 August 2009
Benjamin Carlyle has a suggestion for how we might bypass the bottleneck of registering MIME types with the IANA. Stefan Tilkov has been following similar proposals for some time.
There are two separate problems in this debate that I think are being confused.
One is the identification of new resource types. The other is providing a definition of the format of resources e.g. an XML schema.
Benjamin's proposal involves using a URI to identify resource types. If the resource definition URIs of two resources differ, then the client must interpret them as different resource types. Dereferencing a resource definition URI will yield a definition of that format.
The difficulty as I see it is that multiple URIs could point to the same resource definition. Furthermore, the data format might be defined in multiple places and in multiple ways e.g. a DTD and an XML schema. Ideally conneg would be used to put equivalent definitions behind a single URI, but in practice conneg is oft-neglected.
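A minimal sketch of the problem, using made-up example.org URIs and my own paraphrase of the client rule as a one-line function:

```python
def same_type_by_uri(uri_a: str, uri_b: str) -> bool:
    """Client rule from the proposal: different URIs mean different types."""
    return uri_a == uri_b


# Hypothetical URIs that all describe one and the same invoice format.
schema_uri = "http://example.org/formats/invoice.xsd"
dtd_uri = "http://example.org/formats/invoice.dtd"
mirror_uri = "http://mirror.example.net/invoice.xsd"

print(same_type_by_uri(schema_uri, dtd_uri))     # False
print(same_type_by_uri(schema_uri, mirror_uri))  # False
```

Every pair compares as a different resource type, even though a human would say all three URIs describe the same format. Only prior agreement on one canonical URI avoids this.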
The process of agreeing on a canonical URI to use for a given format is no simpler than agreeing on an "x-" prefixed custom MIME type. So while Benjamin's proposal helps solve the resource type definition problem, I don't think it makes much progress on the resource type identification problem.
Benjamin quite rightly points out that a key problem in resource identification is when resource types 'grow up' and move beyond the boundaries of the organisation where they were created. However I fear that using URIs to identify formats makes this transition more difficult, because if the URI of the format description is changed then all clients using the format will have to be updated.
If the original URI of the format was internal to the organisation's LAN, then the URI will be forced to change. At some stage during the format's development the URI will probably have to be changed anyway.
In short, URIs are good for pointing to definitions of resource formats but are problematic for establishing the identity of resource types.
Thursday, 16 July 2009
Even Kurt Gödel, whose incompleteness theorems are perhaps the best-known examples of the limitations of mathematics, is widely regarded as a Platonist. He, like many mathematicians, regarded mathematics as more real than the physical world. For a Platonist, theorems are timeless and eternal. Mathematicians' role is to discover and document them as purely as possible. Paul Erdős expressed this sentiment by imagining that the most beautiful proofs came from a book written by God.
On the other hand, few would claim that Linux existed before Linus Torvalds started writing it in 1991. Even a software engineering concept like structured programming is usually described as being founded by Edsger Dijkstra, even though the mathematical theorem that underpins the movement could be said to have been discovered by Corrado Böhm and Giuseppe Jacopini.
Some mathematicians do leave room for authorship in their understanding of their profession. Leopold Kronecker once said that "God made the integers, all else is the work of man". Bertrand Russell went further and said that integers were also created by man - or at least they could be constructed using mathematical logic.
The defining characteristic of authorship (as opposed to invention) is that the subjectivity of the author is imprinted on the work. One example of this in mathematics is the calculus. Isaac Newton and Gottfried Leibniz both discovered the calculus, but they approached it in different ways. I would argue that their divergent expressions of the same idea are best understood through the lens of authorship, especially given the importance Leibniz placed on notation and presenting his thoughts for human understanding.
But by and large, mathematicians are better described by Roland Barthes' account of tellers of tales before modern authorship was invented:
In ethnographic societies the responsibility for a narrative is never assumed by a person but by a mediator, shaman or relator whose ‘performance’ - the mastery of the narrative code - may possibly be admired but never his ‘genius’. The author is a modern figure.
Mathematicians attach their names to their work, but more in the spirit of explorers naming newly discovered peaks than authors cultivating writing credits. It is this emphasis on discovery rather than creation that most clearly differentiates mathematical practice from programming and which means that a purely mathematical education is not sufficient to understand software development.
Wednesday, 1 July 2009
It's not immediately obvious what political stereotype to apply to software developers. On the one hand, computer systems are tightly controlled, deterministic universes where users can only venture if they provide the correct password (which the programmer has decreed shall contain no fewer than three non-alphanumeric characters).
This suggests that programmers might have sympathy for centrally planned economies. Citizens' input will correctly validate or they will be re-educated!
On the other hand, a lot of our time as software designers is dedicated to preserving flexibility. We use factory methods and interfaces to give ourselves the freedom to change which class we wish to instantiate. We attempt to compose methods so that they can be re-used in other contexts. Nathaniel Borenstein captured this attitude perfectly:
It should be noted that no ethically-trained software engineer would ever consent to write a DestroyBaghdad procedure. Basic professional ethics would instead require him to write a DestroyCity procedure, to which Baghdad could be given as a parameter.
Some use C++ because they don't want some garbage-collecting nanny state managing their memory for them. I'm sure that Margaret Thatcher would have agreed with this sentiment had she studied programming rather than chemistry - though her declaration that "if you want to cut your own throat, don't come to me for a bandage" does sound suspiciously like she's warning a junior developer away from pointer arithmetic.
The common theme of these examples is that software developers are constantly struggling to capture logic at the appropriate place in their code. This is the key point I take from software to my own political opinions. When it comes right down to it, most of the political ideas and philosophies that I dislike are operating on the wrong level of abstraction.
Centralised steel quotas are a bad idea because production decisions are best made on a local level, not in Beijing. Internet censorship is problematic because an individual is best placed to decide what they do not wish to view. A single point of control is not appropriate for these examples.
However, some political decisions cannot be left to the individual. Controlling greenhouse gas emissions is a good example of an issue that needs to be managed centrally to avoid a tragedy of the commons.
My instincts have always been libertarian. But my experiences as a software developer have taught me that there is no hard and fast rule for what level of abstraction decisions should be made at. As much as I'd like individuals to be given complete control over their lives, sometimes individuals simply do not have the necessary perspective to make the best decision.
For example, ethical consumerism is a laudable philosophy, but it does not work unless there is some central agency capable of understanding the consequences of individual purchases who can guide consumers. And government schools are necessary because leaving it to parents to purchase education for their own children will lead to unacceptable inequality.
Considering issues in isolation can also lead to short-sighted decisions. I disapprove of the Californian system of referenda, because of course people will vote for lower taxes and higher spending if they are asked about these issues in isolation. Budgets need to be created from a perspective that allows consideration of all of a government's finances.
To paraphrase Einstein's well-known quote, make your code as simple (and generic) as possible, but no simpler. Give individuals as much liberty as possible, but no more. And when contemplating a political dilemma, consider what level of abstraction the problem would be best addressed at.
Sunday, 21 June 2009
I suppose it's to its creators' credit that we are still using a tool that was standardised in close to its present form in 1973. But email has so many shortcomings in its very architecture that it's high time we upgraded to something else. There are many partial solutions to the problems listed below, but to properly fix them all requires a ground-up rebuild.
No guarantee of delivery
If an SMTP server swallows your email, tough luck. An email is like a postcard hurled out into the void. If it disappears somehow then no one will ever know.
No support for high-level abstractions like conversations
People do not send emails in isolation. Often, an email will be part of a series of replies, perhaps involving multiple recipients.
Email gives you no good way of grouping individual messages into a conversation, other than by dumping the entire previous contents of the conversation at the bottom of each message. Gmail does a valiant job of threading emails, but the process it's using doesn't help you if you're not using Gmail, is unreliable and is inherently just a hack.
The lack of any coherent high-level organising principle makes email communication chaotic when the number of messages involved is large. Sometimes this is so unmanageable that it causes individuals to take the drastic step of declaring email bankruptcy, notably including Donald Knuth (founder of literate programming) and Lawrence Lessig (of the Creative Commons and the EFF).
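One of the partial solutions is the optional In-Reply-To header, which a client may set to the Message-ID of the email being answered. The sketch below groups replies by that header using Python's standard email package. The three sample messages are invented, and the grouping function is an illustration rather than a description of how Gmail or any other client actually threads mail.

```python
from collections import defaultdict
from email import message_from_string

# Invented sample messages; the third is a reply whose client omitted the header.
RAW_MESSAGES = [
    "Message-ID: <1@example.org>\nSubject: Lunch?\n\nNoon?",
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\n"
    "Subject: Re: Lunch?\n\nYes.",
    "Message-ID: <3@example.org>\nSubject: Re: Lunch?\n\nSee you there.",
]


def group_replies(raw_messages):
    """Map each parent Message-ID to the Message-IDs that reply to it."""
    replies = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        parent = msg["In-Reply-To"]  # None when the header is absent
        if parent is not None:
            replies[parent].append(msg["Message-ID"])
    return dict(replies)


print(group_replies(RAW_MESSAGES))
```

The third message drops out of the conversation because nothing forces a client to set the header, which is why threading built on it remains a best-effort affair.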
No canonical and independent copy
An email exists in its sender's outbox and its receiver's inbox. It may also be stored by an email server somewhere. If these copies are deleted or lost then it's gone.
If someone tampers with an email that you sent them, you may have no way of proving this to a third party. If someone tampers with your email en route then you have no way of proving this even to the receiver.
There's also no good way to introduce someone into an email conversation they have not been following (you can forward an email containing a bunch of replies, but that's hardly usable). Emails don't have a URL that you can pass around or use as a reference if, for example, the email contains an important decision that needs documenting.
No native encryption
It is possible to encrypt emails. But if you do, then both sender and receiver need to be using email clients that support encryption. The sender would also have to have access to the receiver's public key.
No way of verifying the sender's identity
The only way you know who sent an email is by looking at the 'from' field. If that field is filled out wrongly then there is no way to tell. Impersonating someone over email is technically trivial (unless you use digital signatures, which have the same disadvantages as encryption).
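To show how little stands behind the 'from' field, here is a short sketch using Python's standard email package. The addresses are made up, and the message is only built, never sent.

```python
from email.message import EmailMessage

msg = EmailMessage()
# The From header is plain text chosen by whoever composes the message.
msg["From"] = "someone.else@example.com"
msg["To"] = "journalist@example.org"
msg["Subject"] = "Urgent"
msg.set_content("Nothing in the message format checks the From header.")

print(msg["From"])
```

Building the message succeeds whatever address is supplied; the message format itself carries no proof that the named sender wrote it.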
The future
I have high hopes that Google Wave will solve some or all of these problems. But there are two big advantages email has over Google Wave:
- It's proven
- It's widely supported and understood
For Google to get wide adoption of Wave they're going to have to come up with a solution that allows incremental adoption. Perhaps the Google Wave client could support email as well as Waves so that I can communicate with the vast majority of my contacts who aren't bleeding-edge adopters.
But until then we're going to have to suffer the absurdity of disagreements and uncertainty about whether a particular email was sent, who sent it and what was in it - like this Australian political scandal.
Update: The scandalous email has turned out to be a fake.
Saturday, 13 June 2009
Initially, Raskolnikov justifies his crime by imagining himself to be one of the elite few who transcend ordinary morality. Like Napoleon Bonaparte, these extraordinary men are destined to seize society and bend it to their will. Their higher purpose excuses them from the constraints of morality that ordinary members of society must abide by.
The reader soon realises that Raskolnikov is not a member of this elite cadre. True Napoleons are too busy invading Spain to construct self-serving pseudo-philosophical justifications. As the novel progresses, Raskolnikov's crippling doubts reveal to him the fallacy of his delusions of grandeur. He realises that men who are preordained to shake civilisation to its very foundations do not agonise over their calling.
In the world of software, it is not at all uncommon to encounter a developer who is convinced that they are a Napoleon. Perhaps it's ignorance. Perhaps it's arrogance. Whatever the reason, they are motivated to create their own inadequate solutions to problems that have already been well and truly solved. Often they take it on themselves to improve upon things that ordinary programmers take as given (like the nature of truth itself).
Google Wave may just be an example of a revolution that we actually need. Email is a tried-and-true technology, but it has its limits and could benefit from a ground-up redesign. The success of Google Maps certainly suggests that the Rasmussen brothers are candidates for web Napoleons.
On the other hand, Google's non-standard implementation of OpenID looks more like it was designed by Rodion Romanovich Raskolnikov. The whole point of OpenID is that it is a universal protocol, yet they have extended it for their own specific needs (they want to be able to use gmail addresses rather than URLs). What's worse, every developer who wishes to accommodate Google OpenIDs on their site will have to contaminate their code with a special case to handle gmail addresses.
If you are contemplating producing your own version of a well-established technology, it is just possible that you possess a unique insight and that by reinventing the wheel you will drag software in a bright new direction. But if you are not sure, then your code is more likely to resemble an opportunistic act of violence than the Code Napoléon.
And even if you are certain that your way is better, you're probably still wrong.
Tuesday, 9 June 2009
As far as web accessibility and interaction-heavy sites are concerned, he is asking the wrong question. What he should be wondering is,
When is the right time for the fancy stuff?
Derek is of course aware of the importance of accessibility. However his approach seems to be to build the stairs first and then put in the wheelchair ramp later, and no matter how well you plan you will always come across implementation problems you had not considered.
My main criticism of Derek's article is that he constructs a dichotomy between business imperatives (getting the site up) and doing the right thing (implementing accessibility). However there are tangible benefits for your project in getting accessibility right early on.
Most importantly, there is one blind and deaf user that every web developer should be concerned about: Googlebot. If a site is not accessible for a human with a disability then it almost certainly will not be indexed properly by Google. You cannot improve your site through user feedback if you have no users because no one can find your site.
Of course, a big motivation for web developers to make their sites accessible is that it's the right thing to do. And it is. But if you follow progressive enhancement and make accessibility part of your development process then you'll get more out of it than just a warm fuzzy feeling.
Sunday, 7 June 2009
Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.
The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.
- Donald Knuth, "Literate Programming" (1984)
I have great affinity for this way of viewing software development. Software design has more in common with the composition of an essay than any strictly scientific activity. I think it's an accident of history that programming is placed within engineering faculties rather than being understood as an outgrowth of philosophy and formal logic.
Literate programming acknowledges software development's place among the humanities. By extension, it acknowledges the relevance of non-scientific ideas to the process of cutting code. Our craft requires the creative and disciplined presentation of thought, so we would be foolhardy to ignore thousands of years of the history of ideas. Programming does not exist inside a vacuum. Neither should the programmer.
I am not trying to argue that programmers do not need a firm grasp of science. But good programmers cannot rely solely on scientific concepts if they wish their code to be comprehensible to their peers (or future selves).
In the spirit of literate programming I will use this blog to explore software development and its interplay with literature, philosophy, politics and mathematics.