Let’s get one thing straight. I’m not a huge fan of XML. I think it can be horribly verbose and complex, and for all its capabilities it can be used by those who don’t understand it to generate vast quantities of rubbish. That said, it is one of the sharper tools in my toolbox and solves some problems that have been with us for a very long time.
Early on in my career I worked a lot with embedded systems and often had to communicate with other systems, sensors, modems etc. Standards in those days amounted to RS232 and ASCII, and even those got corrupted and were used differently every time. Connecting a new device to a data logger usually meant first figuring out which wires were ground and signal and whether they were using 9 or 5 volts. Once connected to a terminal you would then have to discover whether the communications protocol was text or binary, and whether the messages were formatted according to some existing protocol or the designer had invented yet another way of encoding channels, numbers, dates, errors and so on.
Few programmers seemed bothered even to learn about the ASCII control characters for framing message packets (SOH, STX, ETX etc.). But everyone wanted to save bits in transmission, so we ended up with all sorts of encodings: 12-bit data packed into odd and even bytes was one example.
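To give a flavour of that kind of bit packing, here is a minimal sketch in Python. It does not reproduce any particular device's protocol: the layout (two 12-bit samples squeezed into three bytes) and the function names `pack12` and `unpack12` are assumptions made purely for illustration.

```python
def pack12(samples):
    """Pack 12-bit integers into bytes, two samples per three bytes.

    Hypothetical layout, chosen for illustration only:
      byte 0: top 8 bits of sample A
      byte 1: low 4 bits of sample A, then top 4 bits of sample B
      byte 2: low 8 bits of sample B
    """
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    out = bytearray()
    for a, b in zip(samples[0::2], samples[1::2]):
        if not (0 <= a < 4096 and 0 <= b < 4096):
            raise ValueError("sample out of 12-bit range")
        out.append(a >> 4)
        out.append(((a & 0x0F) << 4) | (b >> 8))
        out.append(b & 0xFF)
    return bytes(out)


def unpack12(data):
    """Reverse pack12: recover the list of 12-bit integers."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3 bytes")
    samples = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i:i + 3]
        samples.append((b0 << 4) | (b1 >> 4))
        samples.append(((b1 & 0x0F) << 8) | b2)
    return samples
```

Every device that chose a different layout needed its own pair of functions like these, which is exactly the tedium described above.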
Anyway, for each protocol I would have to write a decoder or a parser: a text reader, something to validate message packets, something to split the data into variables, something to understand the control structures, something to recognise data objects etc. Forth, my language at the time, was pretty good at this job but it could get tedious.
As I moved from embedded systems to desktop applications and then client-server applications the same problems kept on coming up: reading ini files, reading application data files, handling email messages and other early internet protocols and so on. During that era I came across several systems that would abstract out the basic problem of data serialisation, ASN.1 being one and SGML another. Both these systems looked pretty good but both were complex, difficult to learn and felt like they were for very big systems rather than everyday use. The real problem, though, is having a critical mass of users and developers: having everyone agree to use the same system and thus build systems and tools we all could use.
Tim Berners-Lee based HTML on SGML in 1989 and by 1996 XML became the simplified version of SGML that we all could use for creating general markup languages, document encodings, and message protocols.
Since then XML has given rise to well-known document formats such as RSS, Atom, SOAP, DocBook, and XHTML. Businesses created problem-domain-specific formats, such as those used by the global distribution systems (GDS) for hotel bookings and airline travel, and CIQ for Customer Information Quality. The Open Geospatial Consortium (OGC) created formats for exchanging geospatial metadata, map features, observations and measurements, and sensor information.
Where well-defined public schemas don’t already exist it is easy to invent new ones. I’ve created XML schemas for documenting historical extreme weather events, for CNC machine quality-control instrumentation and, for personal use, for crosswords and quiz questions.
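As a rough illustration of how little it takes, here is a sketch of a made-up quiz document read with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`quiz`, `question`, `text`, `answer`, `id`) and the `load_questions` helper are invented for this example and are not taken from any of the schemas mentioned above.

```python
import xml.etree.ElementTree as ET

# A hypothetical, home-made quiz format: names are illustrative only.
QUIZ_XML = """\
<quiz title="Markup history">
  <question id="q1">
    <text>Which markup language is HTML based on?</text>
    <answer>SGML</answer>
  </question>
  <question id="q2">
    <text>Which serial standard did early data loggers rely on?</text>
    <answer>RS232</answer>
  </question>
</quiz>
"""


def load_questions(xml_text):
    """Return a list of (id, question text, answer) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (q.get("id"), q.findtext("text"), q.findtext("answer"))
        for q in root.findall("question")
    ]


if __name__ == "__main__":
    for qid, text, answer in load_questions(QUIZ_XML):
        print(f"{qid}: {text} -> {answer}")
```

No hand-written tokeniser, packet validator or field splitter is needed; the generic parser does the work that each custom protocol used to demand.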
For me the power of XML comes from the availability of tools, the capacity for re-use and the ability to mix well-known structures with new extensions. So now, when faced with the need to create a document that I can process easily, I start by looking for an XML representation.
For a Résumé
Having had a long career as a software engineer, much of that as a freelance developer, I have a long and complex Résumé / Curriculum Vitae (CV) that has evolved through many incarnations. The first was typed on paper, then moved to an ASCII text file before moving through various word processors, all incompatible with each other. Although the basic structure (job history, education, skills etc.) remained the same, the style and layout changed as the tools became more capable, font technology improved and my own design skills grew.
Updating the CV is usually an annual task, independent of how the current job happens to be going, but this year it was prompted by an internal review at my work. Looking at the old OpenOffice version and thinking about generating both online HTML and PDF versions, I decided it was time to consider an XML version.
The two candidates, XRL and HR-XML, represent two extremes of XML applications. XRL is an Open Source project hosted on SourceForge; simple and well targeted, it has a single DTD. HR-XML is a major system for Human Resources data exchange, with hundreds of elements and multiple embedded schema documents, of which a Resume forms one small part. In part two of this article I’ll take a look at both and consider which best suits my purposes.