boodebr.org
MYOML: Smoothing things out with PHP
At this point, the markup language is complete. However, there are a couple of things that I want to add to make it even easier to write articles. The first part of this is a little PHP wrapper to serve the articles. The PHP wrapper performs the following functions:
  1. Performs server-side XSLT processing
  2. Makes <code> tags even easier to use
  3. Converts tabs to spaces
Server-side XSLT processing
Not every browser has a built-in XSLT processor. Therefore, if you were to serve your articles as pure XML, some visitors would not be able to read them. Over time, the percentage of non-XSLT capable browsers should shrink and approach zero, so this aspect of the problem should eventually vanish. However, there is a second reason to do the XSLT processing on the server: not all client XSLT processors produce exactly the same output. Performing the processing on the server ensures that the viewer sees exactly what you intended.

The PHP script performs the XML->HTML processing on the server, and caches the result so it only has to run again if the XML source file or the XSLT template changes.
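
As a rough illustration of that caching idea, here is a minimal sketch. It is not the code from format.php: the function names, the cache file layout, and the use of PHP's DOMDocument and XSLTProcessor classes (which need the xsl extension) are all assumptions made for the example.

```php
<?php
// Hypothetical helpers; names and file layout are assumptions, not taken from format.php.

// Returns true when the cached HTML is missing or older than any source file.
function cache_is_stale(string $cacheFile, array $sourceFiles): bool
{
    clearstatcache(); // drop PHP's cached stat results so filemtime() is current
    if (!file_exists($cacheFile)) {
        return true;
    }
    $cacheTime = filemtime($cacheFile);
    foreach ($sourceFiles as $source) {
        if (filemtime($source) > $cacheTime) {
            return true;
        }
    }
    return false;
}

// Runs the XSLT transform (requires PHP's xsl extension) and returns the HTML.
function transform_xml(string $xmlFile, string $xslFile): string
{
    $xml = new DOMDocument();
    $xml->load($xmlFile);
    $xsl = new DOMDocument();
    $xsl->load($xslFile);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);
    $html = $processor->transformToXml($xml);
    if (!is_string($html)) {
        throw new RuntimeException("XSLT transform failed for $xmlFile");
    }
    return $html;
}

// Serves the cached HTML, rebuilding it first if either input file changed.
function serve_cached(string $xmlFile, string $xslFile, string $cacheFile): void
{
    if (cache_is_stale($cacheFile, [$xmlFile, $xslFile])) {
        file_put_contents($cacheFile, transform_xml($xmlFile, $xslFile));
    }
    readfile($cacheFile);
}
```

Comparing modification times keeps the check cheap: a request that hits a fresh cache costs a few stat calls and a file read, and the transform only runs after an edit to the article or the template.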
Easier <code> tags
My goal for <code> tags is that you should be able to copy and paste arbitrary code directly into the XML file, with no editing needed. One problem with this is that the characters < and > are special in an XML file. If you tried to copy this code sample directly into the XML file, you would have problems:
Code sample with embedded < and > characters
for i in range(10):
    if i > 5:
        print("I > 5", i)
    elif i < 4:
        print("I < 4", i)
An XML parser would treat the embedded < and > characters as the beginning or end of tags, and would report an error because they don't form complete tags. Since an XSLT processor parses its input as XML, the same problem arises there.

The solution to this is to embed your code inside a CDATA section. A CDATA section tells the XML parser to treat the enclosed characters as pure data and not try to parse them. To wrap your code in a CDATA section, you place the special sequence "<![CDATA[" at the start of your code, and the sequence "]]>" at the end of the code.

Although you could do this by hand, it quickly becomes very tedious. To ease the pain, I added some PHP code to automatically insert the CDATA markers for both the <code> and <c> tags.
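
The automatic wrapping could be sketched along these lines. This is an illustration only, not the code from format.php: the function name and the regular expression are assumptions, and the sketch handles only bare <code> and <c> tags without attributes. It also assumes the pasted code does not itself contain the sequence "]]>", which would end the CDATA section early.

```php
<?php
// Hypothetical helper; the name and the regex are assumptions for illustration.
function wrap_code_in_cdata(string $xml): string
{
    // Match <code>...</code> or <c>...</c>. The backreference \1 makes the
    // closing tag agree with the opening one, ".*?" stops at the first
    // closing tag, and the /s flag lets "." span newlines.
    $wrapped = preg_replace(
        '#<(code|c)>(.*?)</\1>#s',
        '<$1><![CDATA[$2]]></$1>',
        $xml
    );
    // preg_replace() returns null on a regex engine error; fall back to the input.
    return $wrapped ?? $xml;
}
```

With this in place, a sample such as "<code>if i > 5:</code>" reaches the XML parser with its body enclosed in "<![CDATA[" and "]]>", so the > is read as plain data.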

One drawback to this is that if you want to insert the literal characters <code> into your document (as I just did), you need some sort of escaping mechanism so that the CDATA wrappers are not inserted. To support this, I have the PHP code treat the text "<!--NULL-->" as a marker. Now, when I want a literal <code> tag to show up in the output stream, I write it like this: "<c><<!--NULL-->code></c>". Because the marker splits the inner <code> tag, the PHP code does not recognize it as a tag when inserting the CDATA wrappers, and it deletes the "<!--NULL-->" string from the final output stream.
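
A minimal sketch of how such a marker could work is shown below. The function name, the regular expression, and the order of the two steps are assumptions for the example; the actual format.php may do this differently.

```php
<?php
// Hypothetical helper; the regex and the order of steps are assumptions.
function wrap_with_escapes(string $xml): string
{
    // Step 1: wrap <code> and <c> bodies in CDATA. The text
    // "<<!--NULL-->code>" does not match "<code>", so an escaped tag
    // is never mistaken for a real one.
    $wrapped = preg_replace(
        '#<(code|c)>(.*?)</\1>#s',
        '<$1><![CDATA[$2]]></$1>',
        $xml
    );
    // preg_replace() returns null on a regex engine error; fall back to the input.
    $wrapped = $wrapped ?? $xml;

    // Step 2: delete the marker, leaving the literal tag text behind.
    return str_replace('<!--NULL-->', '', $wrapped);
}
```

Deleting the marker only after the wrapping step is what makes the escape work: by the time the literal <code> text is restored, it already sits inside a CDATA section.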

Although it's a pain to do all that escaping, I doubt I'll have much need to place literal <code> tags in an article, outside of articles about the markup language itself. I see it as a temporary problem.
Replacing tabs with spaces
I have my text editor (jEdit) set to display TAB characters as 4 spaces. However, the HTML browsers I have tried seem to use 8 spaces per TAB. This makes the code samples much wider than intended, so they fill the bounding box more quickly and get clipped.

To get around this problem, my PHP script replaces TAB characters with four spaces. This doesn't affect normal HTML where whitespace is not significant, but makes the code samples appear as intended.
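
The replacement itself can be a one-liner. The sketch below uses a hypothetical function name and an assumed default width of four; note that it substitutes a fixed run of spaces for every tab rather than aligning to tab stops, which is sufficient for leading indentation.

```php
<?php
// Hypothetical helper: replaces every tab with a fixed run of spaces.
function tabs_to_spaces(string $text, int $width = 4): string
{
    return str_replace("\t", str_repeat(' ', $width), $text);
}
```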
The code
Here is the PHP script, if you'd like to use it: format.php

(I should caution you that I've written very few PHP scripts, so there may be some non-optimal code in there. At the very least, it is hopefully not buggy.)

To use it, I create a small wrapper script for the article. For example, if the article is called "article.xml", then I'd create a file called "article.php" containing the following code:
article.php
<?php
/* adjust path to wherever you saved the format.php file ... */
include "../format.php";
serve_xml_to_html('article.xml');
?>
Of course, you could write a generic script that lets you pass in the name of the XML file, but I felt that would be less secure, since it opens you up to the possibility of people viewing arbitrary files on your server. Having a minimal wrapper for each article is a simple way to be secure.

Finally, here it is in action, serving the sample "article.xml": article.php