found drama

xsl[ran]t.

by Rob Friesel

I’m making a big push to take my work-related XML projects to the next level. Why? Let’s just say that management is boring and tedious, lacking the mental rewards that the problem-solving of programming-type ish offers. That being said, renewing my efforts on the ‘procdoc’ and ‘-faq’ projects for work seems a happy compromise.

So why <rant />?

Because I’m brickwalling again.

The biggest part of the big push is to ditch as many of these <![CDATA[*]]> sections as possible and standardize what goes into the file. Theoretically, managing the input via DTD-declared elements/entities is good long-term because it’ll keep “weird stuff” out. So whereas before we might have had something like:

<axn><![CDATA[Here is some text with <a href="/">html ish</a> mixed in practically <i>willy-nilly</i>]]></axn>

…my goal is to replace the input/XML file with something more structured, such as:

<axn>Here is some text with <link href="/">html ish</link> mixed in but in a way that <impt>makes sense</impt></axn>

…while still generating the same, beautified XHTML output.
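To make the goal concrete, here is a minimal sketch of that structured-to-XHTML step, written in Python's standard library rather than XSLT. The mapping of axn to p, link to a, and impt to em is purely an assumption for illustration; the real stylesheet may emit something quite different. The part worth noticing is that mixed content has three pieces to emit in order: the element's leading text, each child, and the text trailing each child.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping from source elements to XHTML elements (an assumption,
# not taken from the actual 'procdoc' stylesheet).
TAG_MAP = {"axn": "p", "link": "a", "impt": "em"}


def render(el):
    """Render one element and its mixed content as an XHTML string."""
    tag = TAG_MAP[el.tag]  # an unmapped element raises KeyError
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in el.attrib.items()
    )
    parts = [f"<{tag}{attrs}>", escape(el.text or "")]
    for child in el:
        parts.append(render(child))              # the child element itself
        parts.append(escape(child.tail or ""))   # text that follows the child
    parts.append(f"</{tag}>")
    return "".join(parts)


source = (
    '<axn>Here is some text with <link href="/">html ish</link> '
    "mixed in but in a way that <impt>makes sense</impt></axn>"
)
print(render(ET.fromstring(source)))
```

Each text run is emitted exactly once here, either as the leading text of its parent or as the tail of the child it follows, which is the property any mixed-content transform needs in order to avoid dropped or repeated text.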

This is the part where the audience asks: But if it ain’t broke, why try to fix it??? And I refer them back to the above statement re: standardizing it, for any number of reasons. Call me a data-Nazi, but regularity and predictability in your data are good rather than bad things, last I checked. And the implicit problem with CDATA sections mixed into otherwise well-formed elements is that you open the door to all kinds of non-standard, potentially malformed code. And that’s just a starting point. As is, you can’t query the XML (for example) for any/all links mentioned in a particular document, because everything inside a CDATA section is just character data to the parser. You could grep them maybe, but let’s stick to the subject.
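A quick way to see the querying difference is the following sketch, again in Python's standard library. The procdoc root element and the sample content are made up for the demonstration; only the axn, link, and impt names come from the snippets above.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured document: links are real elements.
STRUCTURED = """<procdoc>
  <axn>Here is some text with <link href="/">html ish</link> mixed in.</axn>
  <axn>See also <link href="/faq">the FAQ</link> for <impt>details</impt>.</axn>
</procdoc>"""

# Hypothetical CDATA-style document: the markup is only character data.
CDATA_STYLE = """<procdoc>
  <axn><![CDATA[Here is some text with <a href="/">html ish</a> mixed in.]]></axn>
</procdoc>"""


def list_links(xml_text, tag):
    """Return (href, text) pairs for every element named `tag`."""
    root = ET.fromstring(xml_text)
    return [(el.get("href"), el.text) for el in root.iter(tag)]


structured_links = list_links(STRUCTURED, "link")
cdata_links = list_links(CDATA_STYLE, "a")

print(structured_links)  # the two links, as (href, text) pairs
print(cdata_links)       # empty: the parser never sees an <a> element
```

The same holds for an XPath query in a stylesheet: an expression such as //link has something to match in the structured version, while the CDATA version offers only a string to search.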

Anyway, tonight’s frustration was a series of “almost there” moments as far as the main XSL stylesheet for the ‘procdoc’ project goes. As alluded to above, I’ve been trying to extend the capabilities of this DTD in a few ways and figured that some sort of <link /> element was the best place to start. The way I see it, this is the type of element that’d get mixed in all over the place with what are otherwise text/CDATA sections. So I made a quick change to the DTD to allow this element as a potential child node of the note element and I was off and running…

Only the changes required in the XSL seem to be much trickier. I found this one article that seemed to suggest a “less is more”, “keep it simple, stupid” approach to the problem, which I happily tried. But this turned out to be a failure inasmuch as I kept getting elements that were duplicated in the output or else weren’t output at all, and in all kinds of weird combinations as well. And so we (“we” = “I”) went ahead and tried a scenario-appropriate variation on the “clunky” approach described by the article’s author, only to wind up with an EVEN WORSE output file than before.

/le sigh

There were some interesting tricks learned and, let’s be honest, it’s always fun to experiment with some new techniques, but it is teh suck to conclude the night’s session by reverting to the file with which we started. The take-home message seemed to be that while the “clunky” approach gave us the worst output of all, there’s a happy medium waiting to be discovered for the particular subset of problems with which we are still faced. The limited “happy” results I got were encouraging, and all seemed to shed a little light on the right path. Now if I can just get the damn sodium lamp up and running…

About Rob Friesel

Software engineer by day, science fiction writer by night. Author of The PhantomJS Cookbook and a short story in Please Do Not Remove.
