
[xsl] Grouping of text input file lines


Subject: [xsl] Grouping of text input file lines
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Sun, 11 Aug 2013 16:46:28 +0200

I'll briefly describe the problem and outline two approaches to a
solution. I'd be pleased to receive a comment or two.

The task is to convert a plain text file to XML using XSLT 2.0. Every
line of the text file has the form
  tag: value
and the lines are grouped at three levels, "database", "relation"
and "field", where each entity has some options and one or more
children at the next lower level (except, of course, for fields).

Example, indented according to nesting level:

node: abc    # a DB option
key: CMOS   # a DB option
rel: rlo_one
  com: a relation # a relation option
  alg: direct         # a relation option
  ele: fa int
    com: blurb       # element (field) options
    def: 0
    acc: px
    acc: py
  ele: fb chars
    com: bla bla
    def: "----"
    alg: permute
  num: 100          # a relation option
rel: rlo_two
  com: another relation    # a relation option
  com: more comment
  com: yet more comment
  ele: fx int
    com: blurb
    def: 0
    acc: px
  ele: fy int
    com: bla bla
    def: 42
  num: 50                   # a relation option
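
To make the format concrete, here is a minimal Python sketch (not XSLT) of the first step: splitting each line into a tag and a value. It assumes that leading indentation is insignificant and that whitespace followed by `#` starts an annotation added for this post rather than real data; neither point is stated above. `parse_lines` and `LINE_RE` are hypothetical names. (In XSLT 2.0 the input would typically be read with `unparsed-text()` and split into lines with `tokenize()`.)

```python
import re

# Assumptions (not stated in the post): indentation is insignificant, and
# whitespace followed by '#' starts an annotation added for illustration.
LINE_RE = re.compile(r"^\s*(\w+):\s*(.*?)(?:\s+#.*)?\s*$")

def parse_lines(text):
    """Return the (tag, value) pairs of all non-blank lines, in input order."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"not a 'tag: value' line: {line!r}")
        pairs.append((m.group(1), m.group(2)))
    return pairs

sample = """node: abc    # a DB option
rel: rlo_one
  ele: fa int
    def: "----"
"""
print(parse_lines(sample))
```

Requiring whitespace before the `#` keeps a value such as `a#b` intact.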

The expected XML structure is obvious, I think: a sequence of DB
options and relation elements; the relation elements contain relation
options and field elements, and the field elements contain field
options. Field order must not be changed. Repeated "com" entries should
be joined with line breaks between them, and so should "acc" entries,
except that these are joined with a space.
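
The joining rule for repeated options can be sketched in Python as a language-neutral illustration. It assumes that all `com` (or `acc`) entries of one entity are merged into the position of the first one, using a line break and a single space as separators; `merge_options` and `JOIN_SEP` are hypothetical names. In XSLT 2.0 the merge itself would presumably be a `string-join()` over the values.

```python
# Hypothetical separator table: repeated 'com' values are joined with a
# line break, repeated 'acc' values with a single space.
JOIN_SEP = {"com": "\n", "acc": " "}

def merge_options(pairs):
    """Merge repeated joinable options of ONE entity (DB, relation or field).

    The merged value occupies the position of the tag's first occurrence;
    every other option keeps its original place and order.
    """
    merged = []   # [tag, value] lists, in output order
    slot = {}     # joinable tag -> index of its entry in merged
    for tag, value in pairs:
        if tag in slot:
            entry = merged[slot[tag]]
            entry[1] += JOIN_SEP[tag] + value
        else:
            if tag in JOIN_SEP:
                slot[tag] = len(merged)
            merged.append([tag, value])
    return [tuple(entry) for entry in merged]

relation_options = [("com", "another relation"), ("com", "more comment"),
                    ("com", "yet more comment"), ("num", "50")]
print(merge_options(relation_options))
```

Tags not listed in `JOIN_SEP` are passed through unchanged, even if they repeat.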

The first basic idea, which I used throughout, is to maintain a second
string sequence in parallel to the one containing the text lines. It
contains just the tags, so that index-of() can be used to compute the
positions of "interesting" lines, such as every "rel" line. This way,
the subsequences of lines for all relations, or for an individual
relation or field, can be extracted conveniently.
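
A minimal sketch of the parallel-sequence idea, again written in Python for illustration only. `index_of` mimics XPath's `index-of()`, including its 1-based positions; `split_by_relation` is a hypothetical helper that uses the positions of the `rel` tags to cut the line sequence into the DB options and one slice per relation. The sample lines are a shortened, unindented excerpt of the example.

```python
def index_of(seq, item):
    """1-based positions of item in seq, mirroring XPath 2.0 fn:index-of()."""
    return [pos for pos, x in enumerate(seq, start=1) if x == item]

def split_by_relation(lines, tags):
    """Split lines into the DB-option lines and one line list per relation.

    tags is the parallel sequence: tags[i] is the tag of lines[i]. A relation
    runs from its 'rel' line to the line before the next 'rel' line, or to
    the end of the input.
    """
    starts = index_of(tags, "rel")
    if not starts:
        return list(lines), []
    db_options = lines[: starts[0] - 1]                # everything before the first 'rel'
    ends = [s - 1 for s in starts[1:]] + [len(lines)]  # inclusive 1-based ends
    relations = [lines[s - 1 : e] for s, e in zip(starts, ends)]
    return db_options, relations

lines = ["node: abc", "key: CMOS",
         "rel: rlo_one", "ele: fa int", "def: 0",
         "rel: rlo_two", "ele: fx int", "num: 50"]
tags = [line.split(":", 1)[0].strip() for line in lines]  # the parallel sequence
db_options, relations = split_by_relation(lines, tags)
```

The same cut, applied to a single relation's slice with `ele` as the key, yields the field boundaries, although the last field's end still has to be decided separately.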

The second idea is to use grouping. The sequence of lines is converted
to a sequence of nodes <tag>value</tag>, and a nested
xsl:for-each-group with group-starting-with separates relations and
fields, almost. As you can see, there are some leading lines defining
DB options, and each relation has option lines both before and after
its element groups. Most likely, cherry-picking these lines and line
groups prior to the glorious for-each-group has to be done using the
technique described above.
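
The "almost" can be reproduced in Python by mimicking group-starting-with: the first item always opens a group, and each later matching item opens another. Grouping a relation's tags on `ele` puts the relation's leading options into the first group and its trailing `num` into the last field group. The hypothetical helper `split_relation` cuts off head and tail first, as suggested above. It relies on `FIELD_OPTION_TAGS`, a hypothetical set read off the example only. Since `com` and `alg` occur at both levels, a trailing relation `com` would still be misassigned. For brevity the sketch works on tags alone.

```python
# Hypothetical: tags that may appear as field options, read off the example
# only. 'com' and 'alg' are also relation options, so this is a heuristic.
FIELD_OPTION_TAGS = {"com", "def", "acc", "alg"}

def group_starting_with(items, starts_group):
    """Mimic xsl:for-each-group with group-starting-with.

    The first item always opens a group, whether or not it matches;
    every later item for which starts_group(item) is true opens a new one.
    """
    groups = []
    for item in items:
        if not groups or starts_group(item):
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups

def split_relation(tags):
    """Cherry-pick a relation's head and tail options, then group the fields.

    Returns (head, fields, tail): head holds the 'rel' tag and the options
    before the first 'ele'; tail holds the options after the last field.
    """
    ele_positions = [i for i, t in enumerate(tags) if t == "ele"]
    if not ele_positions:
        return tags, [], []
    first = ele_positions[0]
    end = ele_positions[-1] + 1
    # Extend the last field over every following tag that may be a field option.
    while end < len(tags) and tags[end] in FIELD_OPTION_TAGS:
        end += 1
    head, body, tail = tags[:first], tags[first:end], tags[end:]
    fields = group_starting_with(body, lambda t: t == "ele")
    return head, fields, tail

rlo_one = ["rel", "com", "alg", "ele", "com", "def", "acc", "acc",
           "ele", "com", "def", "alg", "num"]
naive = group_starting_with(rlo_one, lambda t: t == "ele")
head, fields, tail = split_relation(rlo_one)
```

Here `naive` ends with a field group that has swallowed `num`, while `split_relation` returns `num` as the relation's tail.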

Any better ideas?
Thanks

