
Re: fo:bidi-override - is it necessary?


Subject: Re: fo:bidi-override - is it necessary?
From: Frank Wegmann <wegmann@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 7 Aug 1999 23:34:14 +0200

I have to agree with Steve.  Still, I'd like to clarify some points
about what "real processing" of bidi text actually involves.

1. If you have a Unicode-capable application that is able to handle
   mixed L2R and R2L text (and this is the example Chris mentioned),
   then your application already knows the following (see the sketch
   after this list):
   - Every Unicode character carries a directionality property that
     determines whether it (or rather the rendering of its visual
     representation, the glyph) is written from left to right or vice
     versa.
   - The Unicode control marks (LRM, RLM, LRE, RLE, PDF, LRO, RLO) are
     honoured. There is simply no way around that; otherwise bidi
     processing doesn't make much sense.

2. If you have Hebrew or Arabic encoded in your documents, how will
   you represent them internally? UCS-2/UCS-4 is not very likely; we,
   for example, use UTF-8 for processing mixed Yiddish/German/English
   texts (Yiddish is written with Hebrew letters, erm .. mostly). In
   any case, you will very probably not see the glyphs in your marked-up
   document. So additional markup is not only excessively verbose, it is
   definitely superfluous (no one but tools will be able to read the
   raw markup), and will thus (as Steve pointed out) make processing
   harder.
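
To make the first point concrete, here is a minimal sketch (Python 3
and its standard unicodedata module; the sample characters are
arbitrary, purely an illustration and not part of the XSL discussion
itself) of the bidi information an application already has for every
character:

  import unicodedata

  samples = {
      "LATIN CAPITAL A": "A",       # strong left-to-right letter
      "HEBREW ALEF":     "\u05D0",  # strong right-to-left letter
      "ARABIC ALEF":     "\u0627",  # strong right-to-left (Arabic) letter
      "RLO":             "\u202E",  # explicit right-to-left override
      "PDF":             "\u202C",  # pops the most recent embedding/override
  }

  for label, ch in samples.items():
      # bidirectional() returns the Unicode bidi category of the character,
      # e.g. 'L', 'R', 'AL', or 'RLO'/'PDF' for the control marks.
      print("%-16s U+%04X  bidi category: %s"
            % (label, ord(ch), unicodedata.bidirectional(ch)))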

Here is a typical line from such a UTF-8 document:

¿<f>ק×?ָנצענ×?, ¿<f>ק×?ָנצענ×?, 

Only after rendering will humans be able to read that stuff.
(Peter Flynn might object to point 2, since he did some very valuable
work with TEI WSDs and used entity references for each and every
Hebrew character -- something that would be too expensive for us.)
So this would very probably lead us into a land of confusion. Unicode-
based bidi processing is still not a common thing to do, so let's not
put new obstacles in its way.
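
As a rough illustration of why such a line is unreadable (again only a
Python 3 sketch; the particular letters are arbitrary, not the word from
the line above), this is what happens when Hebrew code points are encoded
as UTF-8 and the resulting bytes are viewed as Latin-1:

  text = "\u05E7\u05D0\u05B8"     # qof, alef, qamats -- an arbitrary sample
  raw = text.encode("utf-8")      # six bytes: d7 a7 d7 90 d6 b8
  print(raw.hex())                # -> d7a7d790d6b8
  print(raw.decode("latin-1"))    # what a Latin-1 viewer shows: noise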

Frank


> >It's markup instead of what are essentially formatting codes.
> 
> But it's markup that is 100% equivalent to formatting codes.
> 
> >And you *could* generate the bidi-override characters in the
> >transformation into FOs, but it's clearer to use markup rather than
> >cryptic hexcodes or entities.
> 
> How is
> 
>  <p>This is some <fo:bidi-override direction="rtl">Arabic
>  </fo:bidi-override> text.</p>
> 
> any less cryptic than
> 
>  <p>This is some &rtl-begin;Arabic&rtl-end; text.</p>
> 
> My real concern is that you now have two independent "absolute"
> formatting methods, without any clearly-defined semantics regarding
> how they might interact. What happens when LRO or RLO are used in the
> text within a fo:bidi-override element? Which one "wins"? The RFC you
> reference even mentions the problem explicitly:
> 
> "authors and authoring software writers should be aware that conflicts
> can arise if the DIR attribute is used on inline elements (including
> BDO) concurrently with the use of the corresponding ISO 10646
> formatting characters."
> 
> It just seems to me that given the fact that Unicode already supplies
> the elements required to handle all possible cases, the addition of
> another layer in the form of a markup tag adds nothing but unnecessary
> complexity. It makes the job of bidirectional formatting harder (and
> consequently more error-prone), not easier.
> 
> -Steve
> 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


