Escaping characters

nam
Posts: 18
Joined: Fri Apr 21, 2006 12:41 am


Post by nam »

I have an application that converts Word documents to DITA, and in the process it scans each paragraph for illegal characters. The original documents include left and right quotes, as well as regular double quotes, angle brackets, etc.

My process converts the standard five:
  • "  becomes  &quot;
    '  becomes  &apos;
    <  becomes  &lt;
    >  becomes  &gt;
    &  becomes  &amp;
But I convert other characters to their ASCII code, wrapped in the "&#" and ";" characters, so the left double quote becomes "&#147;" and the right double quote becomes "&#148;", and so on. However, all I get in my output is "#".

Are only the previous five allowed, or did I misunderstand how to escape the other series of typable, but illegal characters?
Neil in Washington
adrian
Posts: 2850
Joined: Tue May 17, 2005 4:01 pm

Re: Escaping characters

Post by adrian »

Hi,

Note that you should consider the character encoding. The character codes (#147 and #148) seem to come from Windows-1252 (or a similar single-byte code page). In ISO 8859-1 and in Unicode, 147 and 148 are control characters, not quotes.
Numeric character references in XML always refer to Unicode code points, regardless of the file's encoding (usually UTF-8 for DITA), so you need the corresponding Unicode character codes (U+201C, U+201D): &#x201c; and &#x201d;
The same applies to other special characters.
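
As a rough illustration of the above, here is a minimal Python sketch, assuming the converter hands you either ready-made Unicode text or the raw Windows-1252 code values. The function names cp1252_code_to_unicode and escape_for_xml are hypothetical and only for this example; they are not part of oXygen or of any DITA toolkit.

```python
from xml.sax.saxutils import escape


def cp1252_code_to_unicode(code: int) -> str:
    """Interpret one Windows-1252 byte value (0-255) as a Unicode character.

    Assumption: the codes come from Windows-1252. A few byte values
    (for example 0x81) are undefined there and raise UnicodeDecodeError.
    """
    return bytes([code]).decode("cp1252")


def escape_for_xml(text: str) -> str:
    """Escape the five predefined XML entities, then replace every
    non-ASCII character with a hexadecimal Unicode character reference."""
    # escape() handles &, < and > itself; the extra map covers both quotes.
    escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):x};"
        for ch in escaped
    )


# Code 147 in Windows-1252 is the left double quote, U+201C.
left_quote = cp1252_code_to_unicode(147)
right_quote = cp1252_code_to_unicode(148)
sample = escape_for_xml(f"{left_quote}A & B{right_quote}")
```

If the output file is actually written as UTF-8, the quote characters could also be emitted directly without a character reference; the references are simply the encoding-independent option.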

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
nam
Posts: 18
Joined: Fri Apr 21, 2006 12:41 am

Re: Escaping characters

Post by nam »

Thank you. I have updated my code to do the correct replacements. Finding the control codes was fun!
Neil in Washington