
When to Use Character Codes

Posted: Fri Jul 22, 2011 9:50 pm
by jbzech
I'm pretty new to XML and coding generally, but picking it up quickly.

One question I can't seem to figure out is when I should use character codes for characters (such as &#8212; for em-dash) instead of using the character itself in the XML.

For example, if I paste in content from Word to a DocBook page in Author View, my em-dashes and curly quote marks all look fine in both Author & Text view. Should I replace them with the appropriate codes in text view? Or leave them?

I'm working in DocBook documents with encoding="UTF-8" and I'm outputting the XML content as PDF, HTML, and ePub, and I think the output looks the same whether I used character codes or the original characters pasted in from Word. I'm publishing work for consumers, so it is important to have appropriate curly quotes, etc.

Can someone point me to a layman resource or give me a quick rundown on this issue? Right now I'm searching out single and double quotes, em-dash, en-dash, and ellipsis. Are there other characters I should be replacing with code? Am I wasting my time?

Any help is appreciated.

Re: When to Use Character Codes

Posted: Mon Jul 25, 2011 11:35 pm
by adrian
Hello,

Character codes (numeric character references) provide a way to include characters in the XML content that the document's encoding cannot represent.
e.g. using Japanese characters in an XML document with a Latin-1 (ISO-8859-1) encoding.

Since you are using UTF-8, which can represent every Unicode character, character codes will rarely be necessary.
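
As a rough illustration of the point above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The <para> element and the sample text are made up for the example. It parses the same content once with a character code and once with the literal em-dash, then serializes to ISO-8859-1 to show where a character code actually becomes necessary.

```python
import xml.etree.ElementTree as ET

# Same paragraph, written once with a numeric character reference
# and once with the literal em-dash (U+2014).
with_reference = ET.fromstring("<para>wait&#8212;what</para>")
with_literal = ET.fromstring("<para>wait\u2014what</para>")

# After parsing, both forms carry exactly the same text.
same_text = with_reference.text == with_literal.text

# ISO-8859-1 has no em-dash, so the serializer has to fall back
# to a character reference when writing in that encoding.
latin1_bytes = ET.tostring(with_literal, encoding="iso-8859-1")

# UTF-8 can hold the character directly, so no reference is needed.
utf8_bytes = ET.tostring(with_literal, encoding="utf-8")

print(same_text)
print(latin1_bytes)
print(utf8_bytes)
```

If this behaves the way I expect, a parser sees no difference between the two spellings, which would explain why the PDF, HTML, and ePub output looks identical either way.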

Note, however, that a few characters are special in XML markup: '<' and '&' in content, and the quote character that delimits an attribute value. These must be replaced with a numeric character reference or a predefined entity reference (such as &lt;, &amp; or &quot;) for the XML document to be well formed. If you are working in Author mode, Oxygen replaces them automatically. However, if you are editing in Text mode, you have full control, so you must take care of these yourself.
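
If you ever generate XML from a script rather than typing it in Text mode, a small Python sketch of the same escaping might look like the following. It uses escape and quoteattr from the standard library's xml.sax.saxutils; the <para> element, the role attribute, and the sample strings are only placeholders for the example.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

raw_text = "Q&A: is 1 < 2?"          # contains '&' and '<'
raw_title = 'The "curly" question'   # contains double quotes

# escape() rewrites '&', '<' and '>' as predefined entity references.
escaped_text = escape(raw_text)

# quoteattr() returns the value already wrapped in quotes that are
# safe for it, escaping inner quotes only when it has to.
attr = quoteattr(raw_title)

doc = f"<para role={attr}>{escaped_text}</para>"

# Parsing the result gives back the original, unescaped strings.
parsed = ET.fromstring(doc)

print(doc)
print(parsed.text)
print(parsed.get("role"))
```

Curly quotes, dashes, and ellipses are not part of this set: they are ordinary characters to the parser and need no escaping in a UTF-8 document.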

http://en.wikipedia.org/wiki/List_of_XM ... references

Regards,
Adrian