BOM Handling Question

Jamil
Posts: 85
Joined: Thu Oct 23, 2008 6:29 am

BOM Handling Question

Post by Jamil » Tue Jun 23, 2020 6:28 pm

I am noticing that when I execute an XSLT with the encoding set to UTF-8, the BOM is not being written to the output file. I just checked Encoding under Preferences, and UTF-8 BOM handling is set to Keep. For a new file, I want the BOM to always be added to the output file. My transformer is set to Saxon-EE 9.9.1.7.

How can I force the BOM to be written for new output files?

This is <oXygen/> XML Editor 22.1, build 2020061102

Thanks.

Radu
Posts: 7028
Joined: Fri Jul 09, 2004 5:18 pm

Re: BOM Handling Question

Post by Radu » Mon Jun 29, 2020 9:09 am

Hi Jamil,

That UTF-8 BOM handling setting in Oxygen is used only when opening and saving XML documents in the application. It does not control the way in which the Saxon XSLT processor saves the result of applying an XSLT transformation. I added an internal issue to see if we can use our setting to control the BOM used for serializing the result of the XSLT processing.
In general, adding BOM bytes to UTF-8 files is useless and is also not recommended:

https://stackoverflow.com/questions/222 ... ithout-bom

Usually the BOM makes sense when saving to UTF-16, but it can also be missing from UTF-16 files, in which case a default byte order is assumed.
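To make the distinction concrete, here is a short Python sketch of the bytes involved (Python is used purely for illustration; it is not part of the Oxygen or Saxon setup discussed here). Plain UTF-8 encoding writes no BOM, the `utf-8-sig` codec prepends the EF BB BF signature, and UTF-16 output starts with a two-byte BOM that records the byte order.

```python
import codecs

text = "é"

utf8_plain = text.encode("utf-8")    # plain UTF-8: no BOM is written
utf8_sig = text.encode("utf-8-sig")  # UTF-8 with the EF BB BF signature prepended
utf16 = text.encode("utf-16")        # UTF-16: BOM in the platform's native byte order

print(utf8_plain)  # b'\xc3\xa9'
print(utf8_sig)    # b'\xef\xbb\xbf\xc3\xa9'
print(utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# On reading, "utf-8-sig" strips a BOM if present and also accepts data without one,
# so a reader that already knows the data is UTF-8 decodes it correctly either way.
print(utf8_sig.decode("utf-8-sig") == utf8_plain.decode("utf-8-sig") == text)  # True
```

The UTF-8 BOM carries no byte-order information (UTF-8 has only one byte order); at most it acts as a signature that the file is UTF-8.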

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Jamil

Re: BOM Handling Question

Post by Jamil » Mon Jun 29, 2020 9:54 pm

Hi Radu.

The issue I face is that the Unicode file is not interpreted correctly, resulting in character loss for UTF-8. My XSLT output is UTF-8 encoded text. There is no indicator of the encoding other than the BOM. Since the BOM is missing, the data gets interpreted incorrectly, resulting in data loss.

For UTF-8, it may be considered useless if, and only if, all characters are eight-bit. In the event of 16-bit characters under Windows, it should be required.
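The failure described above can be reproduced with a hypothetical consumer that trusts a BOM when one is present and otherwise falls back to a legacy Windows code page. In this Python sketch, `read_text` and the Windows-1252 fallback are assumptions standing in for whatever application reads the file; the actual reader is not named in the thread.

```python
import codecs

def read_text(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical consumer: honour a BOM if present, otherwise use a legacy default."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the utf-16 codec reads and drops the BOM
    # No BOM: guess a legacy code page (Windows-1252 is an assumption here).
    return data.decode(fallback, errors="replace")

original = "Café naïve"
print(read_text(original.encode("utf-8-sig")))  # Café naïve
print(read_text(original.encode("utf-8")))      # CafÃ© naÃ¯ve
```

The same UTF-8 bytes decode correctly only when the BOM is present; without it, each accented character turns into two wrong characters, which is the kind of corruption described above.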

Radu

Re: BOM Handling Question

Post by Radu » Tue Jun 30, 2020 6:10 am

Hi,

If you are outputting XML, the default XML encoding according to the specification is UTF-8.
If you are outputting some other text content, I found an attribute on the xsl:output element that you could try setting, like this:

<xsl:output method="text" encoding="UTF-8" byte-order-mark="yes"/>

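After setting byte-order-mark="yes", one way to confirm that the serialized result really starts with the BOM is to inspect its first bytes. This is a minimal Python sketch; the function name `has_utf8_bom` and the file name `output.txt` are hypothetical, and it only checks the file on disk, not any Oxygen or Saxon setting.

```python
import codecs

def has_utf8_bom(path: str) -> bool:
    """Return True if the file at `path` begins with the UTF-8 byte-order mark."""
    with open(path, "rb") as f:  # binary mode: read raw bytes, no decoding
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

# Usage (hypothetical file name; substitute the actual transformation output):
# print(has_utf8_bom("output.txt"))
```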
Regards,
Radu

Jamil

Re: BOM Handling Question

Post by Jamil » Wed Jul 01, 2020 1:18 am

Thanks, and I was not even aware this attribute existed. This solved the issue.
