UTF-16LE and UTF-16BE


Post by MURATA Makoto »

These two charsets are defined in RFC 2781. Using a BOM with them is
INCORRECT, because the charset name itself already fixes the byte order.
However, oXygen 5.0 writes a BOM when saving in either charset.

Post by george »

Hi Makoto,

It seems to be a Java problem with the UnicodeLittle and UnicodeBig Java encoding names: we just create a writer, passing it the right Java encoding, and the writer emits the BOM automatically. We will add a filter to remove the BOM characters.
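For illustration only, a filter of that kind could look roughly like the sketch below. This is not the actual oXygen code: the class name BomStrippingOutputStream and the helper encodeWithoutBom are hypothetical, and the sketch assumes the BOM always arrives as the first two bytes of the stream. Note that it would also drop a U+FEFF that the text itself deliberately starts with.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper: drops a leading UTF-16 BOM (FF FE or FE FF)
// and passes every other byte through unchanged.
class BomStrippingOutputStream extends FilterOutputStream {
    private final int[] head = new int[2]; // first two bytes, held back for inspection
    private int seen = 0;                  // how many of the first two bytes have arrived

    BomStrippingOutputStream(OutputStream out) {
        super(out);
    }

    // FilterOutputStream routes the array-based write methods through write(int),
    // so overriding this single method is enough.
    @Override
    public void write(int b) throws IOException {
        if (seen < 2) {
            head[seen++] = b & 0xFF;
            if (seen == 2 && !isBom()) {
                // Not a BOM: release the two bytes that were held back.
                out.write(head[0]);
                out.write(head[1]);
            }
            return;
        }
        out.write(b);
    }

    private boolean isBom() {
        return (head[0] == 0xFF && head[1] == 0xFE)
            || (head[0] == 0xFE && head[1] == 0xFF);
    }

    @Override
    public void close() throws IOException {
        if (seen == 1) {
            // A single byte cannot be a BOM, so write it out before closing.
            out.write(head[0]);
            seen = 2;
        }
        super.close();
    }

    // Hypothetical convenience method: encodes text in memory with the BOM removed.
    static byte[] encodeWithoutBom(String text, String encodingName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(
                new BomStrippingOutputStream(sink), encodingName)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

With this in place, BomStrippingOutputStream.encodeWithoutBom("test", "UnicodeLittle") should return only the eight bytes of the text itself.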

The problem can be reproduced with something like:

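import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

// JUnit test method; logger is the test class's own logger.
public void testUTF16LEandBOM() {
    try {
        OutputStream os = new FileOutputStream("test/unicode/tmpUTF16LE.txt");
        // "UnicodeLittle" is the Java encoding name that makes the writer emit a BOM.
        Writer wr = new OutputStreamWriter(os, "UnicodeLittle");
        wr.write("test");
        wr.close();
        // 4 characters * 2 bytes = 8 bytes expected; the 2-byte BOM makes the
        // file 10 bytes long, so this assertion fails.
        assertEquals(8, new File("test/unicode/tmpUTF16LE.txt").length());
    } catch (Exception e) {
        logger.error(e, e);
        fail(e.getMessage());
    }
}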
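
As a point of comparison with the test above, here is a minimal sketch that encodes the same string in memory with two Java encoding names. The class BomDemo and its methods are made up for this example. It assumes a JDK where "UnicodeLittle" is available and where the encoding name "UTF-16LE" writes no BOM, which is the behaviour I would expect from current JDKs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class, not part of oXygen.
class BomDemo {
    // Encodes text through an OutputStreamWriter, like the test above,
    // but into memory instead of a file.
    static byte[] encode(String text, String encodingName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, encodingName)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if the data begins with the little-endian BOM bytes FF FE.
    static boolean startsWithLittleEndianBom(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xFF) == 0xFF
            && (data[1] & 0xFF) == 0xFE;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UnicodeLittle", "UTF-16LE"}) {
            byte[] data = encode("test", name);
            System.out.println(name + ": " + data.length
                + " bytes, BOM=" + startsWithLittleEndianBom(data));
        }
    }
}
```

If the assumption holds, "UnicodeLittle" yields 10 bytes starting with FF FE, while "UTF-16LE" yields the 8 bytes the test expects. In that case, choosing the second encoding name may be a simpler fix than filtering the BOM afterwards.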

Best Regards,
George