UTF-16LE and UTF-16BE
Posted: Thu Oct 21, 2004 4:32 am
by MURATA Makoto
These two charsets are defined in RFC 2781. The use of the BOM for these charsets is INCORRECT: RFC 2781 says that text labelled UTF-16BE or UTF-16LE must not have a BOM prepended. However, oXygen 5.0 outputs the BOM for these two charsets.
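To illustrate the rule, here is a minimal Java sketch that checks whether a byte sequence begins with a UTF-16 BOM. The class name `BomCheck` and the method `startsWithUtf16Bom` are hypothetical names chosen for this example; they are not part of oXygen or the JDK. The sketch relies on the JDK's documented behavior that the `UTF-16LE` and `UTF-16BE` encoders write no BOM, while the plain `UTF-16` encoder writes a big-endian one.

```java
import java.nio.charset.StandardCharsets;

final class BomCheck {
    // Hypothetical helper: true if the data starts with U+FEFF serialized
    // as FE FF (big-endian) or FF FE (little-endian).
    static boolean startsWithUtf16Bom(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;
        return (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
    }

    public static void main(String[] args) {
        // UTF-16LE and UTF-16BE: the byte order is fixed by the label, so no BOM.
        System.out.println(startsWithUtf16Bom("test".getBytes(StandardCharsets.UTF_16LE))); // false
        System.out.println(startsWithUtf16Bom("test".getBytes(StandardCharsets.UTF_16BE))); // false
        // Plain UTF-16: Java's encoder prepends a big-endian BOM.
        System.out.println(startsWithUtf16Bom("test".getBytes(StandardCharsets.UTF_16)));   // true
    }
}
```

A file labelled UTF-16LE or UTF-16BE should pass this check with `false`.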
Posted: Thu Oct 21, 2004 6:19 pm
by george
Hi Makoto,
It seems to be a Java issue with the UnicodeLittle and UnicodeBig Java encoding names: we simply create a writer, passing it the right Java encoding, and the writer automatically writes the BOM. We will add a filter to remove the BOM characters.
The problem can be reproduced with something like:
public void testUTF16LEandBOM() {
    File file = new File("test/unicode/tmpUTF16LE.txt");
    try {
        OutputStream os = new FileOutputStream(file);
        // "UnicodeLittle" is the Java encoding name that writes a BOM (FF FE) first.
        Writer wr = new OutputStreamWriter(os, "UnicodeLittle");
        wr.write("test");
        wr.close();
        // 4 characters * 2 bytes = 8 bytes expected; the BOM makes the file 10 bytes, so this fails.
        assertEquals(8, file.length());
    } catch (Exception e) {
        logger.error(e, e);
        fail(e.getMessage());
    }
}
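For comparison, a minimal sketch of one possible way to avoid the BOM without a filter: use the `UTF-16LE` charset instead of the `UnicodeLittle` encoding name. This is an illustration under assumptions, not necessarily the fix that will go into oXygen. The class `Utf16LeNoBom` and the method `writeWithoutBom` are hypothetical names, and a temporary file stands in for the test path above.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf16LeNoBom {
    // Hypothetical helper: writes text as UTF-16LE and returns the bytes that ended up in the file.
    static byte[] writeWithoutBom(Path target, String text) throws IOException {
        // StandardCharsets.UTF_16LE encodes in little-endian order and does not emit a BOM.
        try (Writer wr = new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_16LE)) {
            wr.write(text);
        }
        return Files.readAllBytes(target);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tmpUTF16LE", ".txt");
        try {
            byte[] bytes = writeWithoutBom(tmp, "test");
            System.out.println(bytes.length); // prints 8 (4 characters * 2 bytes, no BOM)
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The same idea should apply to big-endian output with `StandardCharsets.UTF_16BE`.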
Best Regards,
George