org.exist.util
Class UTF8

java.lang.Object
  extended by org.exist.util.UTF8

public class UTF8
extends java.lang.Object

This class contains two static tools for doing UTF-8 encoding and decoding.

UTF-8 is ASCII-transparent. It supports character sets requiring more than the seven-bit ASCII base range of UTF-8, including Unicode, ISO-8859, ISO-10646, etc.

We do not use an ISO UCS code signature, and we do not use a Java Data I/O-style strlen prefix.
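The framing described above can be illustrated with the standard library alone. The following is a minimal sketch, not this class's implementation: `Utf8FramingSketch` and its two methods are hypothetical names, and `String.getBytes` stands in for the encoder. It contrasts plain UTF-8 bytes (no signature, no length prefix) with the length-prefixed form written by `java.io.DataOutputStream.writeUTF`.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of org.exist.util.
class Utf8FramingSketch {

    /** Plain UTF-8 bytes: no signature (byte-order mark) and no length prefix. */
    static byte[] plainUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Data I/O framing: a 2-byte length followed by modified UTF-8. */
    static byte[] dataOutputUtf(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(plainUtf8("abc").length);      // 3: identical to the ASCII bytes
        System.out.println(dataOutputUtf("abc").length);  // 5: 2-byte length + 3 bytes
        System.out.println(plainUtf8("\u00e9").length);   // 2: outside the seven-bit range
    }
}
```

ASCII transparency means that a pure-ASCII string encodes to exactly its ASCII bytes, so the first call returns `0x61 0x62 0x63`.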

Author:
John Pritchard (john@syntelos.org)

Constructor Summary
UTF8()
           
 
Method Summary
static XMLString decode(byte[] code)
          Decode UTF-8 input; decoding terminates at a null character (value 0x0).
static XMLString decode(byte[] code, int off, int many)
           
static XMLString decode(byte[] code, int off, int many, XMLString xs)
          Decode UTF-8 input; decoding terminates at a null character (value 0x0).
static byte[] encode(char[] str)
          Encode string in UTF-8.
static byte[] encode(char[] str, int start, int length, byte[] bytbuf, int offset)
          Encode string in UTF-8.
static byte[] encode(java.lang.String s)
          Encode string in UTF-8.
static byte[] encode(java.lang.String str, byte[] bytbuf, int offset)
           
static byte[] encode(java.lang.String str, int start, int length, byte[] bytbuf, int offset)
          Encode string in UTF-8.
static int encoded(char[] str, int start, int len)
          Returns the length of the string encoded in UTF-8.
static int encoded(java.lang.String str)
          Returns the length of the string encoded in UTF-8.
static int getUTF8Encoding(char in, char in2, byte[] out)
          Static method to generate the UTF-8 representation of a Unicode character.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UTF8

public UTF8()
Method Detail

decode

public static final XMLString decode(byte[] code)
Decode UTF-8 input; decoding terminates at a null character (value 0x0).

Throws:
java.lang.IllegalStateException - Bad format.

decode

public static final XMLString decode(byte[] code,
                                     int off,
                                     int many)

decode

public static final XMLString decode(byte[] code,
                                     int off,
                                     int many,
                                     XMLString xs)
Decode UTF-8 input; decoding terminates at a null character (value 0x0).

Throws:
java.lang.IllegalStateException - Bad format.
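
The null-terminated behaviour described for decode can be sketched with the standard library. This is an illustration under stated assumptions, not the eXist source: `NullTerminatedDecodeSketch` is a hypothetical class, it returns a `String` rather than an `XMLString`, it assumes `many` is a byte count starting at `off`, and it assumes malformed input is what "Bad format" refers to.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of org.exist.util.
class NullTerminatedDecodeSketch {

    /**
     * Decodes up to 'many' bytes starting at 'off', stopping early at the
     * first 0x0 byte. Assumption: 'many' counts bytes, not characters.
     */
    static String decode(byte[] code, int off, int many) {
        int limit = off + many;
        int end = off;
        // Scan for the terminating null byte inside the requested range.
        while (end < limit && code[end] != 0) {
            end++;
        }
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(code, off, end - off))
                    .toString();
        } catch (CharacterCodingException e) {
            // Mirrors the documented exception type for badly formed input.
            throw new IllegalStateException("Bad format", e);
        }
    }

    public static void main(String[] args) {
        byte[] input = {'h', 'i', 0, 'x'};
        System.out.println(decode(input, 0, input.length)); // prints "hi"
    }
}
```

Because UTF-8 never uses the byte 0x0 inside a multi-byte sequence, scanning for it byte by byte cannot split a character.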

encode

public static final byte[] encode(char[] str)
Encode string in UTF-8.


encode

public static final byte[] encode(char[] str,
                                  int start,
                                  int length,
                                  byte[] bytbuf,
                                  int offset)
Encode string in UTF-8. Warning: the size of bytbuf is not checked. Use encoded() to determine the size needed.


encode

public static final byte[] encode(java.lang.String str,
                                  byte[] bytbuf,
                                  int offset)

encode

public static final byte[] encode(java.lang.String str,
                                  int start,
                                  int length,
                                  byte[] bytbuf,
                                  int offset)
Encode string in UTF-8. Warning: the size of bytbuf is not checked. Use encoded() to determine the size needed.
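
Since the buffer size is not checked, a caller would normally size the buffer first and then fill it. The sketch below shows that pattern with hypothetical stand-ins: `encodedLength` plays the role of encoded() and `encodeInto` the role of encode(String, byte[], int), both delegating to `String.getBytes`. That encode returns the buffer it was given is an assumption; the documentation only states the return type.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of org.exist.util.
class SizedBufferSketch {

    /** Stand-in for encoded(String): number of UTF-8 bytes needed for s. */
    static int encodedLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Stand-in for encode(String, byte[], int): writes s into bytbuf at offset. */
    static byte[] encodeInto(String s, byte[] bytbuf, int offset) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(utf8, 0, bytbuf, offset, utf8.length);
        return bytbuf; // assumption: the same array is handed back
    }

    /** Builds a one-byte tag followed by the UTF-8 payload, sized exactly. */
    static byte[] frame(byte tag, String s) {
        int offset = 1;
        // Size first: the offset plus the encoded length of the payload.
        byte[] bytbuf = new byte[offset + encodedLength(s)];
        bytbuf[0] = tag;
        return encodeInto(s, bytbuf, offset);
    }

    public static void main(String[] args) {
        System.out.println(frame((byte) 7, "h\u00e9").length); // 4: tag + 'h' + 2 bytes
    }
}
```

The stand-in fails with an `IndexOutOfBoundsException` when the buffer is too small; how the real method behaves in that case is not documented here.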


encode

public static final byte[] encode(java.lang.String s)
Encode string in UTF-8.


encoded

public static final int encoded(java.lang.String str)
Returns the length of the string encoded in UTF-8.


encoded

public static final int encoded(char[] str,
                                int start,
                                int len)
Returns the length of the string encoded in UTF-8.
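
The encoded length can be computed without producing any bytes, because each character's UTF-8 width follows from its value alone. The following is a minimal sketch of that calculation, not the eXist source; `Utf8LengthSketch` and `encodedLength` are hypothetical names, and counting an unpaired surrogate as three bytes is an assumption.

```java
// Hypothetical illustration class; not part of org.exist.util.
class Utf8LengthSketch {

    /** Returns the number of bytes s would occupy when encoded in UTF-8. */
    static int encodedLength(CharSequence s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                n += 1;                       // seven-bit ASCII range
            } else if (c < 0x800) {
                n += 2;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                n += 4;                       // surrogate pair: one supplementary character
                i++;                          // skip the low half
            } else {
                n += 3;                       // rest of the BMP; unpaired surrogates too (assumption)
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("abc"));          // 3
        System.out.println(encodedLength("\u20ac"));       // 3
        System.out.println(encodedLength("\uD83D\uDE00")); // 4
    }
}
```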


getUTF8Encoding

public static int getUTF8Encoding(char in,
                                  char in2,
                                  byte[] out)
Static method to generate the UTF-8 representation of a Unicode character. This code is taken from Saxon (see http://saxon.sf.net).

Parameters:
in - the Unicode character, or the high half of a surrogate pair
in2 - the low half of a surrogate pair (ignored unless the first argument is in the range for a surrogate pair)
out - an array of at least 4 bytes to hold the UTF-8 representation.
Returns:
the number of bytes in the UTF-8 representation
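
The contract above (one character or a surrogate pair in, one to four bytes out) can be sketched as follows. This is an independent illustration of standard UTF-8 bit packing, not the Saxon or eXist code; `Utf8CodePointSketch` and `utf8Bytes` are hypothetical names, and the sketch assumes that `in2` is a valid low surrogate whenever `in` is a high surrogate.

```java
// Hypothetical illustration class; not part of org.exist.util.
class Utf8CodePointSketch {

    /**
     * Writes the UTF-8 form of one character into out and returns the byte count.
     * in2 is only read when in is a high surrogate.
     */
    static int utf8Bytes(char in, char in2, byte[] out) {
        int cp = in;
        if (Character.isHighSurrogate(in)) {
            // Assumption: in2 is a valid low surrogate; no validation is done here.
            cp = Character.toCodePoint(in, in2);
        }
        if (cp < 0x80) {
            out[0] = (byte) cp;
            return 1;
        }
        if (cp < 0x800) {
            out[0] = (byte) (0xC0 | (cp >> 6));
            out[1] = (byte) (0x80 | (cp & 0x3F));
            return 2;
        }
        if (cp < 0x10000) {
            out[0] = (byte) (0xE0 | (cp >> 12));
            out[1] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[2] = (byte) (0x80 | (cp & 0x3F));
            return 3;
        }
        out[0] = (byte) (0xF0 | (cp >> 18));
        out[1] = (byte) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (byte) (0x80 | (cp & 0x3F));
        return 4;
    }

    public static void main(String[] args) {
        byte[] out = new byte[4];
        int n = utf8Bytes('\u20ac', '\0', out);
        System.out.println(n); // 3
    }
}
```

A four-byte `out` array is enough for every case, which matches the minimum size stated for the `out` parameter.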

